BAMOS Vol 30 No.1 2017 | Page 24

Research Corner with Damien Irving
BAMOS March 2017

The weather/climate Python stack

It would be an understatement to say that Python has exploded onto the data science scene in recent years. PyCon and SciPy conferences are held somewhere in the world every few months now, at which loads of new and/or improved data science libraries are showcased to the community. When the videos from these conferences are made available online (which is almost immediately at pyvideo.org), I’m always filled with a mixture of joy and dread. The ongoing rapid development of new libraries means that data scientists are (hopefully) continually able to do more and more cool things with less and less time and effort, but at the same time it can be difficult to figure out how they all relate to one another. To assist in making sense of this constantly changing landscape, this article summarises the current state of the weather and climate Python software “stack” (i.e. the collection of libraries used for data analysis and visualisation), with particular focus on libraries that are widely used and that have good (and likely long-term) support.
Core
The dashed box in Figure 1 represents the core of the stack, so let’s start our tour there. The default library for dealing with numerical arrays in Python is numpy. It has a bunch of built-in functions for reading and writing common data formats like .csv, but if your data is stored in netCDF format then the default library for getting data into/out of those files is netCDF4.
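To make the .csv side of that concrete, here is a minimal sketch of a numpy read/write round trip. The column names and temperature values are made up for illustration, and the data lives in an in-memory buffer rather than a real file so the snippet runs anywhere. It only covers numpy; netCDF files would go through netCDF4 instead, which isn't shown here.

```python
import io

import numpy as np

# Hypothetical station data: a header row, then day-of-month and temperature.
csv_text = "day,temp_c\n1,21.5\n2,23.0\n3,19.8\n"

# Read the text into a 2-D float array, skipping the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Write the array back out as .csv (to an in-memory buffer instead of a file).
buffer = io.StringIO()
np.savetxt(buffer, data, delimiter=",", header="day,temp_c", comments="", fmt="%.1f")

print(data.shape)
print(buffer.getvalue())
```

Swapping `io.StringIO(csv_text)` and `buffer` for file paths gives the same behaviour on real files.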
Once you’ve read your data in, you’re probably going to want to do some statistical analysis. The numpy library has some built-in functions for calculating very simple statistics (e.g. maximum, mean, standard deviation), but for more complex analysis (e.g. interpolation, integration, linear algebra) the scipy library is the default.
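As a rough illustration of where numpy stops and scipy takes over, the sketch below runs the simple statistics with numpy and then one example each of interpolation, integration and linear algebra with scipy. The sample times, temperatures and the small linear system are invented for the example; the particular scipy routines chosen (`interp1d`, `quad`, `solve`) are just one option among several in each submodule.

```python
import numpy as np
from scipy import integrate, interpolate, linalg

# Hypothetical temperature samples (deg C) at irregular times (hours).
hours = np.array([0.0, 3.0, 6.0, 12.0])
temps = np.array([10.0, 13.0, 16.0, 22.0])

# Simple statistics come straight from numpy.
t_max = temps.max()
t_mean = temps.mean()
t_std = temps.std()

# Interpolation (scipy): linear estimate of the temperature at hour 9.
temp_curve = interpolate.interp1d(hours, temps)
temp_at_9 = float(temp_curve(9.0))

# Integration (scipy): area under the interpolated curve from hour 0 to 12.
area, abs_error = integrate.quad(lambda t: float(temp_curve(t)), 0.0, 12.0)

# Linear algebra (scipy): solve the 2x2 system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = linalg.solve(A, b)

print(t_max, t_mean, t_std, temp_at_9, area, solution)
```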
The numpy library doesn’t come with any plotting capability, so if you want to visualise your numpy data arrays then the default library is matplotlib. This library is great for any simple (e.g. bar charts, contour plots, line graphs), static (e.g. .png, .eps, .pdf) plots. The cartopy library provides additional functionality for common map projections, while bokeh allows for the creation of interactive plots where you can zoom and scroll.
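For a feel of what a simple, static matplotlib plot looks like in practice, here is a minimal sketch that draws a line graph from a numpy array and saves it as a .png. The sine curve is placeholder data, and the image is written to an in-memory buffer (an assumption made so the snippet doesn't touch the file system); cartopy and bokeh are not shown.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: one period of a sine curve.
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)

# Draw a basic line graph with labelled axes and a legend.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save as a static .png; passing a filename instead would write to disk.
png_bytes = io.BytesIO()
fig.savefig(png_bytes, format="png")
plt.close(fig)
```

Changing `format="png"` to `"pdf"` or `"eps"` produces the other static formats mentioned above.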
Figure 1. The weather/climate Python stack.