Research Corner with Damien Irving
BAMOS March 2017
The weather/climate Python stack
It would be an understatement to say that Python has exploded onto the data science scene in recent years. PyCon and SciPy conferences are held somewhere in the world every few months now, at which loads of new and/or improved data science libraries are showcased to the community. When the videos from these conferences are made available online (which is almost immediately at pyvideo.org), I’m always filled with a mixture of joy and dread. The ongoing rapid development of new libraries means that data scientists are (hopefully) continually able to do more and more cool things with less and less time and effort, but at the same time it can be difficult to figure out how they all relate to one another. To assist in making sense of this constantly changing landscape, this article summarises the current state of the weather and climate Python software “stack” (i.e. the collection of libraries used for data analysis and visualisation), with particular focus on libraries that are widely used and that have good (and likely long-term) support.
Core
The dashed box in Figure 1 represents the core of the stack, so let’s start our tour there. The default library for dealing with numerical arrays in Python is numpy. It has a bunch of built-in functions for reading and writing common data formats like .csv, but if your data is stored in netCDF format then the default library for getting data into/out of those files is netCDF4.
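To give a feel for the numpy side of this, here is a minimal sketch of reading and writing .csv data. The station data (a day column and a temperature column) is made up for illustration, and an in-memory text buffer stands in for a real file so the example runs anywhere.

```python
import io

import numpy as np

# Hypothetical data: day of month and temperature (deg C)
csv_text = """day,temperature
1,21.5
2,23.0
3,19.8
"""

# Read the text into a 2D numpy array, skipping the header row
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
days = data[:, 0]
temperature = data[:, 1]

# Write the array back out in .csv form (to a buffer rather than a file)
buffer = io.StringIO()
np.savetxt(
    buffer,
    data,
    delimiter=",",
    header="day,temperature",
    comments="",  # stop numpy prefixing the header with '# '
    fmt="%.1f",
)
csv_out = buffer.getvalue()
print(csv_out)
```

With a real file you would pass a filename to np.loadtxt and np.savetxt instead of the buffers.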
Once you’ve read your data in, you’re probably going to want to do some statistical analysis. The numpy library has some built-in functions for calculating very simple statistics (e.g. maximum, mean, standard deviation), but for more complex analysis (e.g. interpolation, integration, linear algebra) the scipy library is the default.
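As a rough illustration of that division of labour, the sketch below takes simple statistics from numpy and then hands over to scipy for interpolation and integration. The temperature profile is invented for the example, and interp1d and quad are just one reasonable choice among the several interpolation and integration routines scipy offers.

```python
import numpy as np
from scipy import integrate, interpolate

# Hypothetical temperature profile (deg C) at a few heights (m)
height = np.array([0.0, 100.0, 200.0, 300.0])
temperature = np.array([20.0, 19.0, 18.0, 17.0])

# Simple statistics come straight from numpy
t_max = temperature.max()
t_mean = temperature.mean()
t_std = temperature.std()

# Interpolation: build a linear interpolant and evaluate it at 150 m
profile = interpolate.interp1d(height, temperature)
t_150 = float(profile(150.0))

# Integration: integrate the interpolated profile over the 0-300 m layer
layer_integral, error_estimate = integrate.quad(
    lambda h: float(profile(h)), 0.0, 300.0
)
layer_mean = layer_integral / 300.0

print(t_max, t_mean, t_std, t_150, layer_mean)
```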
The numpy library doesn’t come with any plotting capability, so if you want to visualise your numpy data arrays then the default library is matplotlib. This library is great for any simple (e.g. bar charts, contour plots, line graphs), static (e.g. .png, .eps, .pdf) plots. The cartopy library provides additional functionality for common map projections, while bokeh allows for the creation of interactive plots where you can zoom and scroll.
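Here is a minimal sketch of the matplotlib part of that workflow: a filled contour plot of a numpy array, saved as a static .png. The field being plotted is synthetic (it isn't real data), and the image is written to an in-memory buffer purely to keep the example self-contained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

# Synthetic field on a regular longitude/latitude grid (not real data)
lon = np.linspace(0, 360, 73)
lat = np.linspace(-90, 90, 37)
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.cos(np.deg2rad(lat2d)) * np.sin(np.deg2rad(2 * lon2d))

# Filled contour plot with a colour bar and axis labels
fig, ax = plt.subplots()
contours = ax.contourf(lon, lat, field)
fig.colorbar(contours, ax=ax, label="synthetic field")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")

# Save as a static .png (into a buffer here; a filename works the same way)
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
plt.close(fig)
png_bytes = png_buffer.getvalue()
```

Swapping format="png" for "pdf" or "eps" gives the other static formats mentioned above.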
Figure 1. The weather/climate Python stack.