BAMOS Vol 30 No. 4 2017 | Page 36

Research Corner with Damien Irving
BAMOS Dec 2017

Best practices for scientific software

Code written by a research scientist typically lies somewhere on a continuum ranging from “scientific code” that was simply hacked together for individual use (e.g. to produce a figure for a journal paper) to “scientific software” that has been formally packaged and released for use by the wider community. I’ve written at length in previous issues about the best practices that apply to the scientific code end of the spectrum, so for this article I wanted to turn my attention to scientific software. In other words, what’s involved in turning scientific code into something that anyone can use? My attempt at answering this question is based on my experiences as an Associate Editor with the Journal of Open Research Software. I’m focusing on Python since (a) most new scientific software in the weather/ocean/climate sciences is written in that language, and (b) it’s the language I’m most familiar with.
First off, you’ll need to create a repository on a site like GitHub or Bitbucket to host your (version-controlled) software. As well as providing the means to make your code available to the community, these sites have features that help with things like community discussion and software release management. One of the first things you’ll need to include in your repository is a software license. Jake VanderPlas has an excellent post on why you need a license and how to pick one.
Packaging / Installation
If you want people to use your software, you need to make it as easy as possible for them to install it. In Python, this means packaging the code in such a way that it can be made available via the Python Package Index (PyPI). If your code and all the libraries it depends on are written purely in Python, then this is all you need to do. People will simply be able to “pip install” your software from the command line.
If your software has non-Python dependencies (e.g. netCDF libraries), then it’s a good idea to make sure that it can also be installed via conda. Using recipes that developers (i.e. you, in this case) submit to conda-forge, this popular package manager installs software and all its dependencies at once.
While it might seem like the documentation pages for your favourite Python libraries were painstakingly typed by hand, much of what appears on the web has been compiled by software that automatically takes the information from the docstrings in your code and formats it nicely. In most cases, people use Sphinx to generate the documentation and Read the Docs to publish it (here’s a nice description of that whole process).
Of course, while this saves a lot of time and effort in documenting the precise details of each function in your code library (i.e. the API documentation), you still need to write some narrative documentation that describes how to use your package (with examples). Expect to spend a significant fraction (easily 25%) of the time you spend writing code on writing documentation.
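To make the docstring idea concrete, here is a minimal sketch of a function documented in the NumPy docstring style, which Sphinx can render into API documentation (via its autodoc extension, together with the napoleon extension for this particular layout). The function name, its behaviour and the example values are hypothetical and chosen purely for illustration; they do not come from any particular package.

```python
def celsius_to_kelvin(temperature_c):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If the temperature is below absolute zero.

    Examples
    --------
    >>> celsius_to_kelvin(0.0)
    273.15
    """
    # Hypothetical example function: shift the Celsius value by 273.15
    temperature_k = temperature_c + 273.15
    if temperature_k < 0:
        raise ValueError("temperature is below absolute zero")
    return temperature_k
```

Calling help(celsius_to_kelvin) in an interactive session displays the same text that a documentation generator would pick up, so the docstring serves both the person reading the code and the person reading the website.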
In providing assistance to users, software projects will typically use a combination of encouraging people to submit issues on their GitHub/Bitbucket page (for technical questions that will possibly require a change to the code) and platforms like Google Groups and/or Gitter (a chat client provided by GitLab) for more general questions about how to use the software. The bonus of these platforms is that anyone can view the questions and answers, not just the lead developers of the software. This means that random people from the community can chime in with answers (reducing your workload) and it also helps reduce the incidence of getting the same question from many people.
If you want users (and your future self) to trust that your code actually works, you’ll need to develop a suite of tests using one of the many testing libraries available in Python. You can then use a platform like Travis CI to automatically run those tests each