BAMOS Vol 30 No. 4 2017 | Page 36

Research Corner with Damien Irving
BAMOS Dec 2017

Best practices for scientific software

Code written by a research scientist typically lies somewhere on a continuum ranging from “scientific code” that was simply hacked together for individual use (e.g. to produce a figure for a journal paper) to “scientific software” that has been formally packaged and released for use by the wider community. I’ve written at length in previous issues about the best practices that apply to the scientific code end of the spectrum, so for this article I wanted to turn my attention to scientific software. In other words, what’s involved in turning scientific code into something that anyone can use? My attempt at answering this question is based on my experiences as an Associate Editor with the Journal of Open Research Software. I’m focusing on Python since (a) most new scientific software in the weather/ocean/climate sciences is written in that language, and (b) it’s the language I’m most familiar with.
Hosting
First off, you’ll need to create a repository on a site like GitHub or Bitbucket to host your (version controlled) software. As well as providing the means to make your code available to the community, these sites have features that help with things like community discussion and software release management. One of the first things you’ll need to include in your repository is a software license. Jake VanderPlas has an excellent post on why you need a license and how to pick one.
Packaging / Installation
If you want people to use your software, you need to make it as easy as possible for them to install it. In Python, this means packaging the code in such a way that it can be made available via the Python Package Index (PyPI). If your code and all the libraries it depends on are written purely in Python, then this is all you need to do. People will simply be able to “pip install” your software from the command line.
If your software has non-Python dependencies (e.g. netCDF libraries), then it’s a good idea to make sure that it can also be installed via conda. Using recipes that developers (i.e. you, in this case) submit to conda-forge, this popular package manager installs software and all its dependencies at once.
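To give a rough idea of what packaging for PyPI involves, here is a minimal sketch of the metadata a setup.py file passes to setuptools. The package name, version, author and dependency list are all hypothetical placeholders, and a real project will likely need more fields than this.

```python
# setup.py (sketch): every name and value below is a placeholder, not a real project.


def package_metadata():
    """Return the keyword arguments that would be handed to setuptools.setup()."""
    return {
        "name": "mypackage",              # hypothetical distribution name on PyPI
        "version": "0.1.0",               # hypothetical release number
        "description": "Example scientific software package",
        "author": "A. Scientist",         # placeholder author
        "license": "MIT",                 # should match the license in the repository
        "packages": ["mypackage"],        # importable package directories to ship
        "install_requires": ["numpy"],    # dependencies pip installs alongside the package
        "python_requires": ">=3.6",       # Python versions the package supports
    }


# A real setup.py would finish with these two lines:
#     from setuptools import setup
#     setup(**package_metadata())
```

The metadata is wrapped in a function here only so the sketch can be read and checked without triggering a build. In practice most projects call setup() directly at the top level of the file.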
Documentation
While it might seem like the documentation pages for your favourite Python libraries were painstakingly typed by hand, much of what appears on the web has been compiled by software that automatically takes the information from the docstrings in your code and formats it nicely. In most cases, people use Sphinx to generate the documentation and Read the Docs to publish it (here’s a nice description of that whole process).
Of course, while this saves a lot of time and effort in documenting the precise details of each function in your code library (i.e. the API documentation), you still need to write some narrative documentation that describes how to use your package (with examples). Expect to spend a significant fraction (easily 25%) of the time you spend writing code on writing documentation.
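As an illustration of the kind of docstring that documentation tools can pick up, here is a small sketch. The function is a made-up example, and the NumPy-style layout is just one common convention (one that Sphinx can parse with its autodoc and napoleon extensions), not a requirement.

```python
import inspect


def celsius_to_kelvin(temperature):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Examples
    --------
    >>> celsius_to_kelvin(0.0)
    273.15
    """
    return temperature + 273.15


# The docstring is stored on the function object itself, which is where
# documentation generators read it from. Print its one-line summary.
print(inspect.getdoc(celsius_to_kelvin).splitlines()[0])
```

The short example inside the docstring doubles as a snippet that readers can copy, which goes some way towards the narrative documentation described above.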
Assistance
In providing assistance to users, software projects will typically use a combination of encouraging people to submit issues on their GitHub/Bitbucket page (for technical questions that will possibly require a change to the code) and platforms like Google Groups and/or Gitter (a chat client provided by GitLab) for more general questions about how to use the software. The bonus of these platforms is that anyone can view the questions and answers, not just the lead developers of the software. This means that random people from the community can chime in with answers (reducing your workload) and it also helps reduce the incidence of getting the same question from many people.
Testing
If you want users (and your future self) to trust that your code actually works, you’ll need to develop a suite of tests using one of the many testing libraries available in Python. You can then use a platform like Travis CI to automatically run those tests each