BAMOS Dec 2017
Research Corner with Damien Irving
Best practices for scientific software
Code written by a research scientist typically lies somewhere on a continuum ranging from “scientific code” that was simply hacked together for individual use (e.g. to produce a figure for a journal paper) to “scientific software” that has been formally packaged and released for use by the wider community. I’ve written at length in previous issues about the best practices that apply to the scientific code end of the spectrum, so for this article I wanted to turn my attention to scientific software. In other words, what’s involved in turning scientific code into something that anyone can use? My attempt at answering this question is based on my experiences as an Associate Editor with the Journal of Open Research Software. I’m focusing on Python since (a) most new scientific software in the weather/ocean/climate sciences is written in that language, and (b) it’s the language I’m most familiar with.
Hosting
First off, you’ll need to create a repository on a site like GitHub or Bitbucket to host your (version controlled) software. As well as providing the means to make your code available to the community, these sites have features that help with things like community discussion and software release management. One of the first things you’ll need to include in your repository is a software license. Jake VanderPlas has an excellent post on why you need a license and how to pick one.
Packaging/Installation
If you want people to use your software, you need to make it as easy as possible for them to install it. In Python, this means packaging the code in such a way that it can be made available via the Python Package Index (PyPI). If your code and all the libraries it depends on are written purely in Python, then this is all you need to do. People will simply be able to “pip install” your software from the command line.
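As a rough illustration of what that packaging involves, here is a minimal sketch of a setup.py file. It assumes a hypothetical pure-Python package called "mypackage"; the name, version, author and dependency list are all placeholders invented for the example, not details of any real project.

```python
# Sketch of a setup.py for a hypothetical pure-Python package, "mypackage".
# Every value below is a placeholder invented for illustration.

METADATA = dict(
    name="mypackage",                 # the name people would "pip install"
    version="0.1.0",
    description="Tools for analysing ocean temperature data",
    author="A. Scientist",
    license="MIT",
    packages=["mypackage"],           # directory containing __init__.py
    install_requires=["numpy"],       # pure-Python-installable dependencies
)


def main():
    # Imported here so the metadata above can be inspected
    # without triggering a build.
    from setuptools import setup

    setup(**METADATA)


# In a real setup.py the final line of the file would simply be: main()
```

With a file along these lines in the top directory of the repository, running "pip install ." from that directory should install the package locally, which is a useful check before making it available on PyPI.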
If your software has non-Python dependencies (e.g. netCDF libraries), then it’s a good idea to make sure that it can also be installed via conda. Using recipes that developers (i.e. you, in this case) submit to conda-forge, this popular package manager installs software and all its dependencies at once.
Documentation
While it might seem like the documentation pages for your favourite Python libraries were painstakingly typed by hand, much of what appears on the web has been compiled by software that automatically takes the information from the docstrings in your code and formats it nicely. In most cases, people use Sphinx to generate the documentation and Read the Docs to publish it (here’s a nice description of that whole process).
Of course, while this saves a lot of time and effort in documenting the precise details of each function in your code library (i.e. the API documentation), you still need to write some narrative documentation that describes how to use your package (with examples). Expect to spend a significant fraction (easily 25%) of the time you spend writing code on writing documentation.
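To make the docstring idea concrete, here is a small sketch of a documented function. The function itself is a made-up example, and the layout assumes the NumPy docstring convention, which Sphinx can render when its autodoc and napoleon extensions are enabled; other conventions would work just as well.

```python
def celsius_to_kelvin(temperature):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Examples
    --------
    >>> celsius_to_kelvin(0.0)
    273.15
    """
    # The docstring above is what a documentation generator would pick up;
    # the function body is deliberately trivial.
    return temperature + 273.15
```

The text inside the triple quotes is the only thing the documentation generator needs, so keeping it accurate as the code changes is most of the work of maintaining the API documentation.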
Assistance
In providing assistance to users, software projects will typically use a combination of encouraging people to submit issues on their GitHub/Bitbucket page (for technical questions that will possibly require a change to the code) and platforms like Google Groups and/or Gitter (a chat client provided by GitLab) for more general questions about how to use the software. The bonus of these platforms is that anyone can view the questions and answers, not just the lead developers of the software. This means that random people from the community can chime in with answers (reducing your workload) and it also helps reduce the incidence of getting the same question from many people.
Testing
If you want users (and your future self) to trust that your code actually works, you’ll need to develop a suite of tests using one of the many testing libraries available in Python. You can then use a platform like Travis CI to automatically run those tests each
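As a sketch of what a small test suite might look like, the example below defines a hypothetical function and two tests for it. It assumes the pytest convention of plain functions whose names start with "test_" and that use bare assert statements; the function and the values are invented for illustration.

```python
def anomaly(values, baseline):
    """Return each value minus the mean of the baseline period."""
    mean = sum(baseline) / len(baseline)
    return [value - mean for value in values]


def test_anomaly_known_values():
    # The baseline mean is 2.0, so 5.0 and 7.0 become 3.0 and 5.0.
    assert anomaly([5.0, 7.0], [1.0, 3.0]) == [3.0, 5.0]


def test_anomaly_of_baseline_sums_to_zero():
    # Anomalies relative to the data's own mean should cancel out.
    data = [1.0, 2.0, 3.0]
    assert abs(sum(anomaly(data, data))) < 1e-12
```

Saved in a file whose name starts with "test_", functions like these would be discovered and run by the "pytest" command, which is also the kind of command a continuous integration platform can be configured to run.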