The New Era of Public Health Bioinformatics
by Joel R. Sevinsky, PhD, founder and CEO, Theiagen Genomics
Next-generation sequencing (NGS) and bioinformatics are transforming public health, and in no discipline is this more apparent than in infectious disease surveillance. It is now common practice for public health scientists to rely on variant distributions of SARS-CoV-2 along with case counts to help inform public health decision making during this pandemic. Additionally, if used properly, this kind of data can dramatically increase the efficiency of disease control work.
But two years ago, such uses of molecular epidemiology were limited to a more select group of pathogens. While strategic investments from the Office of Advanced Molecular Detection and the PulseNet program at the US Centers for Disease Control and Prevention (CDC) have dramatically improved access to and development of NGS technologies and bioinformatics for infectious disease surveillance, the current pandemic highlighted how much work still needs to be done. Many laboratories were ready to sequence this new pandemic pathogen but were not ready to analyze the data to drive public health actions. Fortunately, cloud computing technology is advancing rapidly, bringing new tools, paradigms and infrastructures that are dramatically transforming public health scientific computing.
A Revised Perspective
Key technologies contributing to this transformation are:
1. Encapsulation of standardized algorithms and their dependencies in containers (e.g., Docker, Singularity).
2. High-level, reproducible data flows between containers using workflow description languages (e.g., WDL, Nextflow, CWL).
3. Workflow orchestration platforms to effectively manage the resources in a scalable fashion (e.g., Terra.bio, Nextflow Tower).
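To make the first two items concrete, here is a minimal Python sketch of what a single workflow step amounts to: a pinned container image, a command to run inside it, and a data directory shared with the host. The image name `example.org/lineage-tool:1.0.0`, the `lineage-tool` command and its flags, and the directory `/srv/run42` are hypothetical placeholders, not real tools from StaPH-B or Theiagen Genomics. Only the `docker run` options `--rm`, `-v` and `-w` are actual Docker CLI flags. The sketch builds the command line without executing it.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerStep:
    """One workflow step: a pinned container image plus the command run inside it."""

    image: str          # container image reference, pinned to a tag for reproducibility
    command: tuple      # command and arguments executed inside the container
    host_data_dir: str  # host directory with the inputs; outputs are written here too

    def docker_argv(self) -> list:
        # `docker run --rm -v HOST:CONTAINER -w DIR IMAGE CMD...`:
        # --rm removes the container on exit, -v mounts the host directory
        # at /data, and -w makes /data the working directory.
        return [
            "docker", "run", "--rm",
            "-v", f"{self.host_data_dir}:/data",
            "-w", "/data",
            self.image,
            *self.command,
        ]


# Hypothetical image name and tool; substitute a real image and its real CLI.
step = ContainerStep(
    image="example.org/lineage-tool:1.0.0",
    command=("lineage-tool", "--input", "sample01.fasta", "--output", "sample01.tsv"),
    host_data_dir="/srv/run42",
)

# Print the shell command that a workflow engine would effectively issue.
print(shlex.join(step.docker_argv()))
```

Because the image tag is pinned and the tool's dependencies live inside the image, the same step description should behave the same way wherever Docker is available. Workflow languages such as WDL, Nextflow and CWL express chains of such steps declaratively, and orchestration platforms decide where and when each one runs.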
These technologies collectively provide an open source framework with community support. For example, bioinformatics workflow and container contributions from the State Public Health Bioinformatics group (StaPH-B) and Theiagen Genomics allow for the democratization of high-performance computing, so every laboratory can have access to a high-performance scientific computing environment for their public health needs.
The technical highlights of these systems include their scalability, portability and standardization. Together, these properties allow anyone, anywhere with an internet connection, to perform a reproducible analysis on their pathogen genomic sequence. There are non-technical benefits as well. In government systems where high-performance computing is traditionally associated with capital investments, information technology support contracts and large upfront costs, these new infrastructure models transform scientific computing resources into an inexpensive consumable that can be budgeted based on specimen volume. Most analyses cost less than a dollar, a tiny fraction of the costs required to generate the raw data in the laboratory (i.e., sequencing instruments, service contracts, consumables and labor).
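As a rough illustration of budgeting compute by specimen volume, the arithmetic can be sketched in a few lines of Python. The monthly volume of 200 specimens and the unit cost of $0.75 are assumed figures chosen only to be consistent with "less than a dollar" per analysis; they are not measured costs from any platform. The function name `annual_compute_budget` is likewise made up for this example.

```python
from decimal import Decimal


def annual_compute_budget(specimens_per_month: int, cost_per_analysis: Decimal) -> Decimal:
    """Yearly analysis cost when compute is billed per specimen analyzed."""
    if specimens_per_month < 0 or cost_per_analysis < 0:
        raise ValueError("volume and unit cost must be non-negative")
    return specimens_per_month * 12 * cost_per_analysis


# Assumed figures for illustration only: 200 specimens a month at $0.75 each.
budget = annual_compute_budget(200, Decimal("0.75"))
print(f"Estimated annual compute budget: ${budget}")
```

The point of the sketch is that the budget scales linearly with specimen volume, with no fixed hardware or support-contract term in the formula.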
Change for the Better
and Theiagen Genomics, along with financial support from federal and state contracts. The simplicity of Terra.bio allowed many laboratories without any bioinformatics experience to be trained in less than 90 minutes on open source bioinformatics workflows in order to generate meaningful data from SARS-CoV-2 genomic sequences. Now that public health laboratories are realizing the potential of cloud resources for infectious disease surveillance, work is already underway to bring other pathogen surveillance systems into cloud-based infrastructures, including enterics, Mycobacterium tuberculosis, HIV, healthcare-associated infections (HAIs), Candida auris and others.
The pandemic has taught us it is difficult to predict where things will go next. However, in the area of scientific computing for public health, the evidence is clear that containerization, workflow managers and workflow orchestration environments are here to stay. Public health leaders and scientists are excited that the traditional requirements for genomic surveillance (e.g., supercomputers and bioinformatics professionals on site) are no longer barriers to entry. Furthermore, as these technologies continue to mature and harmonization efforts between the two competing orchestration platforms, Terra.bio and Nextflow Tower, progress, they will usher in an era of standardized workflows that can be run anywhere, on any system, at any time, by anyone.
Theiagen Genomics is an APHL Platinum Level Sustaining Member.