Lab Matters Winter 2023 | Page 8

FROM THE BENCH

PulseNet 2.0: The Future of Genomic Surveillance for Foodborne Outbreaks

By Courtney Wheeler, ASRT contractor, Division of Foodborne, Waterborne and Environmental Diseases, Enteric Diseases Laboratory Branch, US Centers for Disease Control and Prevention
In 2019, PulseNet, the nationwide laboratory network of the US Centers for Disease Control and Prevention (CDC) for detecting foodborne outbreaks, adopted whole genome sequencing (WGS) as the gold standard for foodborne outbreak detection and surveillance. Since then, the bioinformatics demands of processing these data have grown. To better meet partners’ needs, CDC is launching PulseNet 2.0.
Where PulseNet is Going
PulseNet is committed to serving its member laboratories, which are at the forefront of detecting foodborne outbreaks and monitoring illnesses caused by enteric bacteria. To better support state partners’ bioinformatics needs, PulseNet is developing a cloud-based analytic platform. The platform will strengthen how WGS data are analyzed, managed and visualized for outbreak detection and surveillance. It will improve efficiency and shorten the time needed to upload and analyze sequence data. It will also preserve PulseNet’s core functions: identifying enteric pathogens and detecting clusters to identify and monitor outbreaks. The new platform will use open-source bioinformatics tools for data management and analytics and will offer more options for data visualization. Integrating open-source tools into a cloud-based platform supports CDC’s Data Modernization Initiative in three ways: it makes analysis workflows more efficient, lowers barriers to data processing and centralizes data storage. Together, these improvements will streamline how we store, process, access and share data so that outbreaks are detected earlier and faster.
Figure: The PulseNet 2.0 workflow.
1 ACCESS (SAMS authentication): Users will access the cloud-based PulseNet 2.0 platform.
2 IMPORT (PulseNet web application): Sequence data and metadata will be submitted through the web.
3 PROCESS (PulseNet pipeline, Microsoft Azure cloud): Reads are assembled for downstream analysis and assessed for quality. Nextflow Tower processes the data through bioinformatics pipelines for ANI (average nucleotide identity), contamination, allele calling and genotyping.
4 ANALYZE (processed data): Processed, quality data can be published to the PulseNet national databases, viewed in data visualization tools and shared with SEDRIC.
5 UPLOAD (external storage): Raw sequence data are uploaded to NCBI for storage.
How We’re Going to Get There
CDC PulseNet staff internally tested the initial phase of the PulseNet 2.0 minimum viable product (MVP) from June through September 2023. PulseNet 2.0 pilot laboratories received demonstrations of the application, and CDC gathered their feedback to tailor the system to partner needs. CDC PulseNet expects to certify this first round of MVP development and migrate it to a production environment by December 2023. Pilot laboratories will then perform external validation of the system there. The transition to full operational capacity of the PulseNet 2.0 system is expected to begin in the third quarter of 2024.
What You Will Get
PulseNet 2.0 will offer greater flexibility and functionality through centralized data storage. All data published to the PulseNet national databases will be stored in a CDC-hosted cloud on Microsoft Azure. Centralized storage will simplify analysis and cut processing time because data no longer need to be “pushed” and “pulled” between local environments and CDC databases. CDC prioritizes data security and is committed to protecting sensitive information in its cloud-based services. Users who cannot access cloud services will not be able to access PulseNet 2.0. However, all bioinformatics tools will be containerized and available for use in other cloud or computing environments outside PulseNet 2.0.
PulseNet 2.0 will host the tools and bioinformatics software used to analyze sequence data and identify clusters. These tools will be stored in containerized repositories that workflow management programs such as Nextflow Tower can access. Together with the reference databases, these tools make up the calculation engine. Processed data can then be visualized with open-source software such as MicrobeTrace, GrapeTree and Nextstrain. The data can also be shared with other systems, such as CDC’s SEDRIC, all within the cloud.
Open-source tools allow PulseNet pipelines to be customized, and members may eventually be able to build their own analysis pipelines. Because each tool runs in its own container, a single component can be changed quickly, and a tool can be updated or a new one added, without affecting the rest of the workflow. Customized pipelines also make it possible to automate analysis. Pre-built pipelines will route data to the proper container at each step of a workflow, so users no longer have to move data through each analysis job by hand.
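The modular idea above can be illustrated with a minimal Python sketch. It is not PulseNet code. The `Sample`, `Pipeline` and `make_stage` names and the version labels are hypothetical. The stage names are taken loosely from the processing steps this article lists: assembly, quality assessment, ANI, contamination, allele calling and genotyping. Each stage is a plain function standing in for a containerized tool that a workflow manager such as Nextflow Tower would actually launch.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical record passed between stages; field names are illustrative only.
@dataclass
class Sample:
    sample_id: str
    results: dict[str, str] = field(default_factory=dict)


# A stage stands in for one containerized tool: it takes a sample and
# returns it with that stage's result added.
Stage = Callable[[Sample], Sample]


def make_stage(name: str, tool_version: str) -> Stage:
    """Build a placeholder stage that records which (hypothetical) tool version ran."""
    def run(sample: Sample) -> Sample:
        sample.results[name] = f"{name} via {tool_version}"
        return sample
    return run


class Pipeline:
    """Ordered, named stages; any one stage can be replaced without touching the others."""

    def __init__(self, stages: dict[str, Stage]) -> None:
        self.stages = dict(stages)  # dicts keep insertion order (Python 3.7+)

    def swap(self, name: str, stage: Stage) -> None:
        if name not in self.stages:
            raise KeyError(f"unknown stage: {name}")
        self.stages[name] = stage  # reassigning a key keeps the stage's position

    def run(self, sample: Sample) -> Sample:
        # Route the sample through every stage in order, with no manual hand-offs.
        for stage in self.stages.values():
            sample = stage(sample)
        return sample


# Usage: build a hypothetical six-stage pipeline, then upgrade one stage only.
names = ["assembly", "quality", "ani", "contamination", "allele_calling", "genotyping"]
pipeline = Pipeline({n: make_stage(n, "v1") for n in names})
pipeline.swap("allele_calling", make_stage("allele_calling", "v2"))
result = pipeline.run(Sample("sample-001"))
print(result.results["allele_calling"])  # allele_calling via v2
print(result.results["genotyping"])      # genotyping via v1
```

Every stage shares the same interface, so replacing one component leaves the stages before and after it unchanged. In this sketch, allele calling moves to a hypothetical "v2" while the rest of the workflow still runs on "v1".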
Member laboratories will no longer store analyzed sequence data locally. Instead, each laboratory will have its own “Lab View” window for managing data before publishing to the PulseNet national databases. Each laboratory view will be created from data currently in PulseNet