Lab Matters Winter 2018 | Page 11

from the bench

CDC Launches Bioinformatics App to Determine Sequence Type from Legionella pneumophila NGS Data

By Shatavia S. Morrison, PhD, US Centers for Disease Control and Prevention, Respiratory Diseases Branch; Brian H. Raphael, PhD, US Centers for Disease Control and Prevention, Respiratory Diseases Branch; and Jonas M. Winchell, PhD, US Centers for Disease Control and Prevention, Respiratory Diseases Branch

At the 2017 Advanced Molecular Detection Day hosted at the US Centers for Disease Control and Prevention (CDC) in Atlanta, GA, the Pneumonia Response and Surveillance Laboratory presented a software app that allows users to submit Legionella pneumophila whole genome sequencing data to the Office of Advanced Molecular Detection (OAMD) Bioinformatics portal and extract in silico Sequence-Based Typing (SBT) information. The app is the first of its kind hosted on the OAMD Bioinformatics portal. It leverages OAMD scientific computing resources and a user-friendly graphical interface to support public health laboratories (PHLs) in their research and outbreak investigations of L. pneumophila.

No Computer Programming Required

One feature developed specifically for this app addresses the challenge of analyzing data for one of the seven loci used in SBT, which has a paralog: a duplicate sequence located in a different region of the genome. Traditional methods such as PCR were designed to handle this issue, but the allele information for the correct copy is difficult to extract from shotgun sequencing data. The feature mitigates this problem by anchoring reads to the paralog's location in the genome and retrieving the allele information for the actual SBT locus.

Addressing L. pneumophila Genomics in a Snapshot

SBT for L. pneumophila is a technique used during outbreak investigations to cluster environmental and clinical isolates.
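To make the SBT idea concrete: each isolate is summarized by the allele numbers it carries at the seven SBT loci (flaA, pilE, asd, mip, mompS, proA and neuA), and that seven-number profile maps to a sequence type (ST). The Python sketch below assumes allele numbers have already been called from the WGS data (including resolution of the paralogous locus). The ST table, isolate names and helper functions (`profile_of`, `assign_st`, `loci_differing`, `cluster_by_st`) are hypothetical illustrations, not the app's implementation and not real entries from the curated ST database. Grouping isolates that share an ST gives the kind of preliminary clustering described here.

```python
from collections import defaultdict

# The seven L. pneumophila SBT loci, in the conventional profile order.
LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")

# HYPOTHETICAL allele-profile -> ST table for illustration only. These
# numbers are invented and are NOT entries from the curated SBT database.
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): 1001,
    (1, 1, 1, 1, 2, 1, 1): 1002,
    (2, 3, 1, 4, 5, 2, 6): 1003,
}


def profile_of(alleles):
    """Order a {locus: allele number} mapping into a 7-tuple profile.

    Raises KeyError if any locus is missing, because an ST can only be
    assigned from a complete seven-locus profile.
    """
    return tuple(alleles[locus] for locus in LOCI)


def assign_st(alleles):
    """Return the ST for a complete profile, or None if it is not in the table."""
    return ST_TABLE.get(profile_of(alleles))


def loci_differing(a, b):
    """Count the loci at which two allele mappings differ (0 = same profile)."""
    return sum(a[locus] != b[locus] for locus in LOCI)


def cluster_by_st(isolates):
    """Group isolate IDs by assigned ST as a preliminary clustering.

    Isolates whose profile is not in the table are grouped under None.
    """
    clusters = defaultdict(list)
    for isolate_id, alleles in isolates.items():
        clusters[assign_st(alleles)].append(isolate_id)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical isolates: one clinical and two environmental.
    isolates = {
        "clinical_A": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
        "environmental_B": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
        "environmental_C": dict(zip(LOCI, (1, 1, 1, 1, 2, 1, 1))),
    }
    print(cluster_by_st(isolates))
    # {1001: ['clinical_A', 'environmental_B'], 1002: ['environmental_C']}
    print(loci_differing(isolates["clinical_A"], isolates["environmental_C"]))
    # 1
```

In this toy example the clinical isolate shares an ST with one environmental isolate, so the two fall into the same preliminary cluster, while the third isolate differs at a single locus. Such a result only flags candidates; a higher-resolution method such as wgMLST would still be needed to assess relatedness in detail.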
A curated international database of sequence types (STs) is available, allowing investigators to identify where other strains with similar STs may have been isolated.1,2 With the increased use of whole genome sequencing (WGS) during L. pneumophila outbreak investigations, extracting SBT information from WGS data provides a useful preliminary clustering analysis to determine whether isolates may be associated with an outbreak. Other methods, such as whole genome multilocus sequence typing (wgMLST), can provide higher resolution, but they are typically time-consuming and computationally expensive. SBT analysis provides an initial assessment of genome relatedness and requires only a small fraction of the genome.

One app requirement was to minimize the number of steps the user had to perform in order to generate data to assist with their L. pneumophila research study or outbreak investiga