Lab Matters Fall 2023 | Page 73

APHL 2023 POSTER ABSTRACTS
Implementing Influenza Sequencing for Molecular Epidemiological Surveillance of Influenza Strains Throughout the 2022-2023 Season in New Jersey
H. Schrader, N. Palmateer, B. Jeong, B. Schwem, D. Woell, R. Siderits, T. Kirn; New Jersey Department of Health Public Health and Environmental Laboratories
Influenza viruses are single-stranded, negative-sense RNA viruses that pose a serious risk to human health. Infections from the influenza virus range in severity from mild illness to death. The influenza genome is composed of eight RNA gene segments that are prone to change through the processes of antigenic shift and drift, which may have an impact on vaccine effectiveness. The two types of influenza viruses that cause seasonal epidemics are influenza A and B, where A is further divided into subtypes and B into lineages. Influenza A virus subtypes are based on the combination of hemagglutinin (H) protein (18 types) and neuraminidase (N) protein (11 types) in the virus. Whole genome sequencing of influenza can better inform scientists about the evolution of the influenza virus and, in turn, provide epidemiological data for vaccine development and maintenance. One of the important public health needs addressed at New Jersey Public Health and Environmental Laboratories (NJPHEL) is the surveillance of seasonal circulating influenza viruses utilizing whole genome sequencing. To meet this need, a protocol for whole genome sequencing of influenza viruses was implemented. RNA extracted from clinical samples deemed positive for influenza by qPCR testing was used for sequencing. Complementary DNA (cDNA) was generated from the RNA and amplified by PCR. Primers used for amplification targeted a conserved region of the gene segments in the influenza A virus and each individual segment of the influenza B virus, as suggested by the CDC. Resultant amplicons were purified and used to prepare libraries with the Illumina Nextera DNA library preparation kit. Short-read next-generation sequencing was conducted on the Illumina iSeq.
Sequencing data were used to generate assemblies for each of the eight gene segments to determine the lineage and subtype of each genome, and to identify single nucleotide polymorphisms (SNPs) against the appropriate subtype reference. Several steps of the protocol were optimized, including increasing the starting RNA volume, modifying the cDNA synthesis approach, and adjusting the loading concentration for sequencing. Sequencing results provided a comprehensive look at the clade-specific genetic changes of H3N2, the most prominent subtype of influenza A recorded during the 2022-23 season. Utilizing the assemblies generated, a phylogenetic tree was created to visualize genetic differences in each of the eight gene segments across specimens collected in New Jersey during the current influenza season. From this work, NJPHEL now has the capability for whole genome sequencing of influenza, which will be continued as a surveillance method, and sequencing data will be reported to the CDC via the National Center for Biotechnology Information. Implementing this method will help to bolster vaccine development and support national influenza surveillance efforts.
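As an illustration of the SNP identification step described above, the following is a minimal sketch and not the NJPHEL pipeline. It assumes the assembled segment has already been aligned to the subtype reference so that both strings have equal length; the function name `find_snps` and the choice to skip ambiguous bases and gaps are hypothetical.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: the two sequences are pre-aligned and of equal length.
    Positions where either sequence has a non-ACGT character (N, gap, etc.)
    are skipped rather than reported as SNPs.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    valid_bases = set("ACGT")
    snps = []
    for pos, (ref_base, smp_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base in valid_bases and smp_base in valid_bases and ref_base != smp_base:
            snps.append((pos, ref_base, smp_base))
    return snps


# Hypothetical toy sequences: one true substitution, one ambiguous base.
print(find_snps("ATGCATGC", "ATGTATNC"))
```

In practice a variant caller working from read alignments would be used; this sketch only shows the comparison against a reference at the consensus level.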
Presenter: Hannah Schrader, hannah.schrader@doh.nj.gov
Improved Quality Control Analysis of Illumina Sequencing Data
J. Bologna, M. Su, J. Wang, M. Chowdhury, N. De La Cruz, T. Chowdhury, C. Thi, S. Silver, X. Chen, T. Clabby, F. Taki, E. Omoregie, S. Hughes; New York City Department of Health and Mental Hygiene Public Health Laboratory
Sequencing data generated from high-throughput sequencing (HTS) methods require rigorous quality control steps to ensure the accuracy of downstream analysis. The importance of quality control to evaluate sequence data is well established in bioinformatics, and while general guidelines are in place, they are actively changing with the introduction of new tools and data requirements. Due to the vast quantity of data produced by SARS-CoV-2 sequencing efforts, the time dedicated to ensuring sequence quality has increased. Additionally, the need to classify sequence lineages requires added quality checks to keep mixed infections and potential contamination out of downstream analyses. To tackle these challenges, we have added MultiQC to our workflow and expanded our quality control dashboard to include a quality visualization of the plate layout. Our current process uses FastQC (Babraham Bioinformatics), a widely used tool for assessing the quality of short-read data, as a first step in quality assessment. FastQC returns a full report for each sequence, making the evaluation process time-consuming. Therefore, we have added MultiQC to our quality assessment. MultiQC takes the output from a variety of available tools, including FastQC, and provides a single report summarizing the data for easier interpretation. We customized the MultiQC report to show sequences failing at specified thresholds, such as those with mean quality scores < 5 and adapter content > 5%. MultiQC also generates data that can be used in downstream analyses to create additional metrics. With these data, we created a table that calculates a score between 0 and 3 for each read, based on where the mean quality along a read drops below a certain threshold. Reads with a score below 1 are flagged, and a score of 0 fails. This previously manual process has now been automated using a Python script.
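The 0-to-3 read score described above could be sketched as follows. The abstract does not give the threshold or the cut points, so everything specific here is an assumption made for illustration: the function name `quality_drop_score`, the default threshold of 28, and the rule that splits the read into thirds by the position of the first drop.

```python
from typing import Sequence


def quality_drop_score(mean_quals: Sequence[float], threshold: float = 28.0) -> int:
    """Score a read from 0 to 3 by where its per-position mean quality first
    falls below `threshold`.

    Hypothetical scoring rule (not taken from the abstract):
      3 = quality never drops below the threshold
      2 = first drop is in the final third of the read
      1 = first drop is in the middle third
      0 = first drop is in the first third
    """
    if not mean_quals:
        raise ValueError("mean_quals must not be empty")

    n = len(mean_quals)
    # Index of the first position below the threshold, or None if there is none.
    drop = next((i for i, q in enumerate(mean_quals) if q < threshold), None)
    if drop is None:
        return 3
    # Integer comparisons avoid floating-point edge cases at the cut points.
    if 3 * drop >= 2 * n:
        return 2
    if 3 * drop >= n:
        return 1
    return 0


# Hypothetical per-position mean qualities for a 150-base read
# whose quality falls off after position 120.
print(quality_drop_score([35.0] * 120 + [20.0] * 30))
```

The per-position mean qualities would come from the FastQC data that MultiQC aggregates; how those values are exported and parsed is left out of this sketch.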
To flag potential mixed infections and contamination across the 96-well plate used for sequencing, we have expanded the use of our quality control dashboard. The dashboard is used to visualize sequence depth across the genome and to check for amplicon dropouts, coverage levels, and the overall depth of sequenced samples. We added to the dashboard a visual of the sequencing plate layout, with each well labeled and color scaled by the mean allele frequencies for each sample. If these frequencies vary across the genome, that sample is flagged for follow-up. This visualization is especially useful for detecting cross-contamination, as neighboring wells are the most likely candidates for liquid spillovers. This additional QC step helps to ensure the integrity of our pangolin (--analysis-mode usher) lineage calls for SARS-CoV-2. These changes to our workflow have resulted in a simplified quality assessment process with clearer visualizations and fewer reports, and have reduced the time spent on quality assessment by half.
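The neighboring-well reasoning above can be sketched in a few lines. This is an illustrative assumption about how such a check might work, not the dashboard's actual logic: the well naming (rows A to H, columns 1 to 12), the function names, and the 0.9 cutoff on mean allele frequency are all hypothetical.

```python
from typing import Dict, List

ROWS = "ABCDEFGH"  # standard 96-well plate: 8 rows
N_COLS = 12        # and 12 columns


def neighbors(well: str) -> List[str]:
    """Return the wells adjacent to `well` (including diagonals) on the plate."""
    row, col = ROWS.index(well[0]), int(well[1:])
    adjacent = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue
            r, c = row + d_row, col + d_col
            if 0 <= r < len(ROWS) and 1 <= c <= N_COLS:
                adjacent.append(f"{ROWS[r]}{c}")
    return adjacent


def flag_wells(mean_af: Dict[str, float], cutoff: float = 0.9) -> Dict[str, List[str]]:
    """Flag wells whose mean allele frequency is below `cutoff` (hypothetical
    value) and list which of their neighbors are also flagged."""
    flagged = {well for well, af in mean_af.items() if af < cutoff}
    return {
        well: sorted(n for n in neighbors(well) if n in flagged)
        for well in sorted(flagged)
    }


# Hypothetical plate values: A2 and B2 are adjacent and both look mixed.
plate = {"A1": 0.99, "A2": 0.75, "B2": 0.80, "H12": 0.70}
print(flag_wells(plate))
```

Flagged wells that share a border with other flagged wells are the ones a reviewer would likely inspect first for spillover, while an isolated flagged well may point to a mixed infection instead.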
Presenter: Jessie Bologna, jbologna@health.nyc.gov