Lab Matters Spring 2021 | Page 23

Harnessing the Power of Data to Track Antibiotic Resistance Threats

by Rachel Shepherd, specialist, Informatics
Since the launch of the Antibiotic Resistance Laboratory Network (AR Lab Network) in 2016, APHL has helped to develop the technical solutions and secure infrastructure that connect medical providers, public health laboratories and US Centers for Disease Control and Prevention (CDC) epidemiologists to enable a coordinated and timely exchange of testing data and ensure immediate intervention.
Central to the success of the network was the AR Lab Network Reporting Portal, which was rapidly deployed to help AR testing labs report testing results to CDC in a standardized format and was used as a central repository for all AR results. Over time, APHL, CDC and public health laboratories realized that their needs were evolving and that they needed a more responsive and flexible system. While they wanted to retain the electronic messaging capabilities offered by the AR Lab Network Reporting Portal, they also needed:
• The ability for CDC subject matter experts to audit, review and analyze AR data submitted to the Portal.
• The ability for public health laboratories to view and verify the data submitted on behalf of their organizations, including errors and warnings.
• Real-time key metrics for each CDC program, available through a customizable dashboard.
In short, they needed better and more timely access for the people who needed to act upon the data to track AR threats, guide public health action and respond to public health emergencies. In 2019, APHL conducted a technical analysis of available options and identified a data lake as an enterprise-wide technical solution for the new Data for Action on Antibiotic Resistance Threats (DAART) portal.

What is a Data Lake?

A data lake is a collection of vast amounts of data stored en masse in its natural format. As opposed to the traditional data warehouse that stores data categorically according to purpose, a data lake serves as a single repository for enterprise-wide data, meaning that all structured and unstructured data from a variety of sources is contained in a single pool (or lake). This means that data is accessible from a variety of sources and can be repurposed to meet multiple data needs. For example, influenza data could be cross-referenced against antibiotic resistance data, and previously unknown correlations could be discovered.
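As a rough illustration of the idea (not a description of how DAART or AIMS is built), the sketch below keeps each submission in its original format in one shared store, parses it only when it is read, and then cross-references two sources on a common field. Every name in it, including the store, the sources and the field names, is a hypothetical placeholder.

```python
import csv
import io
import json

# Hypothetical raw payloads, kept exactly as submitted and parsed only on read.
# None of these sources or field names come from DAART or AIMS.
LAKE = [
    {"source": "ar_results", "format": "csv",
     "raw": "specimen_id,county,organism,resistant\n"
            "S1,Adams,CRE,yes\n"
            "S2,Baker,CRE,no\n"},
    {"source": "influenza", "format": "json",
     "raw": '[{"specimen_id": "F1", "county": "Adams", "subtype": "H3N2"}]'},
]


def read_records(source):
    """Parse every raw object from one source into dicts at read time."""
    records = []
    for obj in LAKE:
        if obj["source"] != source:
            continue
        if obj["format"] == "csv":
            records.extend(csv.DictReader(io.StringIO(obj["raw"])))
        elif obj["format"] == "json":
            records.extend(json.loads(obj["raw"]))
    return records


def counties_with_both():
    """Cross-reference two sources on a shared field (county)."""
    resistant = {r["county"] for r in read_records("ar_results")
                 if r["resistant"] == "yes"}
    flu = {r["county"] for r in read_records("influenza")}
    return sorted(resistant & flu)


print(counties_with_both())  # ['Adams']
```

Because nothing is reshaped on the way in, a new question only needs a new reader function, not a new storage layout.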
APHL’s goal was to develop a data lake on the APHL Informatics Messaging Services (AIMS) platform that could be used across all use cases, with AR results reporting serving as the initial pilot project.
Ready to Launch, But …
The development and launch of DAART relied on extreme collaboration across multiple organizations' leadership and subject matter experts: APHL, CDC program and technical experts, public health laboratories, AIMS architects, and developers for DAART. Well-defined roles and processes and daily project management were critical. After the data lake solution was identified, the team spent the latter part of 2019 refining requirements and testing with laboratories submitting data. In January 2020, DAART was nearly ready to go live.
And then COVID-19 struck.
All data lake developers, technical experts and APHL resources were immediately deployed to work on COVID-19 Electronic Laboratory Reporting (CELR), and everything that had been developed for DAART was repurposed for pandemic reporting. Although this meant the inevitable delay of DAART, the ability to repurpose and customize existing infrastructure saved months of development time that would otherwise have been needed to launch a technical solution for the electronic transmission of COVID-19 data nationwide.
DAART Goes Live
DAART officially launched in January 2021, and laboratories currently upload their AR results to the portal either via HL7 messaging or through a CSV upload. Those results are then automatically compiled and exported from the data lake, where CDC program subject matter experts can log in and generate a report.
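To make the CSV path concrete, here is a minimal sketch of how an uploaded file might be checked for errors and warnings and then compiled into summary counts. It is an illustration only: the column names, the result vocabulary and the rule that rows with warnings are still accepted are all assumptions, not DAART's actual validation logic, and HL7 messages are not covered.

```python
import csv
import io
from collections import Counter

# Assumed for illustration only; not DAART's real schema or vocabulary.
REQUIRED = ("specimen_id", "organism", "result")
KNOWN_RESULTS = {"resistant", "susceptible", "intermediate"}


def validate_upload(csv_text):
    """Return (accepted_rows, errors, warnings) for one CSV submission."""
    accepted, errors, warnings = [], [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [c for c in REQUIRED if not (row.get(c) or "").strip()]
        if missing:
            # A row missing a required field is rejected outright.
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        if row["result"].strip().lower() not in KNOWN_RESULTS:
            # An unfamiliar value is flagged, but the row is still accepted.
            warnings.append(
                f"line {line_no}: unrecognized result {row['result']!r}")
        accepted.append(row)
    return accepted, errors, warnings


def summarize(rows):
    """Compile accepted rows into counts per (organism, result)."""
    return Counter((r["organism"], r["result"].strip().lower()) for r in rows)


SAMPLE = (
    "specimen_id,organism,result\n"
    "S1,CRE,resistant\n"
    "S2,,susceptible\n"
    "S3,CRE,pending\n"
)
accepted, errors, warnings = validate_upload(SAMPLE)
print(errors)    # ['line 3: missing organism']
print(warnings)  # ["line 4: unrecognized result 'pending'"]
```

Returning errors and warnings separately lets a submitter see which rows were rejected and which were accepted with a flag.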
DAART makes AR reporting more streamlined: through this portal, it is easier for submitters to ensure the accuracy of their reporting, and CDC programs have more flexibility to access the data to provide necessary guidance and surveillance.
DIGITAL EXTRA: Email informatics.support@aphl.org for ongoing project updates, expected go-live dates, information on DAART training webinars, FAQs and questions about technical assistance.