GLOBAL HEALTH
Using Machine Learning to Evaluate Text Responses from a Laboratory Data Repository
By Fredrick Mwasekaga, informatics specialist, Tanzania; and Reshma Kakkar, informatics manager, Global Health
Machine learning (ML), a subset of artificial intelligence (AI), involves training algorithms to recognize patterns and make decisions based on data. Supervised and unsupervised learning are types of ML.
APHL has helped several countries set up a central laboratory data repository that captures all data produced by laboratory systems and/or modules. An ML model is a program or algorithm that can be fine-tuned through supervised training to extract the type of data required for the analysis. For example, people’s names may not be captured in the central laboratory data repository, so there is no need to train the algorithm to extract them.
The majority of the supervised training will involve:
1. Text Classification: Categorizing data into predefined categories, such as whether a laboratory result is normal or abnormal.
2. Sentiment Analysis: Analyzing data to determine the sentiment expressed, such as “an organism found” versus “an organism not found.”
3. Named Entity Recognition (NER): Identifying and classifying entities like chemical compounds, gene names or disease conditions within text data. International terminology standards are a great source for NER.
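As a rough illustration of the supervised training described above, the following Python sketch trains a tiny naive Bayes text classifier to separate “organism found” from “organism not found” style results. It is only a sketch under stated assumptions: the training phrases, the labels `positive` and `negative`, and the function names are hypothetical and are not taken from the APHL repository tool.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real free-text results need more cleaning.
    return text.lower().split()

def train(examples):
    """Count labels and words from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled examples, invented for illustration only.
TRAINING = [
    ("organism found", "positive"),
    ("parasites seen", "positive"),
    ("growth detected", "positive"),
    ("organism not found", "negative"),
    ("no parasites seen", "negative"),
    ("no growth detected", "negative"),
]

model = train(TRAINING)
print(predict(model, "organism found"))
print(predict(model, "no organism found"))
```

A production model would be trained on far more labeled results and evaluated on held-out data; six phrases are only enough to show the mechanics.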
Challenges
While the advantages are compelling, there are several challenges to keep in mind when using machine learning for text evaluation:
1. Data Quality: The effectiveness of ML models depends heavily on the quality of the data used for training. Inconsistent, incomplete or biased data can lead to inaccurate results; in generative models, confidently stated but incorrect outputs are called hallucinations. Natural language processing (NLP), another component of AI, can extract data from free-text results, including ones with misspelled words (e.g., “Mariah” entered for “Malaria”).
2. Model Training: Smaller ML models can process data faster but are often less accurate. Model training also has to address issues like data privacy, consent (e.g., licenses), and potential biases in the algorithms.
3. Interpretability: Understanding how complex ML models arrive at a specific decision is crucial for trust and transparency (e.g., why NER classifies an entity as a person rather than an organism).
4. Computing Power: Processing large models requires hardware that can load them (e.g., plenty of random access memory (RAM) and a graphics processing unit (GPU)).
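The misspelling problem raised under Data Quality can be sketched with approximate string matching. The example below uses Python's standard `difflib` module to map a misspelled entry to the closest term in a controlled vocabulary. The vocabulary list, the function name `normalize_term` and the 0.6 similarity cutoff are assumptions chosen for illustration; a real list would come from a terminology standard, and this simple matcher is a stand-in for a trained NLP model, not a description of one.

```python
import difflib

# Hypothetical controlled vocabulary, invented for illustration only.
KNOWN_TERMS = ["malaria", "cholera", "typhoid", "tuberculosis"]

def normalize_term(raw, cutoff=0.6):
    """Return the closest known term, or None if nothing is similar enough."""
    candidate = raw.strip().lower()
    matches = difflib.get_close_matches(candidate, KNOWN_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(normalize_term("Mariah"))
print(normalize_term("xyz"))
```

Raising the cutoff reduces wrong matches at the cost of leaving more entries unmatched, so any threshold would need to be checked against real repository data.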
Future Prospects
APHL is working on incorporating these ML models into the next version of the laboratory data repository tool to enhance and automate the analysis process.
Sustaining Member Relationship continued from page 27
updated version of the GI panel software. The software update changes the way that the background subtraction is done for the crypt2 assay so that the non-specific melt product is no longer able to influence the background subtraction.
FDA rapidly cleared the software update, and bioMérieux followed up with Field Corrective Action (FCA) 5747 on April 25, 2023, deploying the software to BioFire FilmArray Torch Systems and BioFire FilmArray 2.0 system users. Within the technical note distributed by bioMérieux, customers were able to find instructions on installation procedures and technical support. Subsequently, CryptoNet and member sites were able to observe and confirm the update was an appropriate solution to the issue. As of June 2023, with the updated software in place, state laboratories were seeing concordance rates of 98% when re-testing samples received from clinical laboratories.
Conclusion
Enhanced collaboration between CDC, APHL, public health laboratories and sustaining members facilitates streamlined data and information sharing, and enables a rapid, coordinated response to public health challenges. By fostering open dialogue and cooperation, public health partners can address emerging issues more efficiently, adapt to technological changes, and improve patient care outcomes. As culture-independent diagnostic tests continue to evolve, maintaining strong inter-agency and cross-sector communication will be essential for maximizing their potential benefits and navigating future challenges in disease diagnostics and surveillance.
For more information on this success or other CryptoNet-related projects, please contact Rhodel Bradshaw, senior specialist, Food Safety, at rhodel.bradshaw@aphl.org.
References
1. Centers for Disease Control and Prevention. CryptoNet: Tracking Cryptosporidium in the U.S. May 2024. CryptoNet: Tracking Cryptosporidium in the U.S. | Cryptosporidium (“Crypto”) | CDC
2. Centers for Disease Control and Prevention. Foodborne Illness and Culture-Independent Diagnostic Tests. January 2023. Foodborne Illness and Culture-Independent Diagnostic Tests | FoodNet | CDC
28 LAB MATTERS Fall 2024