Teaching AI to Read Liquid Biopsies

Large language models (LLMs) may help identify diagnostic biomarkers from cell-free RNA (cfRNA) data, although conventional analytical methods remain more reliable overall, according to a study published in Nature Communications.

cfRNA, which can be detected in blood samples, is an emerging source of diagnostic information. However, identifying clinically useful biomarker signatures from these complex datasets remains challenging.

Researchers evaluated several LLMs from OpenAI, Anthropic, and Google using published cfRNA datasets from three patient cohorts of differing diagnostic complexity: Kawasaki disease versus multisystem inflammatory syndrome in children (MIS-C); tuberculosis versus symptomatic respiratory controls; and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) versus sedentary controls.

The models were asked to identify genes that could serve as diagnostic biomarkers based on existing scientific knowledge. These gene panels were then compared with randomly selected genes and panels generated using conventional differential expression analyses.
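To make the random-panel comparison concrete, here is a minimal sketch of how a candidate gene panel might be scored against randomly drawn panels of the same size. It is not the study's pipeline, which is not detailed here. Everything in it is an assumption for illustration: the data are synthetic, the gene names (G0 to G49) are hypothetical, and the scoring rule (mean expression across the panel, evaluated with a rank-based AUC) is a simple stand-in for whatever classifier the authors used.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a case scores above a control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def panel_auc(expr, labels, panel):
    """Score each sample by its mean expression over the panel, then compute AUC.

    Assumes panel genes are higher in cases; a real analysis would fit a model.
    """
    scores = [sum(sample[g] for g in panel) / len(panel) for sample in expr]
    return auc(scores, labels)

def random_panel_baseline(expr, labels, genes, panel_size, n_draws, rng):
    """AUCs for randomly drawn panels of the same size as the candidate panel."""
    return [
        panel_auc(expr, labels, rng.sample(genes, panel_size))
        for _ in range(n_draws)
    ]

# Synthetic data (assumption): 20 cases, 20 controls, 50 hypothetical genes,
# of which the first five are shifted upward in cases.
rng = random.Random(0)
genes = [f"G{i}" for i in range(50)]
informative = set(genes[:5])
labels = [1] * 20 + [0] * 20
expr = [
    {g: rng.gauss(0.0, 1.0) + (2.0 if y == 1 and g in informative else 0.0)
     for g in genes}
    for y in labels
]

# Stand-in for a panel proposed by an LLM (hypothetical).
candidate_panel = ["G0", "G1", "G2", "G3", "G4"]
candidate_auc = panel_auc(expr, labels, candidate_panel)

baseline = random_panel_baseline(expr, labels, genes, len(candidate_panel), 200, rng)
baseline_mean = sum(baseline) / len(baseline)

# Fraction of random panels that do at least as well as the candidate.
frac_at_least = sum(b >= candidate_auc for b in baseline) / len(baseline)

print(f"candidate AUC: {candidate_auc:.3f}")
print(f"mean random-panel AUC: {baseline_mean:.3f}")
print(f"fraction of random panels >= candidate: {frac_at_least:.3f}")
```

A small fraction in the last line would suggest the candidate panel carries more signal than chance selection. The same function could score a panel from differential expression analysis for a side-by-side comparison.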

Overall, LLM-selected gene panels performed better than random selections, indicating that the models could identify biologically relevant candidates. Performance was strongest in the tuberculosis dataset, where some LLM-generated panels performed similarly to those identified using traditional methods. Results were more modest in Kawasaki disease and MIS-C and weakest in ME/CFS.

The models frequently selected genes involved in known immune and inflammatory pathways, suggesting they can draw on existing biomedical knowledge to support biomarker discovery.

Researchers also tested whether LLMs could independently perform an entire biomarker discovery workflow, from feature selection to diagnostic classification. Performance was inconsistent and generally did not surpass established machine learning approaches.

The study identified several limitations, including inconsistent adherence to instructions and challenges with reproducibility. As a result, the authors emphasize that LLM-generated biomarker signatures require rigorous validation before clinical application.

The findings suggest that LLMs could become useful tools for generating biomarker hypotheses and helping interpret large molecular datasets. However, current evidence supports their use alongside, rather than in place of, established bioinformatics and statistical methods.
