
AI Tackles Pathology Report Complexity

A reasoning-based system aims to extract staging, histology, and biomarkers from narrative pathology reports

By Jessica Allerton | 04/08/2026 | 6 min read


How does AI extract structured data from pathology reports, and can it improve consistency and accuracy in real-world laboratory workflows? We connected with Marilyn Bui and Ghulam Rasool following the publication of their study to discuss its implications for patient care.

Why is it so difficult to turn surgical pathology reports into clean, structured data that labs and health systems can actually use?

Marilyn Bui: Pathology reports contain critical clinical information, but their presentation often reflects a mix of art and science. They are largely free-text and descriptive, with meanings that can be complex and context dependent. Report structure also varies widely between institutions and practices, despite ongoing efforts by pathology organizations to promote standardization.

In addition, evolving clinical and biomarker requirements – for example, new immunohistochemistry scoring or molecular thresholds – are introduced regularly, yet report formats may not be updated consistently. As a result, although pathology reports contain rich clinical data, locating key details is not always straightforward.


Ghulam Rasool: Pathology reports were not designed to function as databases. They are narrative clinical documents shaped by individual pathologists’ writing styles, subspecialty practices, and institutional conventions. As a result, key variables – such as tumor site, laterality, or stage – may be stated clearly in one report, implied in another, or distributed across multiple sections.

In our analysis, both computational methods and expert review showed that even experienced human abstractors often need to interpret or reconcile information when extracting data from these reports. This same ambiguity creates challenges for automated systems. As a result, simple rule-based natural language processing (NLP) approaches often struggle, highlighting the need for more advanced methods capable of contextual reasoning to reliably structure pathology data at scale.


Can you tell us about the framework you created in this study?

GR: We developed a three-stage, reasoning-based framework designed to operate more like a pathology workflow than a traditional black-box algorithm.

In the first stage, multiple locally deployed large language models (LLMs) independently extract key variables – such as site, histology, grade, stage, laterality, and behavior – from the full report text. Each model also provides an explanation for why it selected each value.

In the second stage, separate reasoning models review these outputs against the original report to determine whether each prediction is supported by the source text. In the final stage, a consensus model adjudicates across the predictions and evaluations to generate a single structured output, along with an auditable rationale. The aim is not only automation, but also transparency and traceability.
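In code terms, the three-stage flow GR describes might look like the following minimal sketch. All class, method, and model names here are illustrative stand-ins for the locally deployed LLM calls, not code from the published study.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    model: str      # which extraction LLM produced this value
    variable: str   # e.g. "histology", "stage", "laterality"
    value: str
    rationale: str  # the model's explanation for choosing the value

def run_pipeline(report_text, variables, extractors, verifier, consensus):
    # Stage 1: several locally deployed LLMs each extract every variable
    # independently and explain their choice.
    predictions = []
    for model in extractors:
        for var in variables:
            value, rationale = model.extract(report_text, var)
            predictions.append(Prediction(model.name, var, value, rationale))

    # Stage 2: a separate reasoning model checks each prediction against
    # the original report text.
    reviews = [verifier.is_supported(report_text, p) for p in predictions]

    # Stage 3: a consensus model adjudicates across the predictions and
    # reviews, producing one value per variable plus an auditable rationale.
    structured = {}
    for var in variables:
        candidates = [(p, ok) for p, ok in zip(predictions, reviews)
                      if p.variable == var]
        structured[var] = consensus.adjudicate(report_text, var, candidates)
    return structured
```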

Your approach uses multiple locally deployed LLMs and then applies “consensus-based reasoning.” Why did you build it this way instead of relying on one model?

GR: One of the clearest findings from our study was that no single model performs best across all variables, organs, and reporting styles. Some models perform better at extracting histology, while others are stronger at identifying staging information, and performance can vary depending on organ-specific terminology.

Rather than trying to identify a single “best” model, we incorporated that variability into the design. By combining outputs from multiple models and resolving disagreements through structured reasoning, the system reduces the risk of hallucinations and fragile errors. This approach mirrors how consensus is often reached in pathology practice and allows model diversity to become a strength rather than a limitation.
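The published framework uses a dedicated consensus model for this adjudication, but the underlying intuition can be shown with a toy baseline: unanimous extractions pass through, and anything contested is escalated to a structured-reasoning step. The `escalate` callable below is a hypothetical stand-in for that consensus LLM call.

```python
from collections import Counter

def resolve(variable, values, escalate):
    # Count non-missing answers from the extraction models.
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    top, votes = counts.most_common(1)[0]
    if votes == len(values):  # every extractor agreed
        return top
    # Disagreement: hand the contested variable to a reasoning step.
    return escalate(variable, counts)

# Example: three extractors disagree on laterality.
# resolve("laterality", ["left", "left", "right"], consensus_call)
```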

What did you find when you tested the system across reports from The Cancer Genome Atlas (TCGA) from multiple organ systems?

GR: Testing on more than 6,000 TCGA reports across 10 organ systems showed that the framework generalizes well across multiple cancer types. Variables such as histology, site, and behavior were extracted with consistently high accuracy, even when documentation styles varied substantially.

An important observation was that performance differences were driven more by the specific variable and organ context than by the dataset itself. This suggests that generalization is achievable, but only when models are evaluated and adjudicated in a context-aware manner rather than assuming uniform performance across all settings.

What did you observe when you tested the framework using real-world reports from Moffitt Cancer Center?

MB: Real-world reports from Moffitt reflect how pathology is practiced in routine clinical settings and therefore introduce additional complexity compared with curated public datasets such as TCGA. Institutional reports often contain shorthand language, legacy formatting, variable section headers, and information distributed across the main report and subsequent addenda. Biomarker results, in particular, are frequently embedded within narrative text or referenced in separate molecular reports rather than presented in a standardized format.

GR: Despite these complexities, we observed high extraction accuracy for standard diagnostic variables in the Moffitt cohort – comparable to, and in some cases slightly higher than, results seen in TCGA. This likely reflects greater internal consistency within a single institution’s reporting practices. At the same time, challenges with staging nuances and biomarker extraction highlight the importance of evaluating AI systems on real clinical data, not only curated public datasets, when they are intended for routine clinical use.

Biomarker extraction was more challenging than standard diagnostic fields. Why are biomarkers harder for AI to pull reliably from pathology reports?

MB: Biomarkers are documented differently from core diagnostic elements such as histology or site. They are often reported across multiple sections of a pathology report – within comments, ancillary studies, addenda, tables, or separate molecular reports – and may involve multiple specimens, assays, or time points. The language used is also less standardized, with results expressed in qualitative, quantitative, or interpretive terms depending on the test.

GR: In our study, both automated evaluation and expert review showed that biomarker extraction often requires synthesizing information from multiple parts of the report and interpreting assay-specific context. Even for experienced human reviewers, this process can be complex. As a result, biomarkers present a greater reasoning challenge for AI systems than more consistently reported diagnostic fields, which likely explains the lower extraction accuracy we observed.

How could structured extraction like this support precision medicine?

MB: Precision medicine depends on having the right diagnostic and molecular information available at the right time. Staging, histology, and biomarker status directly influence treatment selection, yet much of this information still lives in free-text pathology reports and must be manually abstracted before it can be used downstream.

GR: Structured extraction can help reduce that gap. By converting narrative pathology reports into structured, reviewable data, key details – such as stage or biomarker status – can become available earlier for clinical decision-making, care pathways, and multidisciplinary discussions. Importantly, the approach we describe preserves transparency by attaching a rationale to each extracted value, allowing clinicians and pathologists to verify accuracy rather than rely on opaque outputs.

In practical terms, this supports more timely and consistent use of pathology data across precision oncology workflows while keeping expert clinical judgment central to treatment decisions.

Where do you see this kind of tool helping most with clinical trial enrollment?

MB: Clinical trial enrollment is often limited not by a lack of eligible patients but by the difficulty of identifying them efficiently. Many trials require highly specific pathology criteria – such as particular histologic subtypes, staging details, or molecular alterations – that are documented in narrative reports and typically require time-consuming manual review.

Tools designed to structure pathology report data may help address this challenge, particularly during the early screening phase. By extracting key diagnostic variables and biomarkers at scale, such systems can flag patients who may meet core eligibility criteria across large clinical populations. Because the outputs include structured data with supporting rationale, trial teams and pathologists can verify whether a case qualifies without re-abstracting the entire report.
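As a loose illustration of that screening step, structured outputs could be filtered against simplified eligibility criteria before human review. The criteria, field names, and records below are invented for the example and are far simpler than real trial protocols.

```python
# Invented, simplified criteria for one hypothetical trial.
TRIAL_CRITERIA = {
    "histology": {"invasive ductal carcinoma"},
    "stage": {"II", "III"},
    "biomarkers": {"HER2": "positive"},
}

def may_be_eligible(record):
    # Flags a case for trial-team review; a screen, not a decision.
    if record.get("histology") not in TRIAL_CRITERIA["histology"]:
        return False
    if record.get("stage") not in TRIAL_CRITERIA["stage"]:
        return False
    markers = record.get("biomarkers", {})
    return all(markers.get(k) == v
               for k, v in TRIAL_CRITERIA["biomarkers"].items())

records = [
    {"histology": "invasive ductal carcinoma", "stage": "II",
     "biomarkers": {"HER2": "positive"}},
    {"histology": "lobular carcinoma", "stage": "I",
     "biomarkers": {"HER2": "negative"}},
]
flagged = [r for r in records if may_be_eligible(r)]  # only the first record
```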

In practice, this approach could shift trial matching from a largely manual, retrospective process toward a more proactive and systematic one. That might help clinical teams identify potentially eligible patients earlier while maintaining expert oversight and clinical judgment.

Looking ahead, what would it take for tools like this to become routine in pathology workflows?

GR: Several elements would need to come together, and accuracy is only one of them. First, integration with existing laboratory information systems (LIS) is essential. Pathologists should not need to leave their usual reporting environment to use AI tools; instead, structured outputs should flow directly into LIS fields or synoptic templates, supporting rather than disrupting established workflows.
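A minimal sketch of that hand-off, assuming a simple variable-to-field mapping: the LIS field names below are invented, and a real integration would go through the LIS vendor's interface or an HL7/FHIR layer rather than a flat dictionary.

```python
def to_synoptic_fields(structured):
    # Hypothetical mapping from extracted variables to synoptic-template
    # fields; each structured value can carry its rationale along so the
    # pathologist can verify the field before sign-out.
    field_map = {
        "site": "SPECIMEN_SITE",
        "histology": "HISTOLOGIC_TYPE",
        "grade": "HISTOLOGIC_GRADE",
        "stage": "PATHOLOGIC_STAGE",
        "laterality": "LATERALITY",
    }
    return {lis_field: structured.get(var)
            for var, lis_field in field_map.items()}
```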

Second, robust validation standards are critical. Pathology is a high-stakes field, so these systems must be rigorously evaluated across different organ systems, variables, and real-world reporting styles. Clear performance metrics and transparent documentation of limitations are essential. Equally important is transparency: extracted values should be accompanied by clear rationales so that pathologists can understand and verify how conclusions were reached.

Finally, deployment must align with clinical and regulatory realities, including data privacy and governance. Locally deployed models that operate within institutional firewalls, combined with human-in-the-loop oversight, are more likely to gain trust and adoption. When AI is positioned as an assistive tool that improves efficiency and consistency – rather than replacing expert judgment – it becomes easier to integrate into routine pathology practice.

What might influence clinical adoption of tools like this?

MB: One of the broader takeaways from this work is that accuracy alone is not enough for AI to be useful in pathology. What matters just as much is whether a system aligns with how pathologists review, interpret, and take responsibility for diagnostic information.

GR: By designing the framework around consensus, justification, and auditability, the goal was to mirror real pathology practice rather than impose a purely computational approach. This alignment helps make such tools more trustworthy and increases the likelihood of adoption. Ultimately, the aim is not to replace pathologists, but to provide scalable, transparent tools that help unlock the clinical and research value already contained in pathology reports.


About the Author(s)

Jessica Allerton

Deputy Editor, The Pathologist


