A deep learning model developed by researchers at the University of Chicago has achieved up to 91 percent accuracy in classifying rare thymic tumors, offering a potential tool to improve diagnostic consistency in routine pathology.
Thymic epithelial tumors are uncommon and difficult to classify, with studies showing that diagnoses can change in up to 57 percent of cases after expert review. This variability can affect treatment decisions, particularly when distinguishing less aggressive thymomas from thymic carcinomas, which require different therapeutic approaches.
The study, published in Annals of Oncology, evaluated an artificial intelligence model trained on digitized hematoxylin and eosin-stained slides. When tested on an independent dataset, the system achieved 77.7 percent accuracy across six World Health Organization subtypes and 91.1 percent accuracy when tumors were grouped into clinically relevant categories.
For diagnostics, the most notable finding was the model’s ability to reliably identify thymic carcinoma. It showed 100 percent sensitivity and 94.6 percent accuracy for this subtype, reducing the risk of misclassifying aggressive disease as a less severe form.
Lead researcher Marina Garassino said, “Basically, we created a tool that – in the hands of a non-expert pathologist – is able to properly diagnose 100 percent of thymic carcinomas and outperform non-expert diagnoses.”
The model works by analyzing whole-slide images and highlighting regions that contribute most to its classification. It can visually indicate which tissue areas support a diagnosis, potentially aiding pathologists in reviewing challenging cases.
Importantly, most classification errors occurred within the same treatment group. Around 60 percent of misclassifications did not affect clinical management, suggesting that the model aligns with decision-making pathways even when subtype distinctions are difficult.
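For readers who want to see how a subtype-level score and a grouped score can differ on the same predictions, the following is a minimal sketch. It is not the study's code. The subtype-to-group mapping, the group names, and the labels are all hypothetical and chosen only to illustrate the arithmetic; the published groupings may differ.

```python
# Hypothetical mapping from WHO subtype to a treatment group.
# This grouping is an assumption for illustration, not taken from the study.
TREATMENT_GROUP = {
    "A": "group 1",
    "AB": "group 1",
    "B1": "group 1",
    "B2": "group 2",
    "B3": "group 2",
    "TC": "thymic carcinoma",
}


def accuracy(true_labels, predicted_labels):
    """Fraction of cases where the predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def summarize(true_subtypes, predicted_subtypes):
    """Compare subtype-level accuracy with accuracy after grouping."""
    true_groups = [TREATMENT_GROUP[s] for s in true_subtypes]
    predicted_groups = [TREATMENT_GROUP[s] for s in predicted_subtypes]

    # Subtype-level errors, and how many of them stay inside one group.
    errors = [(t, p) for t, p in zip(true_subtypes, predicted_subtypes) if t != p]
    within_group = sum(TREATMENT_GROUP[t] == TREATMENT_GROUP[p] for t, p in errors)

    return {
        "subtype_accuracy": accuracy(true_subtypes, predicted_subtypes),
        "grouped_accuracy": accuracy(true_groups, predicted_groups),
        "errors_within_group": within_group / len(errors) if errors else 0.0,
    }


# Toy labels invented for this example; they are not study data.
true_subtypes = ["A", "AB", "B1", "B2", "B3", "TC", "B2", "AB", "TC", "B1"]
predicted_subtypes = ["A", "A", "B1", "B3", "B3", "TC", "B1", "AB", "TC", "AB"]

result = summarize(true_subtypes, predicted_subtypes)
print(result)
```

In this toy case, four of ten subtype calls are wrong, but three of those four errors stay within the same hypothetical treatment group, so the grouped accuracy is higher than the subtype accuracy. That is the same pattern the article describes, shown here with made-up numbers.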
The system also performed consistently across both surgical specimens and small biopsy samples, with similar accuracy despite common artifacts such as tissue distortion. This supports its potential use in real-world diagnostic workflows, where sample quality and size can vary.
Unlike traditional approaches that rely on additional staining, the model uses routine slides and can run on standard hardware within minutes. This could expand access to specialist-level diagnostic support, particularly in laboratories without dedicated thoracic pathology expertise.
“In a larger population, harmonizing these steps is the biggest challenge,” Garassino said. “So, in the future, we plan to expand the algorithm so that it can correct for such differences, which will make the tool even more widely usable.”
