With machine learning’s ability to recognize cancerous tissue on digital histology images well established, researchers are beginning to apply that power to other diseases.
A team of researchers at the University of Cambridge has been investigating whether a machine learning tool might be able to improve the efficiency of celiac disease diagnosis – a process that is notoriously slow.
Their research, published in the New England Journal of Medicine AI, presents a machine learning model that can correctly identify whether a patient has celiac disease or not in 97 percent of cases.
Could this be the route to faster diagnoses, reduced waiting lists, and relief of pressure on labs? Here, Florian Jaeckle, Visiting Research Associate at the University of Cambridge’s Department of Pathology, explains the research and its implications.
What inspired this research?
Our group, led by Elizabeth Soilleux, first started working on celiac disease after she and her children were diagnosed with the condition more than seven years ago.
The disease is diagnosed via duodenal biopsies, which are often deprioritized due to the duodenum’s low malignancy rate. With the ongoing shortage of pathologists, this has led to significant delays – often several weeks – in reporting duodenal biopsies, even in countries like the UK.
Alongside that is the problem of low diagnostic consistency; in a study we conducted, inter-pathologist agreement when diagnosing celiac disease was only around 80 percent, highlighting the subjectivity and variability in current diagnostic practices.
Professor Soilleux’s personal experience, combined with these pressing clinical needs, inspired us to develop AI tools for celiac disease diagnosis.
How was the machine learning model trained and evaluated?
Our model is trained to detect the presence or absence of celiac disease, which aligns with how many pathologists approach diagnosis in practice. We intentionally chose not to focus on the Marsh classification, which, while valuable in research, is not always consistently applied in routine clinical diagnosis.
The AI model was trained using more than 3,300 biopsy slides sourced from four hospitals and scanned on five different devices. We used the original clinical diagnoses made by reporting pathologists as the ground truth during training.
For evaluation, the model was tested on over 600 cases from an entirely separate, previously unseen hospital to assess its real-world generalizability.
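As a rough illustration of this evaluation design, the Python sketch below holds out every slide from one hospital so that none of them can appear in the training set. The record fields (`slide_id`, `hospital`, `label`), the hospital names, and the toy records are hypothetical and are not taken from the study.

```python
def split_by_hospital(slides, held_out):
    """Split slide records so the held-out hospital never appears in training.

    `slides` is a list of dicts with a "hospital" key (a hypothetical schema).
    """
    train = [s for s in slides if s["hospital"] != held_out]
    test = [s for s in slides if s["hospital"] == held_out]
    if not test:
        raise ValueError(f"no slides found for held-out hospital {held_out!r}")
    return train, test


# Toy records for illustration only; label 1 = celiac disease, 0 = not celiac.
toy_slides = [
    {"slide_id": "s1", "hospital": "A", "label": 1},
    {"slide_id": "s2", "hospital": "B", "label": 0},
    {"slide_id": "s3", "hospital": "C", "label": 1},
    {"slide_id": "s4", "hospital": "D", "label": 0},
    {"slide_id": "s5", "hospital": "E", "label": 1},
    {"slide_id": "s6", "hospital": "E", "label": 0},
]

train_slides, test_slides = split_by_hospital(toy_slides, held_out="E")
```

Splitting by hospital rather than by individual slide means the test set differs from the training data in site-specific factors such as staining and scanning, which is what a generalizability check is meant to probe.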
What were the key findings of the study?
In an inter-observer study using biopsy slides from a previously unseen hospital, we compared the AI’s diagnostic performance with that of four experienced pathologists. We found that any given pathologist was just as likely to agree with the AI as they were with another human pathologist. This suggests that the AI achieves a level of diagnostic accuracy on par with experienced professionals.
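The comparison described here can be sketched in Python as follows. This is a minimal illustration that assumes simple percentage agreement between raters; the statistic actually used in the study may differ, and the rater names and labels below are invented toy data.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of cases on which two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must label the same non-empty set of cases")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_agreements(pathologists, ai):
    """Return (mean pathologist-pathologist, mean pathologist-AI) agreement."""
    human_pairs = [
        agreement(pathologists[p], pathologists[q])
        for p, q in combinations(pathologists, 2)
    ]
    ai_pairs = [agreement(labels, ai) for labels in pathologists.values()]
    return sum(human_pairs) / len(human_pairs), sum(ai_pairs) / len(ai_pairs)


# Toy labels for illustration only; 1 = celiac disease, 0 = not celiac.
toy_pathologists = {
    "P1": [1, 1, 0, 0, 1, 0],
    "P2": [1, 0, 0, 0, 1, 0],
    "P3": [1, 1, 0, 1, 1, 0],
}
toy_ai = [1, 0, 0, 1, 1, 0]

human_mean, ai_mean = mean_agreements(toy_pathologists, toy_ai)
```

If the two means are close, a pathologist is about as likely to agree with the model as with a colleague, which is the sense of "on par" used above.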
We further found that the model performed consistently across adult patient subgroups, regardless of age or sex.
What impact could these findings have on celiac disease diagnostics?
With an overall accuracy of 97 percent, alongside specificity and sensitivity both exceeding 95 percent, our AI tool demonstrated pathologist-level performance. These metrics, combined with our inter-observer findings, indicate that the tool could significantly reduce diagnostic variability and improve reporting consistency and turnaround times across pathology services.
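For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from the four cells of a confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to check; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: celiac cases called celiac      fn: celiac cases called normal
    tn: normal cases called normal      fp: normal cases called celiac
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,  # share of celiac cases detected
        "specificity": tn / negatives,  # share of non-celiac cases cleared
    }


# Hypothetical counts for illustration only.
toy_metrics = diagnostic_metrics(tp=48, fp=3, tn=47, fn=2)
```

With these toy counts, accuracy is 95/100, sensitivity is 48/50, and specificity is 47/50.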
How do you envision this tool being used in clinical workflows?
Initially, we see this tool functioning as a decision-support aid, helping pathologists improve diagnostic accuracy and consistency when assessing for celiac disease.
In the longer term, our team is actively developing additional AI tools designed to pre-screen and filter out a proportion of normal duodenal biopsies. By integrating these with our current model, we aim to create a more comprehensive pipeline that prioritizes abnormal and ambiguous cases for pathologist review.
This integrated approach has the potential to significantly reduce reporting backlogs – especially in resource-limited or high-throughput settings – while maintaining diagnostic safety and accuracy.
What’s next for this research?
Our group is planning to run stakeholder engagement studies to gather insights and experiences from key experts, including pathologists, who would like to shape the development of AI in pathology.
Readers of The Pathologist are invited to participate by completing our questionnaire.