Researchers have developed a deep learning framework that integrates radiology and pathology imaging to support classification tasks in diagnostic imaging. The framework combines the Adaptive Multi-Resolution Imaging Network (AMRI-Net) with the Explainable Domain-Adaptive Learning (EDAL) strategy in a unified architecture. According to the study published in Frontiers in Medicine, this approach is designed to address the variability of multimodal medical data and improve the interpretability of artificial intelligence (AI) models used in image classification.
The framework was evaluated on four publicly available datasets: ISIC, HAM10000, OCT2017, and Brain MRI. The classification model was trained with transformer-based backbone architectures and a cross-entropy loss. AMRI-Net applies multi-resolution feature extraction with attention-guided fusion to capture both local and global imaging patterns. EDAL introduces domain generalization techniques, including maximum mean discrepancy (MMD) and adversarial domain alignment, and uses attention-based Grad-CAM to support interpretability by highlighting the image regions that drive the model's predictions.
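The paper's implementation is not reproduced here; the following minimal PyTorch sketch only illustrates the two ideas named above under stated assumptions: a feature extractor applied at several resolutions whose scale-specific features are combined by learned attention weights (standing in for AMRI-Net's attention-guided fusion), and an RBF-kernel MMD loss of the kind EDAL uses for domain alignment. The module and function names (`MultiResolutionFusion`, `mmd_rbf`) and all hyperparameters are hypothetical, not taken from the study.

```python
# Hypothetical sketch of attention-guided multi-resolution fusion and an
# RBF-kernel MMD loss; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Extract features at several resolutions and fuse them with
    learned attention weights (one scalar weight per scale)."""
    def __init__(self, in_ch: int = 3, dim: int = 64, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        # Attention head scores each scale's pooled feature vector.
        self.attn = nn.Linear(dim, 1)

    def forward(self, x):
        feats = []
        for s in self.scales:
            xs = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode="bilinear", align_corners=False)
            f = self.encoder(xs).mean(dim=(2, 3))   # global-average pool -> (B, dim)
            feats.append(f)
        feats = torch.stack(feats, dim=1)            # (B, n_scales, dim)
        w = torch.softmax(self.attn(feats), dim=1)   # (B, n_scales, 1)
        return (w * feats).sum(dim=1)                # attention-weighted fusion

def mmd_rbf(x, y, sigma: float = 1.0):
    """Maximum mean discrepancy between two feature batches under an
    RBF kernel; penalizes distribution shift between imaging domains."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In a training loop of this shape, the MMD term would simply be added to the reported cross-entropy objective, e.g. `loss = F.cross_entropy(logits, labels) + lam * mmd_rbf(src_feats, tgt_feats)`, so the classifier is penalized whenever source- and target-domain feature distributions drift apart.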
On the ISIC dataset, which comprises dermoscopic images, the framework achieved a classification accuracy of 95 percent and an F1 score (the harmonic mean of precision and recall) of 95 percent. These results, reported by Lanting He of the School of Optoelectronics at Beijing Institute of Technology and colleagues, were higher than those of baseline models such as BLIP (94 percent accuracy) and the Vision Transformer (ViT). On the HAM10000 dataset, the framework reached 88 percent accuracy and an F1 score of 91 percent. On OCT2017, it achieved 89 percent accuracy, 96 percent recall, and an F1 score of 87 percent. On the Brain MRI dataset, the reported accuracy was 85 percent, with an F1 score of 86 percent.
An ablation analysis indicated that removing either AMRI-Net or EDAL lowered the classification metrics, suggesting that each component contributes to overall performance. On the ISIC dataset, interpretability evaluations produced an Intersection over Union (IoU) of 0.64 between highlighted regions and ground-truth annotations, and a pointing game accuracy of 81 percent, both higher than the corresponding values for ResNet-CAM and ViT-based attention mechanisms.
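The report does not spell out the exact evaluation protocol; as a reference point, the sketch below shows one common way these two interpretability metrics are computed from a Grad-CAM-style saliency map against a ground-truth lesion mask. It assumes the IoU is taken over a thresholded heatmap and that the pointing game counts a hit when the heatmap's peak falls inside the annotated region; the function names and the threshold are illustrative.

```python
# Hypothetical scoring of a saliency map against a binary ground-truth mask;
# the thresholding scheme is an assumption, not the paper's exact protocol.
import numpy as np

def saliency_iou(heatmap: np.ndarray, mask: np.ndarray, thresh: float = 0.5) -> float:
    """IoU between the thresholded heatmap and the binary ground-truth mask."""
    pred = heatmap >= thresh * heatmap.max()
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    return float(inter / union) if union else 0.0

def pointing_game_hit(heatmap: np.ndarray, mask: np.ndarray) -> bool:
    """A 'hit' if the heatmap's peak lies inside the annotated region;
    pointing game accuracy is the hit rate over the test set."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return bool(mask[y, x])
```

Averaging `pointing_game_hit` over all test images yields a pointing game accuracy of the kind quoted above.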
The study findings indicate that the framework performs consistently across multiple datasets and includes interpretability features that may be applicable in diagnostic imaging settings with varying resource availability.