Multimodal Models for Skin Cancer Classification using Clinical Free Text and Dermatoscopic Images

Matthew Watson, Thomas Winterbottom, Thomas Hudson, Benedict Jones, Hubert P. H. Shum, Amir Atapour-Abarghouei, Toby P. Breckon, James Harmsworth King and Noura Al Moubayed
Communications Medicine, 2026


Abstract

Background: Skin cancer is one of the most prevalent cancers globally, and early detection is critical to reducing mortality risk. To aid early detection, machine learning (ML) skin cancer detection models have been proposed, currently focusing on dermatoscopic imaging alone. However, clinical free text may provide additional diagnostic information that is not present in images alone.
Methods: We constructed a multimodal dataset comprising 5,481 dermatoscopic images from 4,538 patients, including patient metadata and clinical notes, with binary labels (benign vs. malignant, 7% malignant). To assess and mitigate bias from leading language, we developed a clinical text preprocessing pipeline combining regular expressions and large language models, enabling multiple levels of filtering. We trained multimodal ML models on this dataset to explore the effect of free text on model performance.
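
As a rough illustration of the kind of filtering described above (not the paper's actual implementation), the Python sketch below strips leading diagnostic phrases from a clinical note with regular expressions and stubs out the LLM-based pass. The patterns, the llm_filter helper, and the filtering levels are all hypothetical placeholders.

import re

# Illustrative leading-language patterns only; the paper's actual regex set is not reproduced here.
LEADING_PATTERNS = [
    r"\b(?:highly\s+)?suspicious\s+(?:for|of)\s+(?:melanoma|malignancy)\b",
    r"\b(?:likely|probable|query)\s+(?:melanoma|bcc|scc)\b",
    r"\burgent\s+referral\b",
]

def regex_filter(note: str) -> str:
    """Remove spans matching known leading-language patterns, then tidy whitespace."""
    for pattern in LEADING_PATTERNS:
        note = re.sub(pattern, "", note, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", note).strip()

def llm_filter(note: str) -> str:
    """Hypothetical second pass: prompt a large language model to rewrite the note
    with any remaining diagnostic hints removed. The model and prompt used in the
    paper are not specified here, so this is left unimplemented."""
    raise NotImplementedError

def preprocess(note: str, level: str = "regex") -> str:
    """Apply one of several filtering levels: 'none', 'regex', or 'regex+llm'."""
    if level == "none":
        return note
    cleaned = regex_filter(note)
    if level == "regex+llm":
        cleaned = llm_filter(cleaned)
    return cleaned

# Usage example with a made-up note:
print(preprocess("Irregular pigmented lesion on the back, highly suspicious for melanoma."))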
Results: Our results show that incorporating unfiltered text significantly improves classification performance (0.970 AUROC) compared to visual data alone (0.909 AUROC); even with leading language removed, performance gains persist (0.948 AUROC).
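
For reference, AUROC comparisons of this kind are typically computed on held-out predictions; the sketch below shows one way to do so with scikit-learn, using randomly generated stand-in labels and scores rather than the paper's actual model outputs.

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for held-out ground truth and model scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)       # 0 = benign, 1 = malignant
scores_image_only = rng.random(500)         # stand-in scores from an image-only model
scores_multimodal = rng.random(500)         # stand-in scores from an image + text model

print(f"Image-only AUROC: {roc_auc_score(y_true, scores_image_only):.3f}")
print(f"Multimodal AUROC: {roc_auc_score(y_true, scores_multimodal):.3f}")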
Conclusions: This work benchmarks the inclusion of clinical free text in skin lesion classification, demonstrating that clinical text contributes predictive value beyond that available in images alone. The model’s high performance on unfiltered clinical text highlights the high levels of bias, and possible shortcutting, present in this text, which may make it unsuitable for inclusion in some ML models. By systematically filtering clinical notes via our proposed technique, we show that multimodal models retain improved accuracy while reducing bias. These results provide practical guidance for integrating clinical text into real-world skin cancer detection systems and establish a foundation for future multimodal research in dermatology.


Cite This Research

Plain Text

Matthew Watson, Thomas Winterbottom, Thomas Hudson, Benedict Jones, Hubert P. H. Shum, Amir Atapour-Abarghouei, Toby P. Breckon, James Harmsworth King and Noura Al Moubayed, "Multimodal Models for Skin Cancer Classification using Clinical Free Text and Dermatoscopic Images," Communications Medicine, 2026.

BibTeX

@article{watson26multimodal,
 author={Watson, Matthew and Winterbottom, Thomas and Hudson, Thomas and Jones, Benedict and Shum, Hubert P. H. and Atapour-Abarghouei, Amir and Breckon, Toby P. and King, James Harmsworth and Moubayed, Noura Al},
 journal={Communications Medicine},
 title={Multimodal Models for Skin Cancer Classification using Clinical Free Text and Dermatoscopic Images},
 year={2026},
}

RIS

TY  - JOUR
AU  - Watson, Matthew
AU  - Winterbottom, Thomas
AU  - Hudson, Thomas
AU  - Jones, Benedict
AU  - Shum, Hubert P. H.
AU  - Atapour-Abarghouei, Amir
AU  - Breckon, Toby P.
AU  - King, James Harmsworth
AU  - Moubayed, Noura Al
T2  - Communications Medicine
TI  - Multimodal Models for Skin Cancer Classification using Clinical Free Text and Dermatoscopic Images
PY  - 2026
ER  - 


Supporting Grants

Innovate UK
Skin Lesion Classification with Real-World Data
Knowledge Transfer Partnership (Ref: 13457): £143,393, Principal Investigator ()
Received from Innovate UK, UK, 2022-2024
Project Page
