Evaluation of deep learning-based segmentation models for carotid artery calcification detection in panoramic radiographs


Yılmaz B. G., Altun S., Bayrakdar İ. Ş.

Oral Radiology, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1007/s11282-025-00858-7
  • Journal Name: Oral Radiology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, MEDLINE
  • Keywords: Artificial intelligence, Carotid artery calcification, Deep learning, Panoramic radiography, YOLO
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

Objectives: The primary objective of this study is to evaluate the effectiveness of artificial intelligence-assisted segmentation methods in detecting carotid artery calcification (CAC) in panoramic radiographs and to compare the performance of three YOLO models: YOLOv5x-seg, YOLOv8x-seg, and YOLOv11x-seg. Additionally, the study aims to investigate the association between patient gender and the presence of CAC, as part of a broader epidemiological analysis.

Methods: In this study, 30,883 panoramic radiographs were scanned. Annotations were made on 652 radiographs exhibiting features consistent with CAC, totaling 1,086 annotations. Deep learning-based analysis was conducted using three distinct YOLO segmentation models. The performance of these models was assessed using metrics such as precision, accuracy, and F1 score.

Results: The YOLOv5x-seg model exhibited balanced performance, with precision, sensitivity, and F1 score of 84.62% each. The YOLOv8x-seg model demonstrated higher sensitivity at 88.46%, albeit with a slightly higher false positive rate, evidenced by a precision of 78.63%. The YOLOv11x-seg model achieved the highest precision at 93.41%, an F1 score of 87.18%, and a sensitivity of 81.73%.

Conclusions: AI-based segmentation models utilizing YOLO algorithms can be considered reliable tools for detecting CAC in panoramic radiographs. These models display promising performance appropriate for clinical applications; however, further research with larger and more diverse datasets is required to verify their generalizability and efficacy.
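The F1 scores in the Results follow from the reported precision and sensitivity, since F1 is the harmonic mean of the two. The sketch below is only a consistency check on the abstract's figures, not the authors' evaluation code; the function name `f1_score` and the dictionary layout are illustrative assumptions. The abstract does not state an F1 score for YOLOv8x-seg, so the value computed for that model is derived here rather than reported.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall), both given as fractions in [0, 1]."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Precision and sensitivity as reported in the abstract, converted from percent to fractions.
reported = {
    "YOLOv5x-seg": (0.8462, 0.8462),
    "YOLOv8x-seg": (0.7863, 0.8846),   # F1 for this model is not stated in the abstract
    "YOLOv11x-seg": (0.9341, 0.8173),
}

# F1 in percent, rounded to two decimals to match the abstract's precision.
f1_percent = {
    name: round(100 * f1_score(p, s), 2) for name, (p, s) in reported.items()
}

for name, value in f1_percent.items():
    print(f"{name}: F1 = {value:.2f}%")
```

Under these inputs the computed values for YOLOv5x-seg and YOLOv11x-seg agree with the reported 84.62% and 87.18%.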