Using Large Language Models to Generate Script Concordance Tests in Medical Education: ChatGPT and Claude


Kıyak Y. S., Emekli E.

Uluslararası Türk Dünyası Eğitim Bilimleri Kongresi, Ankara, Turkey, 12 - 13 December 2024, pp. 1, (Abstract Paper)

  • Publication Type: Conference Paper / Abstract Paper
  • City of Publication: Ankara
  • Country of Publication: Turkey
  • Pages: pp. 1
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

This study aimed to evaluate the quality of Script Concordance Test (SCT) items generated by artificial intelligence (AI) models, specifically ChatGPT-4 and Claude 3 (Sonnet), through an expert panel evaluation. Using a complex prompt, SCT items in abdominal radiology were generated with these large language models in April 2024. A panel of 16 radiologists, blind to the items' AI origin, independently answered each item and rated its quality against 12 specific criteria. Data collection involved expert responses, with analysis incorporating descriptive statistics, bar chart comparisons, and a heatmap illustrating performance across quality criteria. Findings indicated that chatbot-generated SCT items predominantly assessed clinical reasoning rather than simple factual recall (ChatGPT: 92.5%, Claude: 85.0%). The heatmap showed general acceptability, with quality ratings of 71.77% for ChatGPT and 64.23% for Claude. Bar chart comparisons revealed that 73.33% of ChatGPT's and 53.33% of Claude's questions met acceptable standards. These results suggest that LLMs can support medical educators by reducing the effort involved in SCT item development, although expert review and refinement of AI-generated items remain essential. The custom GPT tool, "Script Concordance Test Generator" (https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator), offers a streamlined approach for SCT item development.

Keywords: Automatic item generation, Artificial intelligence, Script concordance test, Medical education.

Theme: Assessment and Evaluation in Education, Higher Education.