Using Large Language Models to Generate Script Concordance Test in Medical Education: ChatGPT and Claude


Creative Commons License

Kıyak Y. S., Emekli E.

REVISTA ESPAÑOLA DE EDUCACIÓN MÉDICA, vol.2025, no.1, pp.1-8, 2024 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 2025 Issue: 1
  • Publication Date: 2024
  • DOI Number: 10.6018/edumed.636331
  • Journal Name: REVISTA ESPAÑOLA DE EDUCACIÓN MÉDICA
  • Journal Indexes: Directory of Open Access Journals, DIALNET
  • Pages: pp.1-8
  • Eskişehir Osmangazi University Affiliated: Yes

Abstract

We aimed to determine the quality of AI-generated (ChatGPT-4 and Claude 3) Script Concordance Test (SCT) items through an expert panel. We generated SCT items on abdominal radiology using a complex prompt in large language model (LLM) chatbots (ChatGPT-4 and Claude 3 (Sonnet) in April 2024) and evaluated the items’ quality through an expert panel of 16 radiologists. The expert panel, which was blind to the origin of the items provided without modifications, independently answered each item and assessed them using 12 quality indicators. Data analysis included descriptive statistics, bar charts to compare responses against accepted forms, and a heatmap to show performance in terms of the quality indicators. SCT items generated by chatbots assess clinical reasoning rather than only factual recall (ChatGPT: 92.50%, Claude: 85.00%). The heatmap indicated that the items were generally acceptable, with most responses favorable across quality indicators (ChatGPT: 71.77%, Claude: 64.23%). The comparison of the bar charts with acceptable and unacceptable forms revealed that 73.33% and 53.33% of the questions in the items can be considered acceptable, respectively, for ChatGPT and Claude. The use of LLMs to generate SCT items can be helpful for medical educators by reducing the required time and effort. Although the prompt provides a good starting point, it remains crucial to review and revise AI-generated SCT items before educational use. The prompt and the custom GPT, “Script Concordance Test Generator”, available at https://chatgpt.com/g/g-RlzW5xdc1-script-concordance-test-generator, can streamline SCT item development.
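
As a rough illustration of the kind of descriptive summary mentioned in the abstract (the share of favorable responses across quality indicators), the sketch below computes an overall and a per-indicator percentage from a rater-by-indicator grid. It is only a minimal sketch under stated assumptions: the abstract does not describe how responses were coded, so the boolean coding, the function names `favorable_share` and `per_indicator_share`, and the toy data are all hypothetical and are not the authors' data or analysis code.

```python
from typing import List


def favorable_share(responses: List[List[bool]]) -> float:
    """Percentage of favorable responses over the whole rater-by-indicator grid."""
    total = sum(len(row) for row in responses)
    if total == 0:
        raise ValueError("no responses to summarize")
    favorable = sum(sum(row) for row in responses)
    return 100.0 * favorable / total


def per_indicator_share(responses: List[List[bool]]) -> List[float]:
    """Percentage of favorable responses for each indicator (one value per column)."""
    if not responses or not responses[0]:
        raise ValueError("no responses to summarize")
    n_raters = len(responses)
    n_indicators = len(responses[0])
    return [
        100.0 * sum(row[j] for row in responses) / n_raters
        for j in range(n_indicators)
    ]


# Hypothetical toy data: 4 raters (rows) x 3 quality indicators (columns).
# True means the rater judged the item favorably on that indicator.
toy_responses = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [True, True, False],
]

if __name__ == "__main__":
    print(favorable_share(toy_responses))
    print(per_indicator_share(toy_responses))
```

The per-indicator values are the sort of numbers a heatmap could display, one cell per indicator and chatbot; the real study used 16 raters and 12 indicators.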