Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills


Creative Commons License

Emekli E., Karahan B. N.

REVISTA ESPAÑOLA DE EDUCACIÓN MÉDICA, vol. 2025, pp. 1-12, 2024 (Peer-Reviewed Journal)

  • Publication Type: Article / Full Article
  • Volume: 2025
  • Publication Date: 2024
  • DOI: 10.6018/edumed.637221
  • Journal Name: REVISTA ESPAÑOLA DE EDUCACIÓN MÉDICA
  • Indexed In: Directory of Open Access Journals, DIALNET
  • Pages: pp. 1-12
  • Eskişehir Osmangazi University Affiliation: Yes

Abstract

Automatic item generation (AIG) methods offer potential for assessing clinical reasoning (CR), a critical skill in medical education that combines intuitive and analytical thinking. In preclinical education, CR skills are commonly evaluated through written exams and case-based multiple-choice questions (MCQs), which are widely used because of large student numbers, ease of standardization, and speed of evaluation. This study generated CR-focused questions for medical exams using two primary AIG methods: template-based and non-template-based, the latter being a more flexible approach that uses AI tools such as ChatGPT. A total of 18 questions on ordering radiologic investigations for abdominal emergencies were produced, alongside faculty-developed questions from medical exams for comparison. Experienced radiologists evaluated the questions for clarity, clinical relevance, and effectiveness in measuring CR skills. ChatGPT-generated questions measured CR skills with a success rate of 84.52%, faculty-developed questions with 82.14%, and template-based questions with 78.57%, indicating that both AIG methods are effective for CR assessment, with ChatGPT performing slightly better. Both AIG methods received high ratings for clarity and clinical suitability, suggesting that they can produce effective CR-assessing questions comparable to, and in some cases surpassing, faculty-developed questions. Template-based AIG is effective but requires more time and effort; nevertheless, both methods may save educators time in exam preparation.
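To make the template-based approach more concrete, here is a minimal sketch of the general idea behind an item model: a question stem with placeholders, a set of allowed values for each placeholder, and a keyed answer tied to the clinical finding. Every item is produced by filling the stem with one combination of values. This is not the authors' item model; the names `STEM`, `CONDITIONS`, `OPTION_POOL`, and `generate_items` are hypothetical, and the finding-to-investigation pairings are illustrative assumptions only, not validated clinical guidance (the study itself relied on radiologists to review item quality).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Item:
    """One generated multiple-choice question."""
    stem: str
    options: tuple[str, ...]
    key: str


# Hypothetical item model: a stem whose placeholders are filled per item.
STEM = (
    "A {age}-year-old {sex} presents to the emergency department with {finding}. "
    "Which imaging investigation should be ordered first?"
)

# Illustrative only: each clinical finding maps to the answer keyed for it.
CONDITIONS = {
    "right upper quadrant pain, fever, and a positive Murphy sign":
        "Abdominal ultrasonography",
    "acute colicky flank pain radiating to the groin with hematuria":
        "Non-contrast CT of the abdomen and pelvis",
}

# Incidental variables: they change the surface of the item, not the key.
AGES = (35, 52)
SEXES = ("man", "woman")

# Shared answer pool: the keyed answers double as distractors for each other.
OPTION_POOL = (
    "Abdominal ultrasonography",
    "Non-contrast CT of the abdomen and pelvis",
    "Plain abdominal radiography",
    "Contrast-enhanced MRI of the abdomen",
)


def generate_items() -> list[Item]:
    """Fill the stem with every combination of condition, age, and sex."""
    items = []
    for (finding, key), age, sex in product(CONDITIONS.items(), AGES, SEXES):
        stem = STEM.format(age=age, sex=sex, finding=finding)
        # Sorted for reproducibility; a real generator would shuffle options.
        options = tuple(sorted(OPTION_POOL))
        items.append(Item(stem=stem, options=options, key=key))
    return items


if __name__ == "__main__":
    for item in generate_items()[:2]:
        print(item.stem)
        for label, option in zip("ABCD", item.options):
            print(f"  {label}. {option}")
        print("  Key:", item.key)
```

With two findings, two ages, and two sexes, the sketch yields 2 × 2 × 2 = 8 items from one stem. This illustrates why template-based AIG front-loads effort: the item model, its variables, and its constraints must be designed by hand before any questions appear, whereas a non-template approach prompts a model such as ChatGPT directly.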