Purpose
Creating high-quality multiple-choice assessments is challenging in undergraduate medical education (UME). The role of teaching faculty is multifaceted, and exam question writing is among the least developed skills. This skill requires training and deliberate practice, both of which are difficult for busy educators to obtain. Generative artificial intelligence (AI) is an emerging tool in medical education, with little data on its application to question writing and assessment. We conducted a pilot study to explore ChatGPT's potential for generating USMLE-style questions from preclinical lecture material and its role as a faculty development tool.
Methods
Thirteen hematology lectures were obtained from course faculty and uploaded to ChatGPT with a standardized prompt to generate five USMLE-style questions per lecture (65 questions in total). The questions, copied unedited from ChatGPT, were reviewed by a team of four medical educators (two clinicians, two basic scientists) using a rubric assessing three criteria: (1) appropriate clinical vignette structure, (2) appropriate learner level, and (3) alignment with session learning objectives.
Results
The scored rubrics were reviewed for agreement among the faculty; a question was considered to have failed a criterion only if at least three of the four reviewers scored it as such. The questions met criterion (1), appropriate clinical vignette structure, 100% of the time (65/65); criterion (2), appropriate learner level, 95% of the time (62/65); and criterion (3), alignment with session learning objectives, 97% of the time (63/65). Only one question (1.5%) failed to meet at least two criteria.
Conclusions
In this pilot study, 95% of ChatGPT-generated questions were of high quality, supporting ChatGPT's potential as a faculty development tool. We envision expanding its use across curricula to assist faculty in creating USMLE-style questions by providing AI-generated scaffolding, reducing the time burden while allowing faculty to focus on their subject expertise.