Session Details: IAMSE 2026 Annual Conference

Name

Generative artificial intelligence is going to grade my exam? Can we talk about that?

Date & Time

Monday, June 8, 2026, 1:49 PM - 2:09 PM

Location Name

Oglethorpe H

Speakers

Doreen Olvet - Donald & Barbara Zucker School of Medicine at Hofstra/Northwell

Authors

Doreen M. Olvet, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell; Tracy B. Fulton, University of California at San Francisco School of Medicine; Marieke Kruidering, University of California at San Francisco School of Medicine; Bao Truong, University of California at San Francisco School of Medicine; Kumiko Endo, Med2Lab, Inc; Robert Lucito, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell; Joanne M. Willey, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell

Presentation Topic(s)

Technology and Innovation

Description

PURPOSE: Automated grading by generative artificial intelligence (AI)
offers a potential solution to the numerous barriers that limit the adoption
of open-ended questions (OEQs) on exams. Stakeholder acceptability, a key
element of Bowen’s feasibility framework, must be explored if this innovation
is to be adopted. The objective of this qualitative study was to explore
pre-clerkship medical students’ views on the use of generative AI to score
summative OEQ exams.
METHODS: Focus groups were conducted with first- and second-year medical
students (N=28) who attend medical schools where OEQs constitute
end-of-course summative exams. Students were asked to describe how they use
generative AI, and to discuss their feelings about generative AI grading
their exams, including the benefits and drawbacks. Transcripts were analyzed
using thematic analysis.
RESULTS: Although medical students are using generative AI to explain
complex medical concepts and to identify knowledge gaps, they expressed mixed
feelings about generative AI grading. Students identified several benefits,
including getting grades faster, increasing standardization in grading,
reducing the time burden on faculty, and receiving more personalized
feedback. However, they expressed concerns about the accuracy and
reliability of generative AI, citing AI error-proneness and the nuance needed
to assess clinical reasoning. They also expressed concerns about reduced
input from faculty experts and supported human-in-the-loop oversight and
transparency.
CONCLUSIONS: Medical students recognize the benefits and risks of using
generative AI to grade their exams and are open to the idea provided there is
transparency and human oversight. The sample was limited to two institutions
that currently utilize OEQ assessments, so the findings may not generalize to
medical students assessed using other question formats. Selection bias and
observer bias may also have affected focus group results. Nonetheless, using
generative AI
can make grading feasible for educators interested in incorporating OEQ
exams. Faculty perceptions will be explored in the future to triangulate
stakeholder views.