David C. Bury, Mercer University School of Medicine
Purpose
Delivering bad news is a critical skill for physicians, requiring a blend of empathy, clear communication, and structure. This study explored two generative AI models, ChatGPT and Gemini, as tools for providing formative feedback when teaching this skill.
Methods
We reviewed transcripts from 20 encounters in which second-year medical students delivered bad news to standardized patients using the SPIKES protocol, across four clinical scenarios: death notification, dementia diagnosis, persistent vegetative state disclosure, and open-heart surgery recommendation. Faculty, ChatGPT, and Gemini independently evaluated each transcript on 12 subcompetencies within the domains of organization, empathy, medical knowledge, and communication. The AI models were also instructed to generate written feedback with suggestions for improvement.
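The abstract does not report the exact prompts, model versions, or rating scale used; the sketch below illustrates one way such a rubric-based evaluation could be requested programmatically. The model name, prompt wording, 1–5 scale, and abridged subcompetency list are all illustrative assumptions, not the study's actual materials.

```python
# Minimal sketch of rubric-based AI scoring of one transcript.
# Assumptions (not specified in the study): model version, prompt
# wording, the 1-5 scale, and this abridged subcompetency list.
from openai import OpenAI

# Abridged, illustrative rubric; the study used 12 subcompetencies
# across these four domains.
RUBRIC = {
    "organization": ["logical flow", "adherence to the SPIKES protocol"],
    "empathy": ["identifying emotions", "responding to emotional shifts"],
    "medical knowledge": ["accuracy of clinical information"],
    "communication": ["checking patient understanding",
                      "clarifying misunderstandings"],
}

def score_transcript(transcript: str) -> str:
    """Request subcompetency ratings and written formative feedback."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    rubric_text = "\n".join(
        f"- {domain}: {', '.join(items)}" for domain, items in RUBRIC.items()
    )
    prompt = (
        "You are evaluating a second-year medical student delivering bad "
        "news to a standardized patient using the SPIKES protocol. Rate "
        "each subcompetency from 1 (poor) to 5 (excellent), then provide "
        "written feedback with concrete suggestions for improvement.\n\n"
        f"Rubric:\n{rubric_text}\n\nTranscript:\n{transcript}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; the study's model versions are unreported
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```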
Results
Competency scores were compared using Friedman tests with post-hoc Wilcoxon signed-rank tests. ChatGPT differed significantly from both faculty (p=0.002) and Gemini (p<0.0001) in assessing logical flow and SPIKES adherence. ChatGPT also differed from faculty (p=0.0002) and Gemini (p=0.0234) in identifying emotions, and from Gemini in responding to emotional shifts (p=0.0313). ChatGPT further differed from both faculty (p=0.0015) and Gemini (p=0.0005) in evaluating how students checked patient understanding. No significant differences were found in assessing students' recognition of patient perception, acknowledgment of difficulty, or clarification of misunderstandings. Both AI models consistently produced detailed, actionable feedback.
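As a concrete illustration of this analysis, the sketch below runs a Friedman omnibus test across the three raters followed by pairwise Wilcoxon signed-rank tests, using SciPy. The data are synthetic and the 1–5 scale is assumed; none of the values reproduce the study's actual scores.

```python
# Sketch of the rater-comparison analysis for one subcompetency.
# The ratings below are synthetic placeholders, not study data.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
# Hypothetical 1-5 ratings for 20 transcripts from each rater.
faculty = rng.integers(1, 6, size=20)
chatgpt = rng.integers(1, 6, size=20)
gemini = rng.integers(1, 6, size=20)

# Omnibus test: do the three raters differ on this subcompetency?
stat, p = friedmanchisquare(faculty, chatgpt, gemini)
print(f"Friedman: chi2 = {stat:.2f}, p = {p:.4f}")

# Post-hoc pairwise comparisons on the same paired transcripts.
pairs = {
    "ChatGPT vs. faculty": (chatgpt, faculty),
    "ChatGPT vs. Gemini": (chatgpt, gemini),
    "Gemini vs. faculty": (gemini, faculty),
}
for label, (a, b) in pairs.items():
    w, pw = wilcoxon(a, b)
    print(f"Wilcoxon {label}: W = {w:.1f}, p = {pw:.4f}")
```

The Friedman test serves as the omnibus check across all three related samples (the same 20 transcripts rated by each evaluator); Wilcoxon signed-rank tests then localize which rater pairs differ, which matches the pairwise p-values reported above.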
Conclusions
Our findings revealed significant discrepancies between ChatGPT and Gemini, with Gemini aligning more closely with faculty evaluations than ChatGPT did. Both AI models efficiently generated personalized formative feedback. However, given the nuanced and emotionally complex nature of delivering bad news, faculty oversight remains essential for a comprehensive and effective learning experience.