AI in Health Professions Education: IAMSE 2023 Virtual Forum

Can Medical Students Use OpenAI's ChatGPT as a Self-Directed Tool for Problem-Based Learning?

Presenting Author: Samantha Wehsener - St. George's University, School of Medicine, Grenada
Co-Authors: Vanad Mousakhani - Frank H. Netter MD School of Medicine
Vineeta Naraine-Ramnauth - St. George's University, School of Medicine, Grenada
Serine Torosian - St. George's University, School of Medicine, Grenada
Gabrielle Walcott-Bedeau - St. George's University, School of Medicine, Grenada

Studies show that more innovative approaches are needed to provide a cognitive framework for clinical reasoning in medical education. Can artificial intelligence (AI) technologies such as ChatGPT (Chat Generative Pre-Trained Transformer) be integrated into medical education as a self-directed learning tool for generating case studies for problem-based learning?

Objectives
The focus of this pilot study will be to:

1. determine ways preclinical medical students can use OpenAI's ChatGPT to create case studies for self-directed problem-based learning.
2. determine the clinical accuracy and relevance of a sample clinical case study created with ChatGPT.
3. determine whether medical students, after reviewing the sample clinical case study and the interface instructions, would be willing to use ChatGPT during their medical education.

Method
In May 2023, a clinical case study on myasthenia gravis was created using OpenAI's ChatGPT (version 3.5). The case study, along with a post-session survey, was distributed to preclinical medical students at St. George's University.

Results
Results will be analyzed using SPSS v.29. Descriptive analysis and chi-square testing at a 0.05 significance level (95% confidence) will be used to assess previous use of ChatGPT, the clinical relevance of the ChatGPT-generated case study, and interest in using ChatGPT as a self-directed learning tool. The findings will provide insight into students' perceptions of whether ChatGPT can be helpful during learning, as well as viable ways for instructors to implement ChatGPT within the medical education curriculum.
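Pearson's chi-square test of independence, which the authors plan to run in SPSS, compares the observed counts in a contingency table with the counts expected if two survey variables were unrelated. The minimal stdlib-only Python sketch below illustrates that computation. The 2x2 table of prior ChatGPT use versus interest in using it is hypothetical, not study data; the critical values are the standard chi-square cutoffs for alpha = 0.05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Standard chi-square critical values at alpha = 0.05 (95% confidence), keyed by df
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical counts: rows = used ChatGPT before (yes, no); columns = interested in using it (yes, no)
hypothetical = [[30, 10],
                [15, 25]]
stat, dof = chi_square_independence(hypothetical)
print(f"chi2 = {stat:.2f}, df = {dof}, significant: {stat > CRITICAL_0_05[dof]}")
```

In practice, `scipy.stats.chi2_contingency` returns the same statistic together with a p-value. Note that by default it applies Yates' continuity correction to 2x2 tables (`correction=True`), so its statistic will differ slightly from this uncorrected sketch.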

Conclusion
When preclinical medical students receive detailed instructions on how to interact with OpenAI's ChatGPT, they can use the interface as a self-directed learning tool for problem-based learning and clinical reasoning.

Categories:
Artificial Intelligence in Health Professions Education

ChatGPT: Your Teaching Assistant for Medical Education?

Presenting Author: Aditi Kesari - The University of Tennessee Health Science Center

Background
With the advent of multiple artificial intelligence (AI) platforms in recent times, it has become imperative for medical educators to recognize the applications as well as limitations of these tools in medical education. This study focuses on investigating the applications and limitations of ChatGPT, an AI-enabled tool, in biochemistry education in the medical curriculum.

Methods
ChatGPT, the AI-enabled tool developed by OpenAI, was used for this study. To explore the capabilities of this platform in medical education, prompts related to biochemistry topics covered in the medical curriculum were created. The responses generated by ChatGPT were then analyzed to determine its utility in medical education.

Results
Multiple applications of ChatGPT in medical education were identified. These included creating templates for curriculum objectives and session learning objectives, building case-based scenarios, generating formative/summative questions with rationales, building concept maps, creating flashcards, and summarizing complex concepts. While ChatGPT could be a helpful tool, some inaccuracies and shortcomings came to light while these resources were being generated, which highlights the importance of critically evaluating material generated by these tools.

Conclusions
ChatGPT has the potential to be a helpful tool for both faculty and students with its various capabilities. It is, however, crucial to emphasize the importance of critical appraisal of information generated using these AI-enabled tools, while engaging in the responsible and transparent use of these technologies.

Categories:
Artificial Intelligence in Health Professions Education

Embracing Change: ChatGPT-Generated Script Concordance Tests for Clinical Reasoning Assessment

Presenting Author: Daniel Novak - University of California Riverside
Co-Authors: Ian V.J. Murray - Alice L. Walton School of Medicine
Caitlin Wardlaw - University of Arkansas Medical School

Medical education must adapt to challenging times, which includes developing robust assessments of clinical reasoning founded in the best available science. Script Concordance Testing (SCT) assesses clinical reasoning skills by probing the illness scripts of healthcare professionals and comparing novices' responses with those of a panel of experts. It also serves as a learning tool, because it prompts learners to reevaluate initial diagnostic hypotheses when they are challenged with new, ambiguous clinical information. However, adoption of SCT remains limited because of barriers such as the difficulty of creating high-quality scenarios based on illness scripts, the need for a panel of 10-20 experts, and the fact that 20% of questions are discarded after the exam. In this study, we demonstrate the development of a modified SCT (mSCT) using the artificial intelligence (AI) program ChatGPT, which generates Likert scores, estimated probabilities, and metacognitive written justifications.
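Comparing a learner with an expert panel is usually done with the aggregate scoring method. Each Likert answer earns credit equal to the number of panelists who chose it divided by the number who chose the modal answer, so answers shared by more experts earn more partial credit. The Python sketch below assumes this method and a -2 to +2 scale. The abstract does not state which scoring rule was used, and the panel responses are hypothetical.

```python
from collections import Counter

def aggregate_key(panel_answers):
    """Map each Likert answer (-2..+2) to its partial credit under aggregate scoring."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    # The modal answer earns 1.0; other answers earn credit in proportion to panel agreement
    return {answer: n / modal_count for answer, n in counts.items()}

def score_examinee(answers, keys):
    """Sum credit across items; an answer that no panelist chose earns 0."""
    return sum(key.get(answer, 0.0) for answer, key in zip(answers, keys))

# Hypothetical panel of 10 experts answering one item on the -2..+2 scale
item_key = aggregate_key([1, 1, 1, 1, 1, 1, 0, 0, 0, 2])
print(item_key)
print(score_examinee([0], [item_key]))
```

Because the key depends entirely on the distribution of panel answers, the 10-20 expert barrier described above is easy to see. Each item needs enough independent panel answers to yield a stable key.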

ChatGPT was guided, through progressively optimized prompts and a case previously developed by the author, to generate modified SCTs (mSCTs) that included 3 Likert scores, a written justification, and an estimated diagnosis probability. The mSCT format contained a brief case scenario with an initial diagnosis and, to probe illness scripts, presented either a related symptom or a symptom from a differential diagnosis. The AI also provided a Likert score and a written justification of the answer. We evaluated mSCTs generated for 10 common diseases using the prompt "write a modified SCT for [insert disease] focused on [insert discipline, or concept, or add differential disease here]". The medical information and estimated probabilities in each mSCT were compared with those generated by a different AI developed to help medical students answer USMLE questions (MedQBot, https://poe.com/MedQBot).

These pilot results demonstrate that, from a simple prompt, ChatGPT automatically created complex mSCTs for the 10 diseases. Each mSCT included a clinical presentation, patient information, and history, as well as Likert scores and medical justifications. mSCT ambiguity, that is, intersection between different illness scripts, was achieved by specifying a differential diagnosis or a less likely (-1) Likert score in the prompt. MedQBot provided similar justifications and estimated probabilities for the mSCTs but struggled to diagnose the more ambiguous questions.

This novel pilot application of AI lowers barriers to mSCT use. It supports the creation of complex, ambiguous clinical mSCTs while also providing Likert scoring and medical information justifying the diagnosis. Although the AI cannot replace expert question review, it could augment score generation, because symptom-disease probabilities are known for many diseases. We acknowledge that further work is required: real-world validity and reliability testing of mSCTs, and more robust mSCTs that cover tests, treatment, and additional diseases and disciplines.

Categories:
Curriculum: Curricular Transformation and Transition
Artificial Intelligence in Health Professions Education

Expert Evaluation of ChatGPT Responses to Clinically Integrated Questions in Medical Physiology

Presenting Author: Jennifer Connolly - Ross University School of Medicine
Co-Authors: Thomas Ferrari - Ross University School of Medicine
Oleksii Hliebov - Ross University School of Medicine

Since the release of ChatGPT (Chat Generative Pre-Trained Transformer) in November 2022, there has been growing interest in its potential application in medical education. ChatGPT is an artificial intelligence chatbot developed by OpenAI. Although it appears capable of adequately answering simple recall questions, its ability to answer higher-order medical physiology questions remains unknown. This pilot study aimed to evaluate the correctness, completeness, and concordance of ChatGPT's responses to questions in cardiovascular, renal, and reproductive physiology as taught to first-year medical students.

For each of the three disciplines, 10 clinically integrated multiple-choice questions were chosen (n = 30 total). Each had appropriate difficulty (p-value, i.e., the proportion of examinees answering correctly, between 0.30 and 0.70) and discrimination (discrimination index, DI > 0.2). Each question was posed to ChatGPT (March 23, 2023 version) in three formats: (1) as an open-ended question, (2) as a multiple-choice question (MCQ), and (3) as an MCQ that required ChatGPT to justify its answer (n = 90 total responses). Content experts analyzed the cumulative responses for all three disciplines and formats to assess the model's ability to provide accurate, comprehensive, and cohesive responses.
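The selection criteria above rely on two classical item statistics: difficulty (p, the proportion of examinees answering correctly) and a discrimination index. The sketch below computes DI with the upper-lower group method, subtracting the proportion correct in the bottom 27% of examinees (by total score) from that in the top 27%. This is one common definition; the abstract does not state which DI formula was used, and the response data are hypothetical.

```python
def item_difficulty(item_correct):
    """Proportion of examinees answering the item correctly (the item 'p-value')."""
    return sum(item_correct) / len(item_correct)

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower group DI: proportion correct in the top group minus the bottom group."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank examinees from lowest to highest total exam score
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower

def meets_criteria(p, di):
    """Selection rule stated in the abstract: 0.30 <= p <= 0.70 and DI > 0.2."""
    return 0.30 <= p <= 0.70 and di > 0.2

# Hypothetical data: 1 = answered this item correctly, plus each examinee's total exam score
item = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
totals = [90, 85, 80, 75, 40, 35, 30, 25, 70, 45]
p = item_difficulty(item)
di = discrimination_index(item, totals)
print(p, di, meets_criteria(p, di))
```

The "p-value" here is a proportion correct, not a statistical significance value. Higher p means an easier item.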

In renal physiology, 50% of responses were correct. The completeness analysis showed that 36.7% of responses were comprehensive, 46.7% were adequate, and 16.7% were incomplete. In addition, 46.7% of answers were discordant; a response was defined as discordant if any part of its explanation contradicted itself. In cardiovascular physiology, 53.3% of ChatGPT's responses were correct. The completeness analysis showed that 70.0% of responses were comprehensive, 26.7% were adequate, and 3.3% were incomplete, and 40% were discordant. In reproductive physiology, 50% of ChatGPT's responses were correct. The completeness analysis showed that 23.3% of responses were comprehensive, 26.7% were adequate, and 50% were incomplete, and 53.3% were discordant.
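Each discipline contributed 30 responses (10 questions x 3 formats), so every reported percentage should correspond to a whole number of responses out of 30. The short Python check below assumes that denominator. It recovers the implied counts and confirms that each completeness breakdown sums to 30.

```python
RESPONSES_PER_DISCIPLINE = 10 * 3  # 10 questions x 3 question formats

# Percentages as reported in the abstract
REPORTED_PERCENT = {
    "renal": {"correct": 50.0, "comprehensive": 36.7, "adequate": 46.7,
              "incomplete": 16.7, "discordant": 46.7},
    "cardiovascular": {"correct": 53.3, "comprehensive": 70.0, "adequate": 26.7,
                       "incomplete": 3.3, "discordant": 40.0},
    "reproductive": {"correct": 50.0, "comprehensive": 23.3, "adequate": 26.7,
                     "incomplete": 50.0, "discordant": 53.3},
}

def implied_counts(percentages, n=RESPONSES_PER_DISCIPLINE):
    """Convert reported percentages back to whole-response counts out of n."""
    return {label: round(pct * n / 100) for label, pct in percentages.items()}

for discipline, pcts in REPORTED_PERCENT.items():
    counts = implied_counts(pcts)
    completeness = counts["comprehensive"] + counts["adequate"] + counts["incomplete"]
    print(discipline, counts, "completeness total:", completeness)
```

For example, 53.3% correct in cardiovascular physiology corresponds to 16 of 30 responses, so "half the time" is a fair summary across the three disciplines.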

These results suggest that ChatGPT does not possess a solid understanding of medical physiology and cannot provide reliable information at this time. Although ChatGPT sometimes provided depth and completeness in the MCQ and forced-justification responses, it was often inconsistent (i.e., discordant) and still selected the wrong answer about half the time. These findings highlight the danger, at least currently, of using ChatGPT as an educational tool in medical physiology and emphasize the need for further refinement to improve its performance. Alternatively, large language models like ChatGPT may lack the higher-order reasoning skills needed to interpret, accurately and consistently, physiology questions posed as clinically integrated vignettes and to provide reasoning for them.

Categories:
Artificial Intelligence in Health Professions Education

Integration of Artificial Intelligence Into a Preclinical Curriculum: A Pilot Study

Presenting Author: Soo Hwan Park - Geisel School of Medicine at Dartmouth
Co-Authors: Connor Bridges - Geisel School of Medicine at Dartmouth
Travis Byrum - Geisel School of Medicine at Dartmouth
Rachael Chacko - Geisel School of Medicine at Dartmouth
Justin Fong - Geisel School of Medicine at Dartmouth
Devina Gonzalez - Geisel School of Medicine at Dartmouth
Roshini Pinto-Powell - Geisel School of Medicine at Dartmouth
Adam Schwendt - Geisel School of Medicine at Dartmouth
Shahin Shahsavari - Geisel School of Medicine at Dartmouth
Thomas Thesen - Geisel School of Medicine at Dartmouth

With the increasing availability of big healthcare data, artificial intelligence (AI) has been rapidly introduced into the healthcare system. Although the benefits of AI education for future healthcare providers are increasingly recognized, there is no clear consensus on how to deliver a digital health curriculum most effectively as part of undergraduate medical education (UME). This study investigated the effectiveness of a pilot Digital Health Scholars (DHS) curriculum at the Geisel School of Medicine. The curriculum integrated AI concepts into UME by paralleling the preclinical systems blocks and highlighting the translational aspects of foundational algorithms.

From September 2022 to March 2023, ten self-selected first-year students enrolled in the curriculum and took part in the four DHS curricular blocks (Immunology, Hematology, Cardiology, and Pulmonology). Each block consisted of a journal club, a live-coding demonstration, and an integration session led by a researcher in that field. Before and after each DHS block, students' confidence in describing the content objectives (high-level data science knowledge, implications, and limitations) was assessed, and the two sets of ratings were compared using Mann-Whitney U tests.

The course received a mean satisfaction rating of 4.29/5. For each DHS block, enrolled students showed significant increases in confidence in explaining the curricular objectives after the respective sessions (Immunology: U=4.5, p=0.030; Hematology: U=1.0, p=0.009; Cardiology: U=4.0, p=0.019; Pulmonology: U=4.0, p=0.030). Our pilot study indicates that learners' confidence in explaining AI at a high level could be enhanced by an AI curriculum that parallels a medical school's preclinical UME and integrates data science concepts through a focus on a particular algorithm and its research application.
Future studies should build on this curricular design with higher enrollment. Such studies would help identify the most effective form of integration for preparing future healthcare providers for an increasingly AI-enhanced healthcare and research environment.
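The Mann-Whitney U statistic reported for each block is computed from the ranks of the pooled pre- and post-block ratings. The stdlib-only Python sketch below computes U (the smaller of U1 and U2, with tied values given average ranks). It uses hypothetical 5-point confidence ratings, not study data. It omits the p-value, which in practice would come from a library routine such as `scipy.stats.mannwhitneyu`.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic, min(U1, U2), for samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical 5-point confidence ratings before and after one DHS block
pre = [2, 2, 3, 2, 3]
post = [4, 4, 3, 5, 4]
print(mann_whitney_u(pre, post))
```

Smaller U means less overlap between the two sets of ratings. This is why the Hematology block's U=1.0 corresponds to the smallest p-value reported. The pre- and post-block ratings come from the same students, so a paired test such as the Wilcoxon signed-rank test is a common alternative. The sketch follows the abstract in using Mann-Whitney U.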

Categories:
Artificial Intelligence in Health Professions Education

Natural Language Processing to Analyze Student Reflections

Presenting Author: Carrie Elzie - University of Texas Health San Antonio
Co-Authors: Krzysztof Rechowicz - Old Dominion University

Background
Natural language processing (NLP), a nascent field of research that applies computational techniques to the analysis and synthesis of text, offers a novel way to assess students' reflections. Reflections are pervasive in anatomy education, so analyzing them with NLP could provide a rich source of new information about students' previously undiscovered experiences and competencies.

Summary Of Work
Health professional students (n=132) enrolled in gross anatomy were instructed to write reflections about themselves and their donors. At the beginning of the course, students were asked to reflect on what their anatomy has meant to their lives and to speculate on its importance to their donor's life. They were asked to choose five of 17 body regions (arms, back, brain, ears, eyes, face, feet, gastrointestinal system, gluteals, hands, heart, knees, lungs, mouth, nose, pelvis, and skin) and to write a short paragraph about how and why each is important to them. NLP was then used to analyze the 1365 reflections for themes, sentiments, and emotions.

Summary Of Results
The most commonly written-about body regions were the hands, heart, and brain. Binary sentiment analysis revealed an overwhelmingly positive sentiment in the reflections, with "love" and "loved" as major contributing words. Predominant words such as "pain" contributed to negative sentiment; these words reflected ailments that students had experienced or that were revealed through dissection of the donors. Lexicon-based emotion classification using the National Research Council Emotion Lexicon (NRC EmoLex) sorted the writing into eight emotion categories: anger, fear, sadness, disgust, surprise, anticipation, trust, and joy. The top three emotions were trust, joy, and anticipation. Each body region evoked a unique combination of emotions. Similarities between students' self-reflections and donor reflections suggested a shared view of humanization and person-centeredness toward the anatomical donor.
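Lexicon-based emotion classification works by looking up each word of a text in a word-emotion lexicon and tallying the emotions associated with it. The minimal Python sketch below illustrates that idea. The tiny lexicon is a hypothetical stand-in written for illustration, not actual NRC EmoLex entries. The simple regex tokenizer is also an assumption; real pipelines typically add lemmatization and handling of negation.

```python
import re

EMOTIONS = ("anger", "fear", "sadness", "disgust",
            "surprise", "anticipation", "trust", "joy")

# Hypothetical toy lexicon for illustration only; NOT actual NRC EmoLex entries
TOY_LEXICON = {
    "love": {"joy", "trust"},
    "loved": {"joy", "trust"},
    "pain": {"fear", "sadness"},
    "hope": {"anticipation", "trust"},
}

def emotion_profile(text, lexicon):
    """Count emotion associations for every lexicon word found in the text."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts

def top_emotions(counts, k=3):
    """Return the k most frequent emotions; ties keep the order of EMOTIONS."""
    return sorted(EMOTIONS, key=lambda e: -counts[e])[:k]

profile = emotion_profile("I loved my grandmother's hands; I hope to ease pain.", TOY_LEXICON)
print(profile, top_emotions(profile))
```

Summing such profiles over all reflections written about one body region gives a per-region emotion combination of the kind the authors describe.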

Discussion And Conclusion
In this study, applying NLP to student reflections presented, in an easy-to-understand manner, the sentiments and emotions students experienced while writing about anatomical contributions to their own lives and their donors' lives.

Take Home Messages
Assessing reflections through NLP is still a maturing approach, but it is a promising method for uncovering themes and the connectedness of student responses and for determining which areas warrant future investigation.

Categories:
Artificial Intelligence in Health Professions Education

Pilot Study: Exploring the Utility of AI for Generating Question Difficulty for USMLE Type Questions

Presenting Author: Jay Forshee - Alice L. Walton School of Medicine
Co-Authors: Gagani Athauda - Alice L. Walton School of Medicine
Yerko Berrocal - University of Illinois College of Medicine Peoria
Lance Bridges - Alice L. Walton School of Medicine
Hector Lopez Cardona - Alice L. Walton School of Medicine
Ian V. J. Murray - Alice L. Walton School of Medicine

Introduction
The evolution of medical education must include artificial intelligence (AI). AI is competent at writing multiple-choice questions (MCQs) and has passed or nearly passed all three steps of the United States Medical Licensing Examination (USMLE). Experts write USMLE-style MCQs to assess knowledge depth and clinical reasoning; however, assigning a minimal competence value (Angoff score) and a difficulty index (DI) a priori is challenging. This pilot study presents a novel application of AI: comparing expert and ChatGPT a priori DI and Angoff ratings for an existing bank of USMLE-style MCQs. This approach has the potential to improve the standardization of MCQ assessment.

Methods
Sample Step 1 NBME MCQs (n=100) from 2021 and 2022 were used for this pilot study, with 20 MCQs from each of the following disciplines: Biochemistry, Pathology, Anatomy, Pharmacology, and Physiology. DI and Angoff scores for all sample questions were generated independently by discipline experts, a total of 5 faculty (2 clinicians and 3 basic scientists). ChatGPT, a natural language AI (chat.openai.com, ChatGPT May 24 Version), also generated DI and Angoff scores in triplicate for each MCQ. After optimization on a subset of questions, the prompt used was: "Generate a percentage estimated difficulty index and percentage estimate Angoff score for the following USMLE style question: [insert question]".
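A comparison of this kind can be sketched as follows. The three ChatGPT runs and the five expert ratings are each averaged per item, and any item whose Angoff rating exceeds its DI is flagged. Minimally competent examinees would not normally outscore the whole cohort, so such items are suspect. The Python below is a minimal illustration under those assumptions. The function names and all ratings are hypothetical, and the exam-level cut score shown follows the usual Angoff convention of averaging item ratings rather than a procedure stated in the abstract.

```python
from statistics import mean

def summarize_item(expert_di, expert_angoff, ai_di_runs, ai_angoff_runs):
    """Compare expert and ChatGPT ratings (all percentages, 0-100) for one MCQ."""
    ai_di = mean(ai_di_runs)
    ai_angoff = mean(ai_angoff_runs)
    return {
        "expert_di": mean(expert_di),
        "expert_angoff": mean(expert_angoff),
        "ai_di": ai_di,
        "ai_angoff": ai_angoff,
        # Range across the three ChatGPT runs as a simple measure of variability
        "ai_angoff_spread": max(ai_angoff_runs) - min(ai_angoff_runs),
        # Borderline examinees should not outscore the whole cohort, so Angoff > DI is suspect
        "ai_angoff_exceeds_di": ai_angoff > ai_di,
    }

def angoff_cut_score(item_angoff_means):
    """Exam-level cut score (percent correct) as the mean of item-level Angoff ratings."""
    return mean(item_angoff_means)

# Hypothetical ratings for one question: 5 experts and 3 ChatGPT runs
summary = summarize_item(
    expert_di=[60, 70, 65, 55, 50],
    expert_angoff=[45, 50, 40, 50, 40],
    ai_di_runs=[70, 65, 75],
    ai_angoff_runs=[80, 75, 70],
)
print(summary)
```

In this hypothetical item, the averaged ChatGPT Angoff rating (75%) exceeds its averaged DI (70%). This is the kind of internal inconsistency the flag is designed to surface for expert review.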

Results
Across all disciplines, ChatGPT scores were variable, and at times the Angoff score was higher than the DI. The ranges of expert DI and Angoff scores were broad and variable. The ChatGPT prompt had to be optimized to overcome constraints such as ChatGPT's lack of "access to real-time data," an expert panel, and student performance data. At times, additional prompts were required to overcome these constraints.

Discussion
Overall, this preliminary investigation demonstrated that AI can generate Angoff and DI ratings for USMLE-style test questions. However, the results indicate that the AI scoring algorithm requires further refinement and training before it can serve as a standardized, efficient adjunct to expert scoring. Initial analysis showed variable results, with AI-generated scores differing from expert ratings in some cases. Qualitatively, the AI at times assigned higher Angoff ratings than DI percentages. This suggests a disconnect between its assessment of item difficulty and the proportion of minimally competent students expected to answer correctly: because minimally competent examinees should not outperform the overall cohort, an item's Angoff rating would normally not exceed its DI. Training the AI with feedback from content experts may be required to generate consistent and accurate outputs. Future directions of this study include training AI to write NBME-style MCQs of known difficulty and topic.

Categories:
Artificial Intelligence in Health Professions Education

Revolutionizing Medical Education: The Power of Digital Twins, AI, and Digital Pathology

Presenting Author: Ritcha Saxena - University of Minnesota

In the era of AI, medical education is undergoing a digital transformation that requires medical educators to adapt and redefine their roles. Digital twin (DT) technology is emerging as a linchpin in healthcare. It has applications in studying a patient's genome, physiology, and lifestyle and contributes to the development of safe and cost-effective interventions. In medical education, DTs serve as virtual replicas of patients and their organs. For instance, in pathology education, DT technology can enhance clinical case scenarios by showcasing the different pathophysiological mechanisms occurring in various organs. Integrating AI and DT technology into medical education can significantly enhance the learning experience for medical students and better prepare them for real-world diagnostics and disease management.

In healthcare, AI has proven capabilities in automating tasks, analyzing large amounts of data, and making predictions based on patterns, revolutionizing patient-centered medicine. Harnessing AI in medical education is the obvious next step. AI can generate virtual diagnostic challenges in which students analyze DT representations of diseases, honing their diagnostic accuracy and critical thinking skills. AI algorithms can also suggest differential diagnoses and relevant laboratory investigations in case-based learning (CBL) scenarios, based on features seen in DT organ samples. Integrating AI and DT technology into virtual classrooms can also enable clear and comprehensive presentation of complex biomedical data during CBL.

Digital pathology (DP) digitizes pathology slides through high-resolution scanning and image analysis, enabling interactive learning. When combined with AI and DT technology, DP further enhances medical education. For example, AI models can analyze imaging, such as X-rays or MRI scans, and correlate the findings with the morphologies of different disease entities, making radiologic-pathologic (rad-path) correlation easier. This approach gives students a thorough understanding of disease processes and also develops their skill in making such correlations.

Collaborative projects with data scientists and AI experts can give medical students hands-on experience with AI applications and their potential impact on healthcare, with the added benefit of fostering interdisciplinary learning. Medical schools can also support students in using DT by introducing data science and informatics training and emphasizing evidence-based practice. These steps would give students a broad grasp of DT's virtual simulation, AI's data analysis and personalized learning, and DP's role in enhanced diagnosis and remote learning.

The convergence of DT, AI, and DP in medical education is revolutionary. Early adoption of these technologies is crucial, as they enhance knowledge retention, critical thinking, and diagnostic skills, enabling tomorrow's physicians to provide exceptional patient care in their future careers.

Categories:
Curriculum: Curricular Transformation and Transition
Artificial Intelligence in Health Professions Education

Virtual Reality and Preclinical Medical Education: A Systematic Review of its Use and Effectiveness

Presenting Author: Serine Torosian - St. George's University, School of Medicine
Co-Authors: Samantha Wehsener - St. George's University, School of Medicine, Grenada
Vanad Mousakhani - Frank H. Netter MD School of Medicine
Gabrielle Walcott-Bedeau - St. George's University, School of Medicine, Grenada
Vineeta Ramnauth - St. George's University, School of Medicine, Grenada

Background
The use of virtual reality (VR) training in areas with high-stakes outcomes, such as the military, aviation, and medicine, prepares individuals for perilous scenarios in a safe and controlled setting. This review aims to investigate the application and effectiveness of VR technology in preclinical medical education.

Method
A systematic review following the PRISMA guidelines was conducted in May 2023 using the PubMed and Scopus databases and search terms "medical education", "preclinical" and "virtual reality". All relevant studies were screened and collated by two independent reviewers.

Result
The search yielded 10/25 (40%) articles that met the inclusion criteria. The included articles addressed medical (n=7), dental (n=2), and physician assistant (n=1) preclinical education. A statistically significant improvement in student performance and self-efficacy was shown in 78% (n=7/9) of the studies that measured these outcomes. High student satisfaction with VR as a supplemental study tool was reported in 67% (n=2/3) of the studies that assessed satisfaction. Only one study (33%, n=1/3) found that students were dissatisfied, owing to the limitations of the technology.

Conclusion
Virtual reality technology promises an improved and immersive experience for learners. Since VR was first introduced, interest in its use in education has grown, and attitudes toward it have become increasingly positive. With continued improvements in technology, it is important to explore VR's potential for enhancing medical training during the early preclinical years. VR allows students to study anatomical structures that are difficult to visualize on traditional cadavers. It also provides animations and visual guides that make abstract topics easier to learn. Additionally, VR-simulated learning provides a safe environment, e.g., during Objective Structured Clinical Examinations, allowing students to practice their clinical reasoning and skills. While VR may not fully replace traditional lectures, it has the potential to surpass textbooks in usefulness for future medical learners.

Categories:
Teaching: The Death of Textbooks
Artificial Intelligence in Health Professions Education

Lightning Talk Abstracts: AI in Health Professions Education
