Abner Colón - Ponce Health Sciences University
Ariel VanLeuven - Augusta University/University of Georgia Medical Partnership
Sabha Ganai - University of North Dakota School of Medicine and Health Sciences
Moderated by Amanda Chase
Session Coordinator: Tammy Harris
Presentation 1 - Using Course Evaluations as a Tool to Mitigate Implicit Bias and Stereotypes in Educational Content
Rachel Linger
Rocky Vista University
Purpose
Systemic racism and bias in medical education reinforce misperceptions and stereotypes about marginalized populations. Students translate this way of thinking into behavior patterns that adversely impact patient care and contribute to health disparities. To mitigate these effects, medical educators must first identify where and how diversity, equity, and inclusion (DEI) measures are present or absent in medical curricula. A DEI-conscious course evaluation was developed as a strategy to diagnose strengths and weaknesses related to expressions of DEI in the course. An additional goal was to give students the opportunity to provide open and honest feedback in a safe, confidential manner.
Methods
In 2021, two closed-ended (Likert-scale) questions and two open-ended (long-answer) questions were included in an end-of-course evaluation. Course faculty and students were notified and provided with DEI resources prior to the release of the survey. A total of 148 second-year medical students (51% response rate) voluntarily completed the survey. Qualitative content analysis was performed to identify students' experiences of diversity, equity, and inclusivity within the course.
Results
Students appreciated the opportunity to provide feedback about DEI. Most comments were positive and/or constructive. Several instructors were praised for their kindness and empathy. Specific examples were provided regarding the context necessary when presenting racial, ethnic, or other socioeconomic disparities. Survey data were used to conduct a targeted review of instructional materials, resulting in the mitigation of racializing or stigmatizing content in one lecture slide and seven exam items prior to the 2022 iteration of the course.
Conclusions
Gathering feedback through course evaluations is a principal strategy educators employ to assess strengths and weaknesses in various areas. DEI-conscious course evaluations can serve as a tool to identify and mitigate implicit bias and stereotypes in educational content.
Presentation 2 - The Use of Machine Learning and Applied Statistics to Predict and Classify Medical Students' USMLE STEP 1 Score
Abner J. Colón Ortiz
Ponce Health Sciences University
Purpose
The purpose of this research was to use machine learning and applied statistics to classify and predict medical students' results on the USMLE STEP 1 from their scores in first- and second-year Basic Sciences courses.
Methods
A quantitative methodology based on machine learning was implemented using the following learners: neural networks, k-nearest neighbors (kNN), Support Vector Machine (SVM), Linear Regression, and Random Forest. The study included 245 students who took the USMLE STEP 1 between April 2021 and October 2022. These results were matched with final grades in the first- and second-year basic science courses. Machine learning procedures and applied statistics were then used to classify and predict the results using the open-source Orange Data Mining platform.
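Below is a minimal sketch of the kind of regression analysis described here, written with pandas and scikit-learn rather than the Orange Data Mining workflow the authors used; the file name, column names, and Ridge settings are illustrative assumptions, not details from the study.

    # Hypothetical sketch: predict Step 1 scores from basic science course grades.
    import pandas as pd
    from scipy.stats import pearsonr
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_predict

    # Hypothetical dataset: one row per student, final course grades plus Step 1 score.
    df = pd.read_csv("step1_cohort.csv")
    courses = ["MedBiochem_II", "Pathology_I", "Pathology_II"]
    X, y = df[courses], df["STEP1_score"]

    # Per-course Pearson correlation with the Step 1 score.
    for course in courses:
        r, p = pearsonr(df[course], y)
        print(f"{course}: r = {r:.2f} (p = {p:.3f})")

    # Multiple linear regression with a Ridge learner, evaluated out of sample.
    ridge = Ridge(alpha=1.0)
    predicted = cross_val_predict(ridge, X, y, cv=10)
    r_cv, _ = pearsonr(y, predicted)
    print(f"Cross-validated prediction r = {r_cv:.2f}")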
Results
Prediction through multiple linear regression with the Ridge learner showed that scores in Medical Biochemistry II (r = 0.98), Pathology II (r = 0.88), and Pathology I (r = 0.84) predicted high scores on the USMLE STEP 1. Regarding classification, students with higher scores in Pathology I, Pathology II, and Medical Biochemistry II were the ones with the highest USMLE STEP 1 scores (M = 255). The prediction model based on the Pathology II course was the most accurate at identifying students who did not pass the USMLE STEP 1, with 85% accuracy.
Conclusion
The classification and prediction results showed that the courses with the greatest influence on the USMLE STEP 1 score are Pathology I, Pathology II, and Medical Biochemistry II.
Presentation 3 - The Bare Bones of Gross Anatomy Laboratory Examinations: Analyses of Exam Reliability and Item Difficulty on Individual and Team Assessments
Ariel VanLeuven
AU/UGA Medical Partnership
Purpose
Laboratory examinations are a routine part of gross anatomy education in UME settings, but there are few detailed reports of the reliability of these assessments and item difficulty by modality (e.g., cadaver, bone, and anatomical imaging), particularly on two-stage collaborative laboratory examinations. This project describes statistical measures of reliability and analyses of performance on questions of varying content sources on gross anatomy laboratory examinations.
Methods
First-year students in AY 2021-22 (N = 61) took six gross anatomy laboratory examinations throughout the academic year using ExamSoft. Each laboratory examination had an individual component followed by a team-based component. A Kuder-Richardson 20 (KR-20) test of internal consistency was performed on each assessment, and the mean and standard deviation were calculated for all questions on each examination.
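Below is a minimal sketch of the KR-20 calculation described here, written in Python with NumPy; the response matrix is an illustrative stand-in for an item-level exam export, not data from the study, and assumes dichotomously scored (0/1) items.

    # Hypothetical sketch: KR-20 (Kuder-Richardson 20) internal consistency.
    import numpy as np

    def kr20(responses: np.ndarray) -> float:
        """responses: students x items matrix of 0/1 item scores."""
        k = responses.shape[1]                         # number of items
        p = responses.mean(axis=0)                     # proportion correct per item
        q = 1.0 - p                                    # proportion incorrect per item
        total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

    # Example: 5 students x 4 items (illustrative data only).
    scores = np.array([[1, 1, 1, 0],
                       [1, 0, 1, 1],
                       [0, 0, 1, 0],
                       [1, 1, 1, 1],
                       [0, 1, 0, 0]])
    print(f"KR-20 = {kr20(scores):.2f}")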
Results
The KR-20 for individual examinations ranged from 0.79 to 0.86, while the KR-20 for team examinations ranged from -0.07 to 0.71. The average increase in mean score between individual and team examinations was 16.7 points for first-order cadaver-based identification questions and 23.6 points for second-order cadaver-based questions; 15.7 points for first-order osteological identification questions and 22.9 points for second-order osteological questions; and 15.6 points for first-order imaging-based identification questions.
Conclusions
This study indicates that the internal reliability of our individual laboratory assessments was acceptable, but less so for team assessments. Additionally, team laboratory assessments improved mean scores on second-order questions more so than on first-order identification questions. These findings may help anatomy educators construct assessments with higher reproducibility and guide choices about question selection.
Presentation 4 - Iterative Development of an Oral Examination Grading Rubric as a Strategy for Improving Formative Feedback in a Surgical Clerkship
Sabha Ganai
University of North Dakota School of Medicine and Health Sciences
Purpose
Oral examinations facilitate assessment of decision-making and are used in high-stakes summative assessments, including the American Board of Surgery Certifying Examination. While oral examinations have been provided to novices as part of our surgical clerkship, our medical students described limited performance feedback and a lack of understanding of the meaning of their grade. We updated our grading rubric with the goal of improving the quality of formative feedback given to students.
Methods
Performance sites were part of a surgical clerkship located at a medical school spanning a rural region. Oral exam scores during the 2019-2023 academic years were summarized across four campuses (n=247 observations). Historical grading rubrics were scored from 0 to 10. The proposed replacement rubric was an anchored 5-point Likert scale with 10 questions per case across 4 thematic domains. Average scores for the new rubric were compared against the paired historical rubric in 18 subjects examined by 5 raters. Data are reported as medians with interquartile ranges (IQR).
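Below is a minimal sketch of a paired rubric comparison that controls for rater, using pandas and statsmodels; the file name, column names, and model form are illustrative assumptions rather than the authors' exact analysis.

    # Hypothetical sketch: compare new rubric averages with historical scores,
    # entering the rater as a categorical covariate.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per examined subject.
    # Assumed columns: subject, rater, old_score (0-10), new_avg (1-5).
    df = pd.read_csv("oral_exam_rubrics.csv")

    model = smf.ols("new_avg ~ old_score + C(rater)", data=df).fit()
    print(model.summary())

    # Median and interquartile range per thematic domain (columns assumed present).
    domains = ["information_gathering", "understanding_information",
               "decision_making", "communication_skills"]
    print(df[domains].quantile([0.25, 0.5, 0.75]))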
Results
Our historical grading rubric demonstrated differences in scoring across campuses (p=0.02) and between certain campuses (p<0.05). Linear regression analysis demonstrated a correlation between average scores from the new and old rubrics (p=0.008) while controlling for the rater (p=0.90). While responses in the new rubric were anchored to specific descriptions, the questions can be summarized across four domains: "information gathering" (4, IQR 3-5), "understanding information" (4, IQR 3-5), "decision-making" (4, IQR 3-5), and "communication skills" (4, IQR 3-4).
Conclusions
The historical grading rubric demonstrated significant and problematic variance across campuses. An iterative process was used to develop a replacement examination rubric that improves student feedback, minimizes inter-campus variability, and facilitates rater feedback. Further longitudinal data will be required to assess inter-campus variance in performance and the value to the novice learner.