Oral Abstracts: Assessment


Presented By: Ethan Snow, South Dakota State University
Co-Authors: Libby Gregg, South Dakota State University
Greg Heiberger, South Dakota State University
Jordan Neises, South Dakota State University

Purpose 
Students' confidence in their knowledge and certainty in their performances are two elements of metacognition that influence the efficacy of teaching and learning. Human anatomy taught via cadaver prosections (CAD) or virtual reality (VR) exemplifies visuospatial learning, but limited information is available about the effects these pedagogies have on student metacognition. The objective of this study is to compare the efficacy of two-point student metacognition (confidence and certainty) from VR and CAD learning experiences. 

Methods 
Twenty-five students were guided through comparable VR then CAD learning experiences (VR-CAD) while another 24 students were guided through the same experiences in reverse order (CAD-VR). The same learning assessment was administered after each learning experience, during which students were prompted to report their level of pre-assessment confidence about the learning objectives and post-response certainty for each assessment item using a 4-level Likert scale (4 = high, 3 = moderately high, 2 = moderately low, and 1 = low). 

Results 
Compared to a collective baseline mean confidence level of 1.47 (low) in the learning objectives, mean student confidence after the initial VR and CAD learning experiences increased to 3.09 and 3.17 (moderately high), and the corresponding mean certainty levels on the same learning assessment questions were 2.90 and 3.27 (moderately high), respectively. After encountering both experiences, VR-CAD students' mean confidence and certainty increased to 3.50 and 3.52 (high), and CAD-VR students' mean confidence and certainty increased to 3.31 (moderately high) and 3.51 (high), respectively. Selected confidence and certainty breakdowns per assessment item and per student will also be presented.

Conclusions 
CAD-VR students demonstrated higher-level and better-aligned two-point metacognition after the initial CAD experience, while VR-CAD students demonstrated higher-level and better-aligned two-point metacognition after encountering both learning experiences. This study may help educators integrate VR into anatomy curricula, evaluate pedagogical efficacy, and improve metacognition-based teaching and learning. 

Best Faculty Oral Presentation Nominee

Presented By: Dawn Morton-Rias, National Commission on Certification of Physician Assistants
Co-Authors: Mirela Bruza-Augatis, National Commission on Certification of Physician Assistants
Andrew Dallas, National Commission on Certification of Physician Assistants
Joshua Goodman, National Commission on Certification of Physician Assistants
Andrzej Kozikowski, National Commission on Certification of Physician Assistants

Background
COVID-19 significantly impacted physician assistant/associate (PA) education programs. Most programs transitioned didactic and clinical education from in-person to remote, and clinical training opportunities diminished. Graduates of accredited PA programs take the Physician Assistant National Certifying Examination (PANCE), a five-hour exam with 300 multiple-choice questions, and must attain or exceed the scaled passing score of 350 (range: 200-800). We examined whether the pandemic impacted first-time examinees' PANCE scores and passing rates.

Methods
We analyzed data (N=59,459) from the National Commission on Certification of PAs. The two primary outcomes were PANCE scores and pass rates. The main exposure was the timeframe, categorized as pooled three years pre-pandemic (2017-2019) and three years during the pandemic (2020-2022). The 2017-2018 scores were equated to the new passing standard implemented in 2019. Covariates included age, gender, years the PA program has been accredited, program region, and rural-urban setting. Analyses consisted of descriptive, bivariate, and multivariate statistics.

Results
The mean PANCE score and pass rate during the six-year study period were 463 and 93%, respectively. The pooled three-year pre-pandemic mean PANCE score (462) was lower than the mean in the first pandemic year (473; p<0.001), not significantly different from the second (465; p=0.051), and higher than the third (453; p<0.001). These results held when adjusting for test-taker and PA program covariates. Similarly, pass rates decreased from 92.3% during the pre-pandemic period to 91.7% in 2022 (p<0.001). When controlling for covariates, examinees had 24% higher odds of failing in 2022 compared to the pre-pandemic period.

Conclusion
Findings suggest that PANCE scores and pass rates were impacted during the third year of the pandemic. PANCE assesses whether examinees have the essential clinical knowledge to enter the PA profession. Determining whether the pandemic affected PANCE scores and pass rates is crucial to ensuring that PAs provide safe, high-quality patient care.

Presented By: Sally Binks, University of Toronto
Co-Authors: Ryan Brydges, University of Toronto
Jaimie Coleman, University of Toronto
Vyshnave Jeyabalan, University of Toronto
Kulamakan Kulasegaram, University of Toronto
Nicole Woods, University of Toronto

Purpose
Multiple-choice questions (MCQs) should be optimized to prepare learners for future learning. One way to do this is to design MCQs that promote learners' engagement in distinctive processing, whereby learners recruit prior conceptual knowledge to differentiate confusable categories. This is an example of a beneficial response process, an essential component of validity for formative tests. We report response process validity evidence that purposefully designed MCQs can elicit distinctive processing.

Methods 
We conducted a comparative think-aloud study with nursing and medical students. Participants were assigned to a 'competitive' or non-competitive version of a 19-question A-type MCQ test. Items assessed basic science relevant to clinical skills in critical care medicine. All items were developed using the key feature identification approach. The 'competitive' version, aligned with distinctive processing, had three response options developed from the key features; the non-competitive version had distractors that were not key features. Think-alouds were recorded and then coded by two raters to identify the response processes participants used to select answers on each version of the test. We compared the frequency of response processes between the two test groups.

Results 
Sixteen participants completed the study (8 in each group). We identified 10 categories of response processes: four entailed distinctive processing of the response options, four entailed non-specific reasoning processes, and the remaining two were guessing and retrieval of prior knowledge without reviewing response options. Participants who took the competitive version of the test used distinctive processing significantly more often than those who took the non-competitive version [χ²(df = 1, n = 192) = 17.9, p < 0.001].
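
To illustrate the kind of comparison reported above, the sketch below runs a chi-square test of independence on a 2 x 2 table of coded responses (test version by use of distinctive processing). The counts are hypothetical placeholders, not the study's data.

```python
# Chi-square test of independence: test version vs. use of distinctive processing.
# Counts are hypothetical placeholders, not the study's coded data.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: test version; columns: distinctive processing used (yes, no)
observed = np.array([
    [60, 36],   # competitive group (hypothetical)
    [31, 65],   # non-competitive group (hypothetical)
])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2(df={dof}, n={observed.sum()}) = {chi2:.1f}, p = {p:.4f}")
```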

Conclusion 
MCQ tests designed to have competitive response options induce cognitive processing that may optimally prepare learners for future learning. This response process validity is the first step in designing formative testing to promote clinical reasoning.

Student Presentation

Presented By: Petra van Gurp, Radboudumc
Co-Authors: Roy Claessen, Radboudumc
Annelies van Ede, Radboudumc

Purpose
Early patient contacts (EPC) have a significant impact on the professional development of medical students. However, evaluating professional development is complex, and literature on assessment methods to drive professional development in a longitudinal EPC course is scarce. This study aims to explore how written reports and narrative group assessment contribute to professional development in the context of longitudinal EPC for undergraduates.

Methods
In this qualitative study, participants were 12 students assessed by written reports, 31 students assessed by narrative group assessment, and 3 assessors familiar with both assessment methods. Exploratory focus group interviews (with students and assessors separated) provided insights into assessment experiences and into awareness of and growth in professional development. Common themes were identified through inductive analysis.

Results
Forty-three students and 3 assessors participated in 8 semi-structured focus group interviews. According to students, writing a report did not stimulate further professional development; the learning took place within the EPC experience and soon after it. The narrative group assessment provided a safe environment to deepen both individual and group members' learning outcomes. Peer-assisted learning, mainly through questioning and feedback, encouraged the learning process and helped colleagues reflect and identify next steps for learning from future EPC. Both students and assessors underlined the importance of peer-assisted learning and group reflection in stimulating professional development.

Conclusions
Assessment by written reports did not add value. Students' learning from EPC stopped after the experience and reflective observation phases of Kolb's cycle, although in the literature reflective essays show merit and reveal information about progress in professionalism. Peer-assisted learning within narrative group assessment encouraged conceptualization and ideas for experimentation, the next steps in Kolb's learning cycle for personal and professional development based on EPC experiences. This study illustrates the power of assessment for learning.

International Presenter

Presented By: Diorra Shelton, Touro University Nevada
Co-Authors: Eseosa Aigbe, Touro University Nevada
Brian Kaw, Touro University Nevada
Amina Sadik, Touro University Nevada
Naweed Yusufzai, Touro University Nevada

Purpose
Artificial intelligence (AI) large language models, such as ChatGPT-4, are novel tools transforming medical education. This study pioneers the use of ChatGPT-4 for medical science students' self-assessment; its goal is to adopt a more efficient and effective method of item writing while ensuring adequate coverage of basic science concepts and encouraging faculty to use AI.

Methods
Students in the medical biochemistry course are required to write MCQs based on learning objectives to create a question bank for self-assessment. Four students were selected to do the same using ChatGPT-4 to compare the effectiveness, quality, and time spent producing AI-generated vs. manually created MCQs. All questions underwent rigorous faculty evaluation to ensure coverage of basic science concepts. Final versions of AI-generated questions were embedded in formative and summative assessments using ExamSoft. A survey was conducted to determine students' perceptions of the AI-generated questions used in assessment.

Results
Preliminary findings indicate that ChatGPT-4 is effective for generating items, but user prompts are critical. At a much lower rate than free models, ChatGPT-4's responses still contain inaccuracies, necessitating faculty evaluation and a deep grasp of the content by participants to identify hallucinations. The point-biserial of AI-generated questions in assessments averaged 0.42. Partial analysis of the survey revealed that 50% of students were comfortable using ChatGPT and 58% correctly identified AI questions. More than 93% agreed that AI questions enhanced their understanding of basic science concepts, provided a fair assessment of their knowledge, and were clear, and that they would like to see more of these questions on summative exams.
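
For context on the item statistic cited above, the sketch below computes a point-biserial discrimination index as the correlation between a dichotomous item response and examinees' total scores. The responses and scores are hypothetical, not the ExamSoft output from this study.

```python
# Point-biserial item discrimination: correlation between a 0/1 item response
# and total exam scores. All data here are hypothetical.
import numpy as np
from scipy.stats import pointbiserialr

item_correct = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])            # hypothetical
total_scores = np.array([88, 62, 91, 79, 58, 84, 77, 65, 90, 73])  # hypothetical

r_pb, p_value = pointbiserialr(item_correct, total_scores)
print(f"point-biserial = {r_pb:.2f} (p = {p_value:.3f})")
```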

Conclusion
Given the effectiveness, efficiency, and student performance on AI-generated questions, we suggest that students and faculty use ChatGPT-4 for practice questions and summative exams. Work is ongoing to formulate more effective prompts, leading to more accurate ChatGPT-4-generated MCQs and possibly circumventing the need for faculty evaluation.

Student Presentation, Best Student Oral Presentation Nominee

Presented By: Jennifer Eastwood, Rush University
Co-Authors: Melanie Farlie, Monash University
Michelle Lazarus, Monash University
Georgina Stephens, Monash University
Adam Wilson, Rush University

Purpose
Uncertainty tolerance (UT) describes how individuals perceive and respond to uncertainty. Studies purporting to measure UT with quantitative scales have associated doctors' responses with outcomes related to patient care and well-being. Such scales are used for programmatic measurement in medical schools; however, there are important questions about the validity of these scales when implemented among students. This study explores the response process validity of two commonly implemented UT scales, Tolerance for Ambiguity (TFA) and Physician's Reaction to Uncertainty (PRU), with the aim of understanding whether students' conceptualizations of items align with those intended by the scale developers.

Methods
Cognitive interviewing captured the thought processes of Australian (9) and US (22) medical and health professions students. Participants thought aloud as they responded to TFA or PRU items. Probing questions clarified students' understanding of items, response challenges, and item relatability. Interviews were recorded, transcribed, and analyzed using framework analysis.

Results
Participants generally reported understanding scale items, despite identifying several issues related to clarity and specificity. Participants struggled to contextualize items situated in clinical practice contexts due to lack of relevant experience. Participants often referenced academic experiences or imagined clinical scenarios to guide their responses. Participants also indicated that their limited responsibility as students influenced how they responded to uncertainty in relation to items.

Conclusions
Aspects of the student role pose unique challenges to the content validity of the TFA and PRU scales. These scales may not be measuring the UT construct in students in the same way they are thought to measure the construct in practicing doctors. Thus, we caution against using the TFA and PRU scales amongst medical students. By identifying salient factors in students' conceptions and responses to uncertainty, this work also helps to reframe UT measurement, and potentially advance theoretical models of UT in medical and health professions students.

Presented By: Johanna Clewing, Texas A&M University School of Medicine
Co-Authors: Marietta Clewing, Texas A&M University School of Medicine

Purpose 
Even though the AAMC implemented standards for the Medical Student Performance Evaluation (MSPE) in 2016, there are still significant differences among schools, making the selection process challenging for program directors and ultimately impacting the UME-to-GME transition. We present our process and outcomes in standardizing our clerkship performance evaluations across clerkships and multiple campuses, aiming for greater consistency.

Methods
In a collaborative process, Academic Affairs and the Executive Clerkship Directors worked on this project. Clerkship directors discussed and rated what to share in the MSPE and in which format. The group reviewed a multitude of MSPEs from schools throughout the US for different specialties. In multiple working and revision sessions, the clerkship directors developed different mock clerkship performance evaluations until the group was confident in the final draft. Throughout the process, the group presented its work to different stakeholders, including Student Affairs and the Curriculum Committee, for feedback.

Results 
There was strong consensus among clerkship directors to be more transparent about NBME results and to aim for consistency in narrative writing. The final mock clerkship performance evaluation clearly separates the objective assessment data (non-numerical grades for the total course and its components, with brief context on clinical setting, timing, and test requirements) from the narrative itself: a well-structured summary for each core competency that incorporates quotes from the teams the student worked with. The evaluation concludes with a comment from the clerkship director projecting the student's performance moving forward.

Conclusion 
Clerkship directors are confident that the new MSPE format helps streamline narrative writing across the board, especially when faculty turnover can be a challenge, because more clarity is provided. We are confident that the new MSPE format will be well received by residency programs.

Presented By: Ryan Tubbs, Michigan State University College of Human Medicine
Co-Authors: Nathan Hankerson, Michigan State University College of Human Medicine
Quynh Tran, Michigan State University College of Human Medicine

Purpose
Reach Out To Youth (ROTY) is an outreach program staffed by medical student volunteers in which children ages 7 to 11 and their parents attend a Saturday program of interactive workshops specifically designed to motivate underrepresented youth to pursue careers in the medical profession. This project focuses on parental engagement, motivation, and satisfaction at the Michigan State University College of Human Medicine (MSU CHM) site to assess the efficacy of ROTY as an outreach program.

Methods
Parents and students engage in separate workshops that mirror each other to support longitudinal learning and provide a foundation to foster familial discussion after the program. Importantly, the parent curriculum is conducted to give parents the tools necessary to support their children's interests in healthcare and educate them about topics related to their child's development. Parents were asked to complete a questionnaire after the workshops which included questions about their motivations for participating, overall satisfaction, and likelihood of returning to ROTY. Research was conducted by MSU CHM medical students with faculty guidance.

Results
Questionnaire results indicate that the majority of parents are satisfied with ROTY and found it to be a positive learning experience for both their children and themselves. Parents report that a career in a medical profession seems more feasible for their children after involvement in this program. Parents also indicated that they left with more knowledge about the topics presented than they had before. Lastly, parents were more inclined to participate in events like ROTY in the future.

Conclusion
Based on parental satisfaction, engagement, and motivating factors, outreach programs that target underrepresented youth, spark interest in the medical field, and involve parents can serve as effective early pathway programs into healthcare. ROTY is a rewarding experience for the children, parents, and medical student volunteers involved.

Presented By: Munder Zagaar, Baylor College of Medicine
Co-Authors: Peter Boedeker, Baylor College of Medicine
Sandra Haudek, Baylor College of Medicine

Purpose
To optimize learning for pre-clinical medical students, our institution introduced a two-part summative assessment strategy. The present study details student exam performance, perceptions of the assessment strategy, and study approaches between each exam.  

Methods
In the 4-week Foundations of Medicine course, 223 MS1 students engaged in weekly two-part assessments, featuring morning (AM) and afternoon (PM) exams with 35-40 unique multiple-choice questions tagged to objectives. Individualized feedback on missed objectives post-AM exam facilitated remediation during a 4-hour gap before the PM exam.  A post-course survey, given to all MS1 students, collected data on study strategies between exams and the perceived impact of two-part assessments on learning. 

Results 
Students exhibited significant improvement (p < 0.001) in the first three Friday assessments, with average score increases of 4.93, 5.06, and 10.86 from AM to PM sessions. However, there was no significant change in the fourth and final Friday assessment. Ninety-nine students (44.8%) responded to the survey. Regarding use of study time between exams, discussion with peers was most common, followed by independent study and group study sessions. Concerning assessment format preferences, 27% favored a more targeted PM assessment over taking another comprehensive assessment, while only 4% preferred high-stakes summative exams. Additionally, 82% agreed they were more confident in the PM exam, and respondents indicated that the assessment format allowed them to prioritize mastering content (90%), enhanced comprehension (74%), and pinpointed areas of strength or improvement (78%).
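
The abstract does not name the test behind the AM-to-PM comparisons; one plausible approach is a paired t-test on each student's AM and PM scores, sketched below with hypothetical values.

```python
# Paired t-test on AM vs. PM exam scores for the same students.
# This is one plausible analysis, not necessarily the one used; scores are hypothetical.
import numpy as np
from scipy.stats import ttest_rel

am_scores = np.array([72.0, 65.5, 80.0, 70.0, 68.5, 74.0])  # hypothetical
pm_scores = np.array([78.5, 71.0, 83.5, 76.0, 73.0, 80.5])  # hypothetical

t_stat, p_value = ttest_rel(pm_scores, am_scores)
print(f"mean gain = {np.mean(pm_scores - am_scores):.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```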

Conclusions
The two-part assessment strategy was favored over other formats, with learners utilizing the 4-hour break for a combination of collaborative learning and independent study. This approach facilitated focused learning while reducing exam-related stress. The results endorse the ongoing use of this strategy locally, and other institutions are encouraged to explore its implementation for their learners.

Presented By: Abner Colón Ortiz, Ponce Health Sciences University

Purpose
This research applied binary logistic regression and a decision tree method to classify and predict medical students' results (Pass or Fail) on USMLE STEP 1 from the 2-digit results of the Comprehensive Basic Science Examination (CBSE) developed by the National Board of Medical Examiners (NBME).

Method
A quantitative methodology based on applied statistics was implemented, using binary logistic regression and a decision tree in a prediction-correlation design. The study included 172 students who took USMLE STEP 1 under the new Pass/Fail reporting between April 2022 and October 2023. A predictor model using binary logistic regression examined the predictive effect of the CBSE on USMLE STEP 1 results. Additionally, the decision tree method was used to analyze how CBSE results classify Pass or Fail outcomes on USMLE STEP 1. Analyses were conducted in IBM SPSS Statistics version 29.0.
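
The study ran these analyses in IBM SPSS Statistics 29.0; as a language-agnostic illustration of the two techniques, the Python sketch below fits a binary logistic regression and a single-split decision tree to hypothetical CBSE/Step 1 data (the scores and outcomes are invented, not the study's records).

```python
# Sketch: binary logistic regression and a decision tree predicting Pass/Fail
# on Step 1 from the CBSE score. Toy data only; not the study's dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

cbse = np.array([[42], [47], [50], [53], [55], [58], [61], [64], [67], [70]])  # hypothetical scores
passed = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])                              # 1 = Pass, 0 = Fail

logit = LogisticRegression().fit(cbse, passed)
tree = DecisionTreeClassifier(max_depth=1).fit(cbse, passed)  # one split approximates a cut score

print("P(pass | CBSE = 53):", logit.predict_proba([[53]])[0, 1])
print("Tree split threshold:", tree.tree_.threshold[0])
```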

Results
The decision tree classification and the binary logistic regression predictor model indicated that students at the institution with a CBSE score of 53 or more have a 96.4% probability of passing USMLE STEP 1. These results are comparable to the 2022 CBSE cutoff of 51 or above that was established to predict students' success on USMLE STEP 1; with that cutoff of 51, the predictive model captured 90% of passing results on USMLE STEP 1.

Conclusion
These results demonstrate how CBSE outcomes predict passing of USMLE STEP 1. These findings will be used to compare the predictive accuracy of the binary logistic regression model and the decision tree classification with the results obtained by students in 2024.

Presented By: Kyle Bauckman, Nova Southeastern University
Co-Authors: Alec Reeber, Nova Southeastern University

Purpose
Professionalism is a core competency expected of medical students, and lack of professionalism during medical education heralds future challenges as a practicing physician. Previous findings emphasize the importance of professionalism, but little is understood about the nuanced mechanisms for assessing it. Medical education utilizes various measures to assess professionalism. We sought to optimize our institution's assessment strategy for professionalism by investigating faculty and student perceptions of the process. Our previous investigations found improvement in the assessment process under these guidelines, but the changes received mixed reception among students. We aimed to further investigate students' understanding of professionalism and to optimize messaging of these expectations for the learner.

Methods
This study was approved through IRB review; protocol #2022-268. Participation in the study was solicited through email to all active students at our institution. Participation was voluntary. Survey responses were aggregated and analyzed.

Results
Enrolled students at our institution (n=152) were invited to participate, with an 18.4% response rate (n=28). Most students believed they were in the top 25% or higher in professionalism (68%). Most students disagreed or strongly disagreed that our current professionalism system was effective (71.4%). The majority were in favor of a rank-based assessment of professionalism used for Medical Student Performance Evaluation (MSPE) letters (60.7%). Assessment of professionalism observations was ranked similarly to previous faculty findings.

Conclusions
There is a gap between student and faculty perceptions of professionalism. Faculty witness a snapshot of a student's behavior, and early incidents may unintentionally influence future observations. Our findings suggest students' perceptions should be considered when developing an assessment process for professionalism. Interestingly, a rank-based assessment model for professionalism was viewed favorably by surveyed students. This merits future development of an assessment model that highlights exceptional demonstrations of professionalism rather than targeting only professional deficits.

Presented By: Claudio Violato, University of Minnesota
Co-Authors: Esther Dale, University of Minnesota
Jackie Gauer, University of Minnesota

Purpose 
While objective structured clinical examination (OSCE) research has explored reliability, validity and dependability have not been thoroughly studied. The main purpose of the present study, therefore, was to further examine the validity, reliability, and dependability of a multi-station OSCE using standardized patients, and the use of AI for assessing clinical competence in this performance-based assessment.

Methods 
A total of 240 third-year medical students (138 women, 58.0%) were assessed in a 12-station OSCE using a carefully constructed table of specifications, based on a thorough literature review and a modified Delphi procedure with 32 experts (PhD and MD faculty). The following categories were assessed: overall history taking, physical exam skills, overall communication skills, and clinical reasoning skills. Students wrote post-encounter SOAP (subjective, objective, assessment, and plan) notes for each clinical encounter. We employed AI to score the SOAP notes, based on a rubric refined with natural language processing (NLP) algorithms. Passing this OSCE is a requirement for graduation.

Results 
The internal consistency reliability for each station (Cronbach's α) ranged from 0.66 to 0.83. The generalizability analysis indicated that 12 station encounters produce a G-coefficient (Ep²) > 0.70, a value acceptable for high-stakes assessments with direct observations. The content validity index (CVI) = 0.86 and content validity ratio (CVR) = 0.74. Factor analysis resulted in four factors: (1) Hx Taking & Counseling, (2) Physical Exam, (3) Communication/Professionalism, and (4) Clinical Reasoning. The AI-scored SOAP notes yielded reliable and content-valid results.
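
As an illustration of the station-level reliability statistic reported above, the sketch below computes Cronbach's alpha from a small hypothetical matrix of checklist scores (rows are students, columns are station items); it is not the study's data or scoring code.

```python
# Cronbach's alpha for one OSCE station's checklist items.
# Score matrix is hypothetical (rows = students, columns = items).
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

station_scores = np.array([  # hypothetical ratings: 5 students x 4 items
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 2, 2],
    [3, 3, 4, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(station_scores):.2f}")
```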

Conclusions 
Both the Cronbach's alpha and Ep² results provide substantial evidence of reliability and score dependability. The CVI, CVR, and AI scoring provide substantial evidence of content validity. The results of the factor analyses provide evidence of construct validity. Best-practice OSCEs can have substantial evidence of content and construct validity, reliability, and dependability.