Oral Abstracts: Assessment

Please note that abstracts are listed alphabetically.

Gilberto Garcia
Texas Tech University Health Sciences Center Paul L. Foster School of Medicine

Background
Medical schools face the challenge of a rapidly increasing Hispanic patient population. The need for Spanish-language physician-patient concordance exceeds what can be met by Spanish-speaking physicians alone. To help address this healthcare need, the Paul L. Foster School of Medicine requires all students to take medical Spanish. Two years of Spanish instruction are longitudinally integrated in alignment with the pre-clerkship phase medical skills course. For medical Spanish instruction to improve patient outcomes, it should be linked to an assessment strategy focused on improving language concordance with Spanish-speaking patients, and it should include safety measures to prevent inadvertent communication errors.

Methods
Students were placed in one of three instructional tiers: Beginner, Intermediate, or Advanced. An internally developed rubric was used to assess oral Spanish competency in doctor-patient communication. The rubric consisted of sections on basic conversation and relevant questioning in the context of a patient encounter, followed by the student's presentation of a summary including diagnoses and a plan. A guide with example answers was also provided for the evaluator.

Results
Students uniformly reported that the pre-clerkship Spanish curriculum prepares them for communicating with Spanish-speaking patients. As expected, advanced students achieved the highest scores. Advanced students stated that linking the Spanish course to the medical skills course enhances their history-taking ability.

Conclusion
Oral evaluations of medical Spanish competency should include assessments based on patient encounters. This uniquely systematic and required component of the medical curriculum may serve as proof of concept for other schools developing Spanish language initiatives.

Laura Bauler
Western Michigan University Homer Stryker M.D. School of Medicine

Purpose
Receipt and delivery of feedback are major components of medical education. Individualized, constructive feedback takes time, effort, and skill to deliver, but it also requires a recipient who is willing to receive it. Feedback is central to clerkship assessment and has long been a challenge in medical education. Few studies have explored what makes feedback meaningful for students; this study therefore aims to identify the characteristics of useful feedback from the student perspective.

Methods
A qualitative thematic analysis was completed on survey feedback collected from medical students following completion of their core clerkships. Data were collected from 152 students in the 2021 and 2022 graduating classes at Western Michigan University Homer Stryker MD School of Medicine. Students were asked to describe the experiences that helped them give and receive feedback, as well as the most meaningful feedback they received. Using an inductive approach, free-text responses were coded independently by two authors, themes were identified by all authors in an iterative process, and similar codes were grouped.

Results
Three themes were identified from the experiences that helped students give and receive feedback: 1) the importance of practicing these skills in low-stakes, peer-based environments, 2) repetition of feedback, and 3) scheduled formal one-on-one feedback with preceptors during clerkships. Four themes emerged from the thematic analysis of the most meaningful feedback received: 1) the importance of specific, constructive, actionable feedback, 2) the student-perceived authenticity of the feedback, 3) the format and delivery of the feedback, and 4) the specific feedback of "be confident and speak up".

Conclusion
Feedback is an essential part of growing and learning, but delivery/receipt of meaningful feedback is challenging. Careful design of the ways in which feedback is delivered to students can improve their perception of how significantly that feedback promotes their career growth.

Doreen M. Olvet
Donald and Barbara Zucker SOM at Hofstra/Northwell

Purpose
Recent publications describe the value and feasibility of open-ended questions (OEQs) to assess medical knowledge. Incorporation of OEQs into NBME exams additionally incentivizes schools to prepare their students for this assessment format. However, it is unknown how many US medical schools include OEQs in their assessment toolkit. The objective of this study is to determine the prevalence and describe the use of OEQs in medical student knowledge assessment in the US.

Methods
An online survey was sent out to all 156 accredited US medical schools. Questions addressed the use of OEQs to assess medical knowledge in the pre-clerkship and clerkship years. Descriptive statistics were calculated.

Results
Currently, 53 medical schools have completed the survey (34% response rate). Thirty-four schools (64%) reported using OEQs for medical knowledge assessment during the pre-clerkship phase. Twenty of the 34 (59%) use OEQs for formative assessment and 28 (82%) for summative assessment. Formats included short answer (85%), essay (62%), and fill-in-the-blank/phrases (24%). A majority of schools (88%) reported that OEQs make up less than 50% of pre-clerkship assessment of medical knowledge, while the remaining schools use OEQs for most or all of the assessment (70-100%). Only 11 schools (21%) use OEQs for medical knowledge assessment in the clerkships. All 11 of these schools use clerkship OEQs for summative assessment, and only 5 (45%) for formative assessment. Clerkship OEQs were in essay (64%) or short-answer (64%) format. On average, schools have used OEQs for 9.5 years (range: 1-21 years).

Conclusions
More than half of the respondents use OEQs, mostly in the pre-clerkship setting. Among these schools, OEQs make up less than half of the assessment of medical knowledge. The low survey response rate may result in an overestimation of OEQ use; however, data collection is ongoing. Data on why schools utilize OEQs is also being collected.

Rakesh Calton
Ross University School of Medicine

Purpose
Inter-examiner variability in an Objective Structured Clinical Examination (OSCE) is well described.1,2 We therefore undertook this study to probe inter-examiner variability in two cohorts of the second-year summative OSCE at Ross University School of Medicine (RUSM), Barbados.

Methods
The aim of this study was to review the OSCE process and conduct a statistical analysis of the outcome and performance data to determine inter-examiner variability. Various factors responsible for this variability were identified, analyzed, and discussed. The statistical analysis used a MANOVA model, which determines whether there are statistically significant differences among levels of independent variables on multiple dependent variables.3 Wilks' lambda (Λ) and Tukey's multiple-comparison test were used as tests of significance.
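
As a rough illustration of this type of analysis (not the study's actual code), the sketch below shows how a MANOVA with Wilks' lambda and a Tukey post-hoc comparison could be run in Python with statsmodels; the file name and column names are assumed, not taken from the RUSM dataset.

```python
# Illustrative sketch only: MANOVA with Wilks' lambda plus a Tukey post-hoc
# comparison on hypothetical OSCE data. Column names are assumptions.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# scores.csv is assumed to hold one row per student with per-domain scores
# (history, exam, communication) and the examiner who rated them.
df = pd.read_csv("scores.csv")

# MANOVA: do the dependent variables differ across examiners?
manova = MANOVA.from_formula("history + exam + communication ~ examiner", data=df)
print(manova.mv_test())  # reports Wilks' lambda, F, and p-value

# Tukey's multiple-comparison test on the total score, pairwise by examiner
total = df["history"] + df["exam"] + df["communication"]
tukey = pairwise_tukeyhsd(endog=total, groups=df["examiner"], alpha=0.05)
print(tukey.summary())
```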

Results
There was a significant examiner effect for each of the two cohorts of the OSCE. For the May 2019 cohort (13 examiners), Wilks' lambda (Λ) = 0.00154457 (F = 3.12, P < .001 at α = .05), while for the September 2019 cohort (20 examiners), Wilks' lambda (Λ) = 0.0006194 (F = 2.83, P < .001 at α = .05). Cronbach's alpha was calculated as a measure of internal consistency and scale reliability.4 For the May cohort, examiners' consistency ranged from 'good' (0.8-0.9; 7.6%) to 'unacceptable' (<0.5; 38.46%), while for the September cohort examiners' consistency ranged from 'good' (0.8-0.9; 20%) to 'unacceptable' (<0.5; 30%). No examiner in either cohort had an 'excellent' Cronbach's alpha (>0.9).
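
For context, Cronbach's alpha for a scale of k items is commonly computed as

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{i}}{\sigma^{2}_{T}}\right)
\]

where \(\sigma^{2}_{i}\) is the variance of item i and \(\sigma^{2}_{T}\) is the variance of the total score. The 'excellent' (>0.9), 'good' (0.8-0.9), and 'unacceptable' (<0.5) labels above follow widely used rules of thumb rather than thresholds fixed by the formula itself.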

Conclusion
Significant inter-examiner variability was observed in the second-year OSCE cohorts at RUSM. Multiple factors contributing to this variability have been analyzed and discussed and will be presented at the conference. The study underlines the importance of identifying the factors contributing to inter-examiner variability, which in turn serves to strengthen examiner training and standardization.

Sabha Ganai
University of North Dakota School of Medicine and Health Sciences

Purpose
Oral examinations facilitate assessment of decision-making and are used in high-stakes summative assessments including the American Board of Surgery Certifying Examination.  While oral examinations have been provided to novices as part of our surgical clerkship, our medical students described limited performance feedback and a lack of understanding of the meaning of their grade. We updated our grading rubric with a goal to improve the quality of feedback given to the student in a formative fashion.

Methods
Performance sites were part of a surgical clerkship located at a medical school spanning a rural region. Oral exam scores during the 2019-2023 academic years were summarized across four campuses (n=247 observations). Historical grading rubrics were scored from 0 to 10. The proposed replacement rubric was an anchored 5-point Likert scale with 10 questions per case across 4 thematic domains. Average scores for the new rubric were compared against the paired historical rubric in 18 subjects examined by 5 raters. Data are reported as medians with interquartile ranges (IQR).

Results
Our historical grading rubric demonstrated differences in scoring across campuses (p=0.02) and between certain campuses (p<0.05). Linear regression analysis demonstrated a correlation between average scores from the new and old rubrics (p=0.008) while controlling for the rater (p=0.90). While responses in the new rubric were anchored to specific descriptions, the questions can be summarized across domains: "information gathering" (4, IQR 3-5), "understanding information" (4, IQR 3-5), "decision-making" (4, IQR 3-5), and "communication skills" (4, IQR 3-4).
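
As an illustration of this kind of analysis (not the study's code), a regression of the new-rubric average on the historical score with the rater entered as a categorical covariate could look like the sketch below; the file and variable names are assumptions.

```python
# Illustrative sketch: regress the new-rubric average on the historical
# 0-10 score while controlling for rater. Names are assumed, not from the study.
import pandas as pd
import statsmodels.formula.api as smf

# rubrics.csv is assumed to hold one row per examined subject with the
# historical score, the new-rubric average, and the rater's ID.
df = pd.read_csv("rubrics.csv")

# C(rater) enters the rater as a categorical (fixed-effect) covariate.
model = smf.ols("new_avg ~ old_score + C(rater)", data=df).fit()
print(model.summary())  # p-values for old_score and the rater terms
```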

Conclusions
The historical grading rubric demonstrated significant and problematic variance across campuses. An iterative process was used to develop a replacement examination rubric that improves student feedback, minimizes inter-campus variability, and facilitates rater feedback. Further longitudinal data will be required to assess inter-campus variance in performance and the rubric's value to the novice learner.

Priyadarshini Dattathreya
Ross University School of Medicine

Purpose
Learning needs assessments are an integral part of continuous quality improvement. Our goal was to conduct a competency-based learning needs assessment of our matriculating medical students using the Association of American Medical Colleges (AAMC) competencies for entering medical students as the framework. This abstract describes how the framework analysis method met our goal.

Methods
We used an exploratory sequential mixed-methods design to conduct the needs assessment. We surveyed faculty, tutors, and first-year medical students to collect qualitative data on the learning needs of our matriculating medical students. We analyzed the data using the five-step process of framework analysis: familiarization with the data, deductive categorization of the data using the AAMC competencies as the thematic framework, coding of data snippets under each category, charting of the codes on a framework matrix, and interpretation of the mapped data. We reviewed the mapped data independently and with experts to create a 29-item quantitative survey, with each item outlining a specific need. We administered the survey to all pre-clinical students and asked them to rate their competence and the importance of each item on a 5-point Likert scale. We used the scores to calculate a Mean Weighted Discrepancy Score (MWDS), which indicated the size of each learning gap.
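
The abstract does not give the exact MWDS formula; one common formulation weights each respondent's discrepancy (importance minus self-rated competence) by the importance rating and averages across respondents, as in the hedged sketch below with assumed column names.

```python
# Illustrative sketch of a Mean Weighted Discrepancy Score (MWDS) calculation.
# This follows one common formulation; the study's exact formula may differ.
import pandas as pd

# responses.csv is assumed to hold one row per respondent per item with
# 1-5 Likert ratings of importance and self-rated competence.
df = pd.read_csv("responses.csv")

df["weighted_discrepancy"] = (df["importance"] - df["competence"]) * df["importance"]
mwds = df.groupby("item")["weighted_discrepancy"].mean().sort_values(ascending=False)

# Items with the largest MWDS (e.g., >5.00 in the abstract) flag the
# biggest learning gaps.
print(mwds.head(10))
```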

Results
Seventy participants completed the qualitative survey, and 248 pre-clinical students completed the quantitative survey (20.9% response rate). Cronbach's alpha coefficient was 0.92. Nineteen of the 29 items had an MWDS of >5.00.

Conclusions
The framework analysis was the ideal qualitative data analysis method for our needs assessment. It supported the exploration of learning needs from multiple perspectives and aligned them with the AAMC competencies. The output generated a quantitative survey which was critical to quantify learning gaps. Therefore, framework analysis supported a competency-based learning needs assessment of matriculating medical students and informed our continuous quality improvement processes.

Maria Patricia Ascano
Touro University Nevada

Purpose
Remediation is the process by which students who have failed a course must demonstrate a qualifying level of competency by performing successfully on a subject-specific examination. Remediation methodologies differ among medical institutions. The prevalent approach is the self-study method, whereby a failing student is given a specific period to relearn the material before being re-examined to earn a passing grade. In another approach, faculty provide a structured framework that may include small-group active learning strategies, individualized feedback, and reflection. The study's goal was to determine whether faculty-led remediation is more effective than self-study for a medical student's success during the didactic years.

Methods
A quantitative approach was utilized to evaluate the outcomes of different remediation strategies within the first and second didactic years of medical school education. Articles were hand searched using keywords in library databases such as PubMed, Dissertations and Theses Global (ProQuest), ERIC, PsycInfo (EBSCO), and Google Scholar. This study focused on two remediation strategies, namely, faculty-led remediation and independent self-study methods. The exclusion criteria for the study included remediation during clinical rotations, other graduate education remedial courses, and tutoring.

Results
The preliminary results indicate that remediation methods where faculty intervention was present led to a greater number of passing students along with higher passing scores upon reassessment. It was further observed that students with a greater support system from faculty, as well as group-based activities, performed better than students who utilized an independent study approach.

Conclusions
As the preliminary results indicate, faculty interventions lead to greater success than remediation focused on independent self-study. By the end of this review, more detailed remediation practices will be outlined. The goal is to select the remediation strategy that best supports academic success in the didactic years of medical school.

Emily Moorefield
University of North Carolina Chapel Hill School of Medicine

Purpose
Mastery of preclinical medical science content is critical for future student success. In a pass/fail curriculum, students who achieve high scores on assessments early in a course accumulate points and therefore need only low scores on final exams to pass. This may diminish motivation to learn material leading up to the final and may ultimately cause gaps in medical knowledge. We set a standard passing score on cumulative final exams with the goal of requiring content mastery throughout the entire course. Students not meeting the standard reviewed the material and retook the exam to demonstrate understanding.

Methods
We implemented a standard passing score of 70 for cumulative final exams in each medical science system-based course. Final exams were created using NBME Customized Assessment Services (CAS) to select questions tailored to course instruction. Passing required a score of ≥70 in the overall course in addition to a score of ≥70 on the final exam. Students achieving a passing score in the overall course but scoring below the passing standard on the final exam were required to retake only the final exam several days later.
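
For clarity, the dual-threshold rule described above can be summarized as in the following minimal sketch; the thresholds come from the abstract, and the retake trigger is our paraphrase.

```python
# Minimal sketch of the dual-threshold passing rule described in the Methods.
def passes_course(overall_score: float, final_exam_score: float) -> bool:
    """Passing requires >= 70 overall AND >= 70 on the cumulative final exam."""
    return overall_score >= 70 and final_exam_score >= 70

def needs_final_retake(overall_score: float, final_exam_score: float) -> bool:
    """Students passing overall but below the final-exam standard retake only the final."""
    return overall_score >= 70 and final_exam_score < 70
```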

Results
Analysis of student performance on final exams in the first system-based courses of our preclinical curriculum revealed that, with the passing standard in place, the average score on the final exam increased and fewer students fell below the 70 threshold than in prior years. The few students not meeting the passing standard retook the final exam and passed on the first retake attempt.

Conclusions
Setting a standard passing score on cumulative final exams promotes effective study habits throughout the duration of the course and prevents students from disregarding content delivered late in the course. The standard score also allowed identification of students in need of additional academic support so that we could provide resources to improve success in future courses.

Brock Mutcheson
Virginia Tech Carilion School of Medicine

Purpose
Assessment performance scores are essential to both low and high-stakes medical education decision-making at several organizational levels. Modern validity frameworks provide foundational tools for systematically integrating quantitative and qualitative evidence to support the intended uses of scores. We present a comprehensive validity argument for the intended organizational uses of Phase-1 Clinical Science Domain scores and demonstrate the utility of modern validity frameworks.

Methods
First, we present a literature review of historical and modern approaches to measurement validity and describe how they relate to reliability theory and critical efforts toward assessment equity. Next, we describe the Virginia Tech Carilion School of Medicine (VTCSOM) phase-1 clinical science program and its assessment instruments, linking current learning objectives to construct maps, items, scores, and measurement models. We use three years of multi-dimensional assessments to demonstrate how we integrate data and related assumptions to support inferences in the progression from a single observation to a final decision. Data sources include a rigorously analyzed multiple-choice exam and assessments of interview skills, physical skills, written presentation skills, clinical reasoning, and communication/interpersonal skills.

Results
The result of this analysis is the integration of qualitative and quantitative data into a comprehensive validity argument for clinical science comprehensive domain scores for their intended uses. We organize evidence following Kane's inferences of scoring, generalization, extrapolation, and implications.

Conclusion
Validity is fundamentally about integrating information to create scores that align with their intended interpretations and uses. One advantage of Kane's Framework is that it provides an infrastructure to apply to individual quantitative assessment instruments, qualitative assessment tools, and assessment programs.

Ariel J VanLeuven
AU/UGA Medical Partnership

Purpose
Laboratory examinations are a routine part of gross anatomy education in UME settings, but there are few detailed reports of the reliability of these assessments and item difficulty by modality (e.g., cadaver, bone, and anatomical imaging), particularly on two-stage collaborative laboratory examinations. This project describes statistical measures of reliability and analyses of performance on questions of varying content sources on gross anatomy laboratory examinations.

Methods
First-year students in AY 2021-22 (N = 61) took six gross anatomy laboratory examinations throughout the academic year using ExamSoft. Each laboratory examination had an individual component followed by a team-based component. A Kuder-Richardson 20 (KR-20) test of internal consistency was performed on each assessment and calculations of mean and standard deviation were performed on all questions for each examination.
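
For reference, a KR-20 coefficient can be computed from a binary item-response matrix as in the sketch below; the data file and layout are assumptions, and ExamSoft's internal calculation may differ in details such as the variance estimator.

```python
# Illustrative sketch of a Kuder-Richardson 20 (KR-20) calculation from a
# binary item-response matrix (rows = students, columns = items, 1 = correct).
import numpy as np
import pandas as pd

responses = pd.read_csv("lab_exam_responses.csv").to_numpy()  # (students, items)

k = responses.shape[1]                   # number of items
p = responses.mean(axis=0)               # proportion correct per item
q = 1 - p
total_variance = responses.sum(axis=1).var(ddof=1)  # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_variance)
print(f"KR-20 = {kr20:.2f}")
```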

Results
The KR-20 for individual examinations ranged from 0.79 to 0.86, while the KR-20 for team examinations ranged from -0.07 to 0.71. The average increase in mean score between individual and team examinations was 16.7 points on first-order cadaver-based identification questions and 23.6 points on second-order cadaver-based questions. On osteological questions, the average increase was 15.7 points for first-order identification questions and 22.9 points for second-order questions. On imaging-based first-order identification questions, the average increase was 15.6 points.

Conclusions
This study indicates that the internal reliability of our individual laboratory assessments was acceptable, but less so for team assessments. Additionally, team laboratory assessments improved mean scores on second-order questions more than on first-order identification questions. These findings may help anatomy educators construct assessments with higher reproducibility and guide choices about question selection.

Abner J. Colón Ortiz
Ponce Health Sciences University

Purpose
The purpose of this research was to use machine learning and applied statistics to classify and predict medical students' results on the USMLE STEP 1 from their scores in the first- and second-year Basic Sciences courses.

Methods
A quantitative methodology based on machine learning was implemented using the following learners: neural networks, k-nearest neighbors (kNN), Support Vector Machine (SVM), Linear Regression, and Random Forest. This research worked with 245 students who took the USMLE STEP 1 between April 2021 and October 2022. These results were matched with final grades in the first- and second-year basic science courses. Machine learning procedures and applied statistics were then used to classify and predict the results using the open-source Orange Data Mining platform.
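
As an illustrative sketch only (the study used the Orange Data Mining interface rather than this code), the Ridge-regression prediction step could be reproduced in Python with scikit-learn along these lines; the file and column names are assumptions.

```python
# Illustrative sketch of Ridge-regression prediction of Step 1 scores from
# basic science course grades. Column names and the data file are assumed.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

df = pd.read_csv("grades_step1.csv")

X = df[["biochem2", "path1", "path2"]]   # final grades in basic science courses
y = df["step1_score"]                    # USMLE Step 1 outcome

# Ridge (L2-regularized) multiple linear regression, evaluated with
# 5-fold cross-validated R^2.
model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Mean cross-validated R^2:", scores.mean())
```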

Results
Prediction through multiple linear regression with the Ridge learner showed that scores in Medical Biochemistry II (r = 0.98), Pathology II (r = 0.88), and Pathology I (r = 0.84) predicted high scores on the USMLE STEP 1. Regarding classification, students who obtained higher scores in Pathology I, Pathology II, and Medical Biochemistry II were those with the highest USMLE STEP 1 scores (M = 255). The prediction model based on the Pathology II course was the most accurate at identifying students who did not pass the USMLE STEP 1, with 85% accuracy.

Conclusion
The classification and prediction results showed that the courses with the greatest influence on the USMLE STEP 1 score are Pathology I, Pathology II, and Medical Biochemistry II.

Cathryn Caudill
Kentucky College of Osteopathic Medicine / University of Pikeville

Purpose
Based on our experience with faculty-authored formative assessments that improved student performance on pathology course examination items, we began recommending and/or assigning board-style practice questions from the commercial test bank, TrueLearn©. We explored student preferences for and perceptions about these two types of assessments for developing content understanding and mastery. We also examined the influences of these assessments on student communication with faculty and examination preparation.

Methods
We administered a voluntary and anonymous SurveyMonkey© survey to students enrolled in our pathology course. Survey questions asked whether students preferred faculty-authored and/or TrueLearn© assessments for various types of learning approaches and skills related to course examination and board preparation, and how they engaged with the assessments and course faculty. A subset of survey questions explored student perceptions about TrueLearn© assessments and their influence on motivation and study habits. We also examined TrueLearn© assessment data and reviewed faculty records of student office visits and email correspondence.

Results
The student response rate to our survey was 77%. Students believed both faculty-authored and TrueLearn© assessments were valuable for supporting various skills in concept development and mastery. Students were more likely to communicate with faculty about TrueLearn© assessments and reported a slight preference for their use in board preparation, although they were less likely to utilize these questions when recommended as an ungraded activity rather than assigned as a graded one. Students reported increased workload and stress with graded TrueLearn© assignments, but also reported increased confidence, earlier engagement, and review of material for board preparation.

Conclusion
Students appreciate a mix of formative assessment strategies. Adding TrueLearn© to the pathology curriculum is a novel approach that engages students with board-style questions early in their learning process, and results in increased student engagement with faculty for question-answering strategies.

Rachel Linger
Rocky Vista University

Purpose
Systemic racism and bias in medical education reinforce misperceptions and stereotypes about marginalized populations. Students translate this way of thinking into behavior patterns, which adversely impacts patient care and contributes to health disparities. To mitigate these effects, medical educators must first identify where and how diversity, equity, and inclusion (DEI) measures are present or absent in medical curricula. A DEI-conscious course evaluation was developed as a strategy to diagnose strengths and weaknesses related to expressions of DEI in the course. An additional goal was to give students the opportunity to provide open and honest feedback in a safe, confidential manner.

Methods
In 2021, two closed questions (Likert scale) and two open questions (long answer) were included in an end-of-course evaluation. Course faculty and students were notified and provided with DEI resources prior to the publication of the survey. 148 second-year medical students (51% response rate) voluntarily completed the survey. Qualitative content analysis was performed to identify students' experiences of diversity, equity, and inclusivity within the course.

Results
Students appreciated the opportunity to provide feedback about DEI. Most comments were positive and/or constructive. Several instructors were praised for their kindness and empathy. Specific examples were given regarding the necessary context for racial, ethnic, or other socioeconomic disparities. Survey data were used to conduct a targeted review of instructional materials, resulting in the mitigation of racializing or stigmatizing content in one lecture slide and seven exam items prior to the 2022 iteration of the course.

Conclusions
Feedback gathering through course evaluations is a principal strategy educators employ to assess strengths and weaknesses in various areas. DEI-conscious course evaluations can serve as a tool to identify and mitigate implicit bias and stereotypes in educational content.