Session Details: IAMSE 2026 Annual Conference

Number

502

Name

How Much Clerkship Performance Scores Reflect True Student Ability? A Multilevel Variance Decomposition of a 22-Item Assessment Form

Date & Time

Sunday, June 7, 2026, 5:30 PM - 7:00 PM

Location Name

Oglethorpe Ballroom

Speakers

Ashley Saucier - Medical College of Georgia

Authors

Ahmet Guven, Augusta University; Ashley Saucier, Augusta University; Andria Thomas, Augusta University

Presentation Topic(s)

Other

Description

PURPOSE
The Medical College of Georgia (MCG) has a centrally monitored assessment
system that utilizes psychometric analysis as a source of quality
improvement. Clinical clerkships rely on multisource performance assessments
to generate final rotation grades, yet these scores may be influenced by
factors beyond student competence (e.g., rater severity/leniency and
contextual effects). This study examined sources of variability in an
institutional 22-item clinical performance assessment.
METHODS
Data included 1,128 final performance ratings from 218 students, evaluated
by 58 raters across three clerkships and two regional campuses. Raters with
fewer than five observations were excluded to ensure stable variance
estimates. A linear mixed-effects model partitioned score variability into
four components: student (true performance), rater (severity/leniency),
clerkship context, and residual unexplained error. Rater severity estimates
and confidence intervals were visualized using caterpillar plots to identify
patterns of stringency and leniency.
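One way to write a model of this kind is sketched below. The abstract does not give the exact specification, so the crossed random intercepts, the indices (s for student, r for rater, k for clerkship), and all symbols are assumptions made for illustration.

```latex
y_{srk} = \mu + u_s + v_r + w_k + \varepsilon_{srk},
\qquad
u_s \sim \mathcal{N}(0, \sigma^2_{\text{student}}),\;
v_r \sim \mathcal{N}(0, \sigma^2_{\text{rater}}),\;
w_k \sim \mathcal{N}(0, \sigma^2_{\text{clerkship}}),\;
\varepsilon_{srk} \sim \mathcal{N}(0, \sigma^2_{\text{residual}})

\text{share}_c =
\frac{\sigma^2_c}
     {\sigma^2_{\text{student}} + \sigma^2_{\text{rater}}
      + \sigma^2_{\text{clerkship}} + \sigma^2_{\text{residual}}},
\qquad
c \in \{\text{student}, \text{rater}, \text{clerkship}, \text{residual}\}
```

Under this sketch, each reported percentage would correspond to one share, and the rater-level estimates shown in a caterpillar plot would be the predicted values of v_r with their intervals.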
RESULTS
Rater severity accounted for the largest proportion of total variance
(49.9%). Residual unexplained error contributed 37.9%, while clerkship context
explained 8.5% of score variation. Only 3.8% of the total variance was
attributable to student-level differences in performance. Severity estimates
varied widely across raters, with several showing extreme stringency or
leniency. Campus differences were not statistically significant.
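The reported shares can be checked with a few lines of arithmetic. The sketch below uses only the four percentages stated above; the function and variable names are hypothetical and are not taken from the study's analysis code.

```python
# Variance shares (percent of total variance) as reported in RESULTS.
reported_shares = {
    "rater": 49.9,
    "residual": 37.9,
    "clerkship": 8.5,
    "student": 3.8,
}


def total_share(shares):
    """Sum of all shares; expected to be close to 100 up to rounding."""
    return sum(shares.values())


def non_student_share(shares):
    """Percent of total variance not attributable to student differences."""
    return sum(value for name, value in shares.items() if name != "student")


def share_ratio(shares, numerator, denominator):
    """How many times larger one component's share is than another's."""
    return shares[numerator] / shares[denominator]


if __name__ == "__main__":
    print(f"sum of shares: {total_share(reported_shares):.1f}%")
    print(f"non-student share: {non_student_share(reported_shares):.1f}%")
    ratio = share_ratio(reported_shares, "rater", "student")
    print(f"rater-to-student ratio: {ratio:.1f}")
```

The four shares sum to slightly more than 100% because each was rounded to one decimal place, and the rater share is roughly thirteen times the student share.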
CONCLUSIONS
Most of the variability in final clerkship performance scores stemmed from
rater behavior and measurement noise. These findings highlight the need for
continued faculty development and psychometric approaches (e.g., the
Many-Facet Rasch Model) to support fair and defensible high-stakes decisions. By systematically
examining these patterns, the institution demonstrates a commitment to
improving the validity and fairness of clinical performance assessment.

Presentation Tag(s)

Best Faculty Poster Nominee