Session Details: IAMSE 2026 Annual Conference

Number

107

Name

Integrating Psychometrics Into Rater Training: Evidence From A Simulated Patient

Date & Time

Monday, June 8, 2026, 6:00 PM - 7:30 PM

Location Name

Oglethorpe Ballroom

Speakers

Wanda Jirau-Rosaly - Medical College of Georgia

Authors

Ahmet Guven, Augusta University; Wanda Jirau-Rosaly, Augusta University; A.J. Kleinheksel, Augusta University

Presentation Topic(s)

Assessment

Description

PURPOSE
In clinical skills exams, faculty scoring variability can undermine
fairness and accuracy. Traditional rater-training methods improve rater comfort
but not rating consistency. To address this, we used the Many-Facet Rasch Model
(MFRM) to separate student ability, task difficulty, and rater severity, and
to generate individualized feedback. Unlike prior uses of MFRM, our approach
integrates these findings directly into faculty development to improve
scoring reliability in a clinical assessment.
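For readers unfamiliar with the model, MFRM is commonly written in the rating-scale form below. The abstract does not state which parameterization was used, so the form and the symbol names here are assumptions taken from the standard formulation, not from the authors' analysis.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that student $n$ receives category $k$ on item $i$ from rater $j$, $B_n$ is student ability, $D_i$ is item (task) difficulty, $C_j$ is rater severity, and $F_k$ is the threshold between categories $k-1$ and $k$. Because the facets enter additively on the logit scale, a severe rater lowers the expected score for every student by the same amount, which is what allows severity to be separated from ability.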
METHODS
Multiple faculty raters scored different students using a rubric based on
Entrustable Professional Activities (EPAs), with one common student scored by
all raters so that rater estimates could be linked to a common scale. MFRM
was applied to estimate rater severity, identify under-/over-scoring
patterns, evaluate item functioning, and assess the stability of scoring
across raters. Each faculty member received an individualized feedback
profile summarizing their severity estimates, fit statistics, and item- and
student-level performance.
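The abstract does not say which fit statistics were reported. In Rasch analyses these are typically infit and outfit mean squares, and the sketch below shows how they could be computed per rater once the model parameters have been estimated. Everything here is illustrative: the function names, the rating-scale parameterization, and the input layout are assumptions, and parameter estimation itself (normally done by dedicated Rasch software) is out of scope.

```python
import math

def category_probs(ability, difficulty, severity, thresholds):
    """Category probabilities (0..K) under an assumed rating-scale MFRM."""
    # Log-numerator of category k is the running sum of (B - D - C - F_h), h <= k.
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + (ability - difficulty - severity - f))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_and_variance(probs):
    """Model-expected score and its variance for one rating."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def rater_fit(observations, abilities, difficulties, severities, thresholds):
    """Return {rater: (infit_ms, outfit_ms)}.

    observations: iterable of (student, item, rater, score) tuples.
    The other arguments are hypothetical, already-estimated parameters.
    """
    acc = {}
    for student, item, rater, score in observations:
        probs = category_probs(abilities[student], difficulties[item],
                               severities[rater], thresholds)
        expected, var = expected_and_variance(probs)
        sq_resid = (score - expected) ** 2
        a = acc.setdefault(rater, [0.0, 0.0, 0.0, 0])
        a[0] += sq_resid        # sum of squared residuals
        a[1] += var             # sum of model variances
        a[2] += sq_resid / var  # sum of squared standardized residuals
        a[3] += 1               # number of ratings
    # Infit is information-weighted; outfit is the plain mean of z^2.
    return {r: (a[0] / a[1], a[2] / a[3]) for r, a in acc.items()}
```

Both statistics have an expected value near 1 when a rater's scores match the model. Values well above 1 point to erratic scoring, and values well below 1 point to overly predictable scoring.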
RESULTS
Training-phase analyses showed notable variability in rater consistency.
Several raters with misfit during training improved in the test phase,
suggesting that feedback was effective for some individuals. For example,
Rater 2 showed substantial misfit in training but demonstrated excellent fit
in the test phase, and Raters 14 and 18 similarly normalized after feedback.
However, not all raters benefited: Rater 22 showed persistent misfit, while
Raters 20 and 37 developed misfit in the test phase, indicating rater drift.
Overall, feedback improved consistency for some raters but was not uniformly
effective.
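The patterns described above (improvement, persistent misfit, and drift) can be expressed as a simple comparison of each rater's fit across the two phases. The sketch below is a hypothetical illustration only: the abstract does not report its misfit criterion, so the 0.5 to 1.5 mean-square range used here is an assumed convention, and the function names are invented for this example.

```python
def is_misfit(mean_square, low=0.5, high=1.5):
    """True when a mean-square fit value falls outside the assumed acceptable range."""
    return not (low <= mean_square <= high)

def classify_rater(train_ms, test_ms, low=0.5, high=1.5):
    """Label a rater by comparing training-phase and test-phase fit."""
    before = is_misfit(train_ms, low, high)
    after = is_misfit(test_ms, low, high)
    if before and not after:
        return "improved"           # misfit in training, acceptable in test
    if before and after:
        return "persistent misfit"  # misfit in both phases
    if after:
        return "drift"              # acceptable in training, misfit in test
    return "stable"                 # acceptable in both phases
```

A rule like this is only as good as its cutoffs, so the range should be chosen to match whatever criterion the fit analysis itself uses.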
CONCLUSIONS
Although feedback based on training data improved scoring
consistency for several raters, its impact was not uniform, underscoring the
importance of individualized, data-driven rater development. By integrating
MFRM findings directly into rater training, our approach provided faculty
with concrete, personalized, evidence-based insight into their scoring
behavior. Applied to simulated-patient encounters, this method offers a practical and
scalable strategy for strengthening rater calibration, improving fairness,
and enhancing the validity of clinical assessments.