Name
The Use of Artificial Intelligence for Scoring a 12-Station OSCE Using Standardized Patients: Validity, Reliability, and Dependability
Description

Presented By: Claudio Violato, University of Minnesota
Co-Authors: Esther Dale, University of Minnesota
Jackie Gauer, University of Minnesota

Purpose 
While objective structured clinical examination (OSCE) research has explored reliability, validity and dependability have not been thoroughly studied. The main purpose of the present study, therefore, was to further investigate the validity, reliability, and dependability of a multi-station OSCE using standardized patients, and the use of AI for assessing clinical competence in this performance-based assessment.

Methods 
A total of 240 third-year medical students (138 women; 58.0%) were assessed in a 12-station OSCE using a carefully constructed table of specifications, based on a thorough literature review and a modified Delphi procedure with 32 experts (PhD and MD faculty). The following categories were assessed: overall history taking, physical exam skills, overall communication skills, and clinical reasoning skills. Students wrote post-encounter SOAP (subjective, objective, assessment, and plan) notes for each clinical encounter. We employed AI technology to score the SOAP notes based on a rubric, which was refined using natural language processing (NLP) algorithms. Passing this OSCE is a requirement for graduation.
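The abstract does not specify the AI scoring pipeline; one minimal sketch of rubric-keyed NLP scoring is to compare each SOAP note against rubric item key phrases with bag-of-words cosine similarity and award a point per matched item. All names and the similarity threshold below are illustrative assumptions, not the authors' implementation:

```python
import math
import re
from collections import Counter

def _vector(text):
    # Lowercased bag-of-words term counts (illustrative tokenizer)
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    # Cosine similarity between two texts' term-count vectors
    va, vb = _vector(a), _vector(b)
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def score_soap_note(note, rubric_items, threshold=0.2):
    # Award one point per rubric item whose key content the note matches;
    # the threshold is a hypothetical tuning parameter
    return sum(1 for item in rubric_items
               if cosine_similarity(note, item) >= threshold)

note = "Patient reports chest pain for two days"
rubric = ["chest pain onset and duration", "medication history"]
print(score_soap_note(note, rubric))
```

A production system would replace the bag-of-words matcher with a trained NLP model, but the rubric-per-item scoring structure would be similar.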

Results 
The internal consistency reliability for each station (Cronbach's α) ranged from 0.66 to 0.83. The generalizability analysis indicated that 12 station encounters produce a G-coefficient (Eρ²) > 0.70, a value acceptable for high-stakes assessments with direct observation. The content validity index (CVI) was 0.86 and the content validity ratio (CVR) was 0.74. Factor analysis resulted in four factors: (1) History Taking & Counseling, (2) Physical Exam, (3) Communication/Professionalism, and (4) Clinical Reasoning. The AI-scored SOAP notes yielded reliable and content-valid results.
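The reliability and content validity indices reported above have standard closed forms: Cronbach's α = (k/(k−1))·(1 − Σσᵢ²/σₜ²) over k items, and Lawshe's CVR = (nₑ − N/2)/(N/2) for nₑ of N experts rating an item essential. A minimal sketch of both computations (the toy data are illustrative, not the study's data):

```python
import numpy as np

def cronbach_alpha(scores):
    # scores: examinees x items matrix of item scores
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # per-item variance
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def content_validity_ratio(n_essential, n_experts):
    # Lawshe's CVR: ranges from -1 (none essential) to +1 (all essential)
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Perfectly consistent toy item scores give alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
# 28 of 32 experts rating an item essential gives CVR = 0.75
print(content_validity_ratio(28, 32))
```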

Conclusions 
Both the Cronbach's α and Eρ² results provide substantial evidence of reliability and score dependability. The CVI, CVR, and AI scoring provide substantial evidence of content validity. The results of the factor analyses provide evidence of construct validity. OSCEs built with best practices can thus have substantial evidence of content and construct validity, reliability, and dependability.

Date & Time
Tuesday, June 18, 2024, 10:30 AM - 10:45 AM
Location Name
Marquette IX