Presented By: Peter Boedeker, Baylor College of Medicine
Purpose
Differential Item Functioning (DIF) occurs when two learners of the same ability but from different groups have different probabilities of answering an item correctly. If left unchecked, such biased items can perpetuate inequity. DIF detection methods include logistic regression, Mantel-Haenszel (MH), and IRT-based Wald testing. The purpose of this statistical simulation is to evaluate these methods under conditions typical of brief assessments: few items and small samples.
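To make one of the methods named above concrete, the following is a minimal sketch of the MH chi-square test for a single item. It is not the code used in the study. The function name, the use of the total score as the matching variable, and the 0/1 coding of reference and focal groups are assumptions made for illustration.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses, group, item):
    """MH chi-square, p-value, and common odds ratio for one item.

    responses: list of rows of 0/1 item scores (one row per test taker)
    group:     list of 0 (reference) or 1 (focal), one per test taker
    item:      column index of the studied item
    Test takers are matched on total score (an assumption of this sketch).
    """
    # One 2x2 table [A, B, C, D] per score stratum:
    # A = reference correct, B = reference incorrect,
    # C = focal correct,     D = focal incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for row, g in zip(responses, group):
        idx = (0 if row[item] == 1 else 1) + (0 if g == 0 else 2)
        tables[sum(row)][idx] += 1

    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with one person carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n

    if sum_var == 0:
        return float("nan"), float("nan"), float("nan")

    # Continuity-corrected statistic; the correction is capped so that it
    # cannot push the deviation below zero.
    chi2 = max(abs(sum_a - sum_expected) - 0.5, 0.0) ** 2 / sum_var
    # Upper-tail p-value of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = num / den if den > 0 else float("inf")
    return chi2, p_value, odds_ratio
```

An item would be flagged for DIF when the returned p-value falls below the chosen significance level. An odds ratio above 1 indicates the item favors the reference group.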
Methods
Data were simulated from a Rasch model. The conditions varied were the number of test takers (100/200), number of items (10/20), proportion of DIF items (0.05/0.1/0.2), DIF severity (item difficulty difference of 0.2/0.5), proportion of test takers for whom DIF existed (0.2/0.4), and whether correction for multiple comparisons (CMC) was performed (yes/no). The outcomes were (1) the proportion of total items incorrectly flagged for DIF and (2) the proportion of DIF items correctly identified. Factorial ANOVAs and post hoc tests were used to evaluate performance.
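The design described above can be sketched as follows. This is an illustrative reconstruction, not the study's code. The function names, the standard normal distributions for abilities and difficulties, the choice to place DIF on the first items, and the rule of at least one DIF item are all assumptions.

```python
import math
import random


def simulate_rasch_dif(n_persons=200, n_items=10, prop_dif_items=0.2,
                       dif_size=0.5, prop_focal=0.4, seed=1):
    """Simulate 0/1 responses under a Rasch model with uniform DIF.

    DIF items are harder by `dif_size` for the focal group only.
    Returns (responses, group, dif_items).
    """
    rng = random.Random(seed)
    # Assumption: at least one DIF item, placed at the start of the test.
    n_dif = max(1, round(prop_dif_items * n_items))
    dif_items = set(range(n_dif))
    n_focal = round(prop_focal * n_persons)
    group = [1] * n_focal + [0] * (n_persons - n_focal)
    # Assumption: difficulties and abilities are standard normal.
    difficulty = [rng.gauss(0, 1) for _ in range(n_items)]

    responses = []
    for g in group:
        theta = rng.gauss(0, 1)
        row = []
        for i, b in enumerate(difficulty):
            shift = dif_size if (g == 1 and i in dif_items) else 0.0
            # Rasch model: P(correct) = logistic(theta - difficulty)
            p = 1.0 / (1.0 + math.exp(-(theta - (b + shift))))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return responses, group, dif_items


def flag_outcomes(flagged, dif_items, n_items):
    """Return the two outcomes for one replication.

    (1) proportion of all items incorrectly flagged for DIF
    (2) proportion of true DIF items correctly flagged
    """
    false_flag_rate = len(flagged - dif_items) / n_items
    hit_rate = len(flagged & dif_items) / len(dif_items)
    return false_flag_rate, hit_rate
```

In a full simulation, each replication would pass the simulated responses to a DIF detection method, collect the set of flagged item indices, and summarize them with `flag_outcomes`.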
Results
For the proportion of items incorrectly flagged for DIF, the method of DIF detection explained the most variability (eta-squared=0.29, p<0.001), followed by the main effect of CMC (eta-squared=0.17, p<0.001) and the interaction of method and CMC use (eta-squared=0.10, p<0.001). MH with CMC produced the lowest proportion of incorrectly flagged items and logistic regression without CMC the highest. For the proportion of DIF items correctly identified, the main effect of CMC use explained the most variability (eta-squared=0.05, p<0.001), followed by the method used to identify DIF (eta-squared=0.03, p<0.001) and DIF severity (eta-squared=0.02, p<0.001). MH with CMC yielded the smallest average proportion of correctly identified items and logistic regression without CMC the highest.
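The abstract does not state which correction for multiple comparisons was applied. As an illustration of why CMC lowers both incorrect and correct flags, the sketch below uses the Holm step-down procedure, which is an assumption chosen for this example. The function names and the p-values are hypothetical.

```python
def uncorrected_flags(p_values, alpha=0.05):
    """Flag every item whose p-value is at or below alpha."""
    return {i for i, p in enumerate(p_values) if p <= alpha}


def holm_flags(p_values, alpha=0.05):
    """Flag items using the Holm step-down correction.

    P-values are tested from smallest to largest against alpha / (m - rank),
    and testing stops at the first p-value that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    flagged = set()
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            flagged.add(i)
        else:
            break
    return flagged


# Hypothetical per-item p-values from any DIF test on a 4-item example.
example_p = [0.001, 0.02, 0.04, 0.30]
without_cmc = uncorrected_flags(example_p)
with_cmc = holm_flags(example_p)
```

With these hypothetical values, three items are flagged without correction and one with it. The stricter threshold removes flags whether or not the item truly has DIF.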
Conclusions
Whether or not DIF was present, MH with CMC flagged the smallest proportion of items, whereas logistic regression without CMC flagged the largest. If over-identification is preferable to under-identification to ensure fair assessments, logistic regression without CMC is recommended. Future work evaluating additional methods could be beneficial.