Presented By: Jonathan Bowden, University of Cincinnati College of Medicine
Co-Authors: Megha Mohanakrishnan, University of Cincinnati College of Medicine
Andrew Thompson, University of Cincinnati College of Medicine
Purpose
Although qualitative analysis is a powerful tool in medical education research, it can be time-consuming when large datasets are used. While recent advancements in artificial intelligence (AI) have the potential to aid researchers in analyzing large datasets, it is unclear how results generated by this technology differ from traditional, manually (human) generated results. The purpose of this study is to provide insight into this question by comparing thematic analysis results between the authors and a commonly used AI platform, ChatGPT.
Methods
This study used two years of responses to an open-ended question asking first-year medical students to reflect on their feelings about participating in dissection of the human body. Thematic analysis began by submitting a random sample of 30 responses to ChatGPT to generate an initial list of themes. Authors JB and MM then conducted an inter-observer error study in which this initial list of themes was refined iteratively. Following satisfactory inter-observer error results, the authors each coded half of the entire dataset (N=343 responses). ChatGPT was then provided a list of the updated themes with descriptions and asked to code the entire dataset. Accuracy of the AI-generated results was explored by comparing them against the codes assigned manually by the authors.
Results
Accuracy was first assessed using the criterion of correctly assigning a code or correctly not assigning a code to each response. By this measure, there was on average 83% agreement between ChatGPT and the authors. When accuracy was considered only in terms of agreement when a code was assigned, concordance dropped to an average of 44.5%.
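The gap between the two figures can be illustrated with a minimal sketch for a single theme. This is not the authors' procedure: the function names and the toy data are hypothetical, and the second measure assumes that "agreement when a code was assigned" means responses where both raters assigned the code, divided by responses where at least one rater did. The abstract does not specify the denominator.

```python
from typing import List


def overall_agreement(human: List[int], ai: List[int]) -> float:
    """Share of responses where both raters made the same decision,
    counting joint non-assignment (0, 0) as agreement."""
    matches = sum(h == a for h, a in zip(human, ai))
    return matches / len(human)


def assigned_agreement(human: List[int], ai: List[int]) -> float:
    """Share of responses coded by both raters among those coded by
    at least one rater (assumed definition, see lead-in)."""
    both = sum(1 for h, a in zip(human, ai) if h and a)
    either = sum(1 for h, a in zip(human, ai) if h or a)
    return both / either if either else float("nan")


# Hypothetical data for one theme: 1 = code assigned, 0 = not assigned.
human = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
ai = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

print(overall_agreement(human, ai))   # 0.8
print(assigned_agreement(human, ai))  # 0.5
```

Because most responses do not receive any given code, joint non-assignment inflates the first measure; the second ignores those cases and is therefore lower when the raters disagree on which responses deserve the code.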
Conclusion
AI-driven technology provides an opportunity to reduce the workload involved in qualitative analysis but lacks the nuanced interpretation of data required for the coding phase of thematic analysis.