Graduate Student, Indiana University-Indianapolis, Indianapolis, Indiana, United States
Abstract Body: Background: The rapidly evolving technology of AI and its use in academia is an area of contentious debate. Students enrolled in an undergraduate gross anatomy course at Indiana University-Indianapolis complete written reflections to explore their experiences with donor pathology and humanism as a formal, graded component of the First Patient Project curriculum. However, it is unclear how responses to the reflection prompts differ between students and AI platforms. This study seeks to answer the following research question: How do the linguistic dimensions of responses to First Patient reflection prompts differ between students and generative AI platforms?
Materials & Methods: A subset of reflection prompts was selected, and responses were collected from three cohorts of students (n = 30). The prompts included an initial reflection from the first day of lab and four unit reflections in which students reflected on their understanding of self and their first patient. Three LLMs (ChatGPT 3.5, Bard, and Claude) were tasked with responding to the same five prompts. Student-generated and LLM reflections were analyzed using Linguistic Inquiry and Word Count (LIWC), and group differences were explored using one-way ANOVA.
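As an illustrative sketch only, not the authors' actual pipeline, the group comparison described above could be run in Python as follows; the file name liwc_scores.csv and the column names group and analytic are hypothetical placeholders for exported LIWC output:

```python
# Minimal sketch of a one-way ANOVA on one LIWC dimension across author
# groups; liwc_scores.csv, "group", and "analytic" are hypothetical names.
import pandas as pd
from scipy.stats import f_oneway

df = pd.read_csv("liwc_scores.csv")  # one row per reflection, LIWC dimensions as columns

# Collect the Analytic scores for each author group (students, ChatGPT, Bard, Claude)
groups = [g["analytic"].to_numpy() for _, g in df.groupby("group")]

f_stat, p_value = f_oneway(*groups)  # one-way ANOVA across the four groups
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```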
Results: Reflections from the student population and the three LLMs (n = 600) were analyzed using LIWC. For all linguistic measures, the differences between group means were statistically significant at p < .001, except for "insight" (p = .023). Differences were seen across a range of linguistic dimensions, with the largest observed in complex vocabulary and analytic thinking patterns (η² = 0.648 and 0.693, respectively). All AI models scored significantly higher than students on these dimensions, with ChatGPT and Bard being the most analytic. However, student reflections scored as the most authentic and least filtered and showed the greatest cognitive complexity (p < .001). Claude scored more comparably to students in these areas.
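For context, eta squared (η²) is the proportion of total variance in a measure attributable to group membership, so values of 0.648 and 0.693 indicate very large effects. A minimal sketch of the standard one-way computation, reusing the hypothetical groups list from the sketch above:

```python
# Standard eta squared for a one-way design: SS_between / SS_total.
# "groups" is the same hypothetical list of per-group score arrays as above.
import numpy as np

def eta_squared(groups):
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_vals - grand_mean) ** 2).sum()
    return ss_between / ss_total  # 0 = no group effect, 1 = all variance between groups
```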
Conclusion: The results illustrate notable differences in linguistic dimensions between student- and AI-generated reflections. Students displayed greater cognitive complexity, authenticity, and emotional processing. In contrast, the AI platforms Bard and ChatGPT used more complex vocabulary and analytic language but showed limitations in capturing the nuanced, personal reactions evident in the students' writing. Of the three models, Claude's reflections were the most comparable to the student-generated ones.
Significance/Implication: LLM performance indicates the technology is advancing rapidly, producing sophisticated writing that can convey insight and emotion. Educators should consider the opportunities and potential of LLMs, particularly in reflective exercises.