An Artificial Intelligence‐Enhanced Assessment Framework for Analyzing Middle School Science Students’ Written Responses
Journal of Educational Measurement, Volume 63, Issue 2, Summer 2026
Published online April 30, 2026

Abstract

This study develops and tests a Large Language Model‐based assessment framework that uses a multi‐agent system to analyze students' written responses, generate scoring rationales, identify uncertainty levels, and assign final scores to support learning. The framework was tested on chemistry responses from 834 middle school students scored with a dichotomous analytic rubric. Through prompt engineering and aggregation methods, the multi‐agent system—enhanced by rubric revisions and human scoring insights informed by Artificial Intelligence (AI)‐generated rationales—achieved 94% scoring accuracy, a 16‐percentage‐point improvement over the 78% accuracy of a single‐agent model using the original rubric. The system's uncertainty detection closely aligned with the areas where human raters also indicated uncertainty. Results indicate a strong relationship between AI confidence and scoring accuracy: when the AI was confident, its scores were largely correct, whereas low‐confidence cases were often inaccurate. These findings demonstrate the value of a multi‐agent system with human‐AI collaboration, in which the AI flags unclear cases and teachers review and refine the uncertain scores. This collaborative approach highlights the framework's potential to enhance classroom assessment by providing timely, reliable scores and feedback to inform teaching and learning. Future work may further improve accuracy through retrieval‐augmented generation, human‐in‐the‐loop review, and fine‐tuning with synthetic data.
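The aggregation step described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical example, not the authors' implementation: it assumes a simple majority vote over dichotomous (0/1) scores from several agents, uses the agreement ratio as a confidence proxy, and flags low-confidence cases for teacher review. The function name, threshold value, and return shape are all illustrative assumptions.

```python
from collections import Counter

def aggregate_agent_scores(scores, confidence_threshold=0.8):
    """Majority-vote aggregation over dichotomous (0/1) agent scores.

    Hypothetical sketch: returns (final_score, confidence, needs_review),
    where confidence is the fraction of agents agreeing with the majority
    and needs_review flags cases below the threshold for human raters.
    """
    if not scores:
        raise ValueError("at least one agent score is required")
    counts = Counter(scores)
    final_score, votes = counts.most_common(1)[0]
    confidence = votes / len(scores)
    return final_score, confidence, confidence < confidence_threshold

# Five agents agree 4-to-1: confident, no review needed.
print(aggregate_agent_scores([1, 1, 1, 0, 1]))  # (1, 0.8, False)
# A 3-to-2 split falls below the threshold and is routed to a teacher.
print(aggregate_agent_scores([1, 0, 1, 0, 0]))  # (0, 0.6, True)
```

Under this sketch, the reported confidence-accuracy relationship corresponds to trusting high-agreement scores while routing split decisions to human review.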