On the Consistency of Automatic Scoring with Large Language Models
Educational and Psychological Measurement
Published online on February 16, 2026
Abstract
Educational and Psychological Measurement, Ahead of Print.
Large language models (LLMs) have shown great potential in automatic scoring. However, due to model characteristics and variation in training materials and pipelines, scoring inconsistency can exist within an LLM and across LLMs when rating the same ...