On the Consistency of Automatic Scoring with Large Language Models

Educational and Psychological Measurement

Abstract

Educational and Psychological Measurement, Ahead of Print.
Large language models (LLMs) have shown great potential in automatic scoring. However, due to model characteristics and variation in training materials and pipelines, scoring inconsistency can exist within an LLM and across LLMs when rating the same ...