Automating creativity assessment in engineering design: A psychometric validation of AI‐generated items of the design problem task

Simone A. Luchini, Roger E. Beaty, Ayesha S. Boyce, Sarah E. Zappe, Boris Forthmann

Published online on May 03, 2026

Abstract

Journal of Engineering Education, Volume 115, Issue 3, July 2026.

Background
Creativity is essential for engineering design, yet its assessment remains challenging due to the resource‐intensive nature of traditional evaluation methods.

Purpose/Hypothesis(es)
This study investigates the potential of automatic item generation (AIG) using large language models (LLMs) to create psychometrically sound items for the design problem task (DPT), which measures creative thinking in engineering.

Design/Method
We developed and validated engineering design problems across three domains: ability difference and limitations (e.g., assisting people with learning impairments), transportation and mobility (e.g., reducing traffic congestion in mega cities), and social environments and systems (e.g., improving access to clean water in remote areas). The study comprised three phases with samples matched on race and ethnicity: (1) content validation with a diverse sample of 40 engineers evaluating item clarity and validity; (2) item administration to 462 engineering students; and (3) response evaluation by 65 expert raters assessing originality and effectiveness.

Results
Results demonstrated that LLM‐generated items achieved comparable or higher content validity rates than expert‐written items (43% vs. 20% success). Bayesian confirmatory factor analysis supported a unidimensional model for fluency, originality, and effectiveness scores, with excellent reliability estimates (.92–.95). While fluency showed minimal correlation with originality (r = −.11) and effectiveness (r = −.04), originality and effectiveness were strongly positively correlated (r = .73).

Conclusions
The present research advances our understanding of automated assessment generation in engineering education, provides empirical evidence for the psychometric properties of AI‐generated engineering creativity tasks, and offers a scalable approach for measuring creative thinking in engineering classrooms.