Active reward learning and iterative trajectory improvement from comparative language feedback

Eisuke Hirota, Zhaojing Yang, Ayano Hiranaka, Miru Jun, Jeremy Tien, Stuart J. Russell, Anca Dragan, Erdem Bıyık

Thomas Lord Department of Computer Science, University of Southern California, Los Angeles; School of Computing, State University of New York at Binghamton; Department of Electrical Engineering and Computer Sciences, UC Berkeley; Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles

The International Journal of Robotics Research

Published online on November 13, 2025

Abstract

The International Journal of Robotics Research, Ahead of Print.
In recent years, human-in-the-loop learning has gained traction in fields such as robotics and natural language processing. While prior work mostly relies on human feedback in the form of preference comparisons, this feedback type has multiple limitations. It ...