FMimic: Foundation models are fine-grained action learners from human videos

Beijing, P. R., Kong, P. R., Nanjing, P. R.

The International Journal of Robotics Research

Published online on October 17, 2025

Abstract

The International Journal of Robotics Research, Ahead of Print.
Visual imitation learning (VIL) provides an efficient and intuitive strategy for robotic systems to acquire novel skills. Recent advancements in foundation models, particularly vision language models (VLMs), have demonstrated remarkable capabilities in ...