FMimic: Foundation models are fine-grained action learners from human videos

The International Journal of Robotics Research

Abstract

The International Journal of Robotics Research, Ahead of Print.
Visual imitation learning (VIL) provides an efficient and intuitive strategy for robotic systems to acquire novel skills. Recent advancements in foundation models, particularly vision language models (VLMs), have demonstrated remarkable capabilities in ...