Humanoid robots offer two unparalleled advantages in general-purpose embodied intelligence. First, humanoids are built as generalist robots that can potentially do all the tasks humans can do in complex environments. Second, the embodiment alignment between humans and humanoids allows for the seamless integration of human cognitive skills with versatile humanoid capabilities. To build generalist humanoids, there are three critical aspects of intelligence: (1) Semantic intelligence (how the robot understands the world and reasons); (2) Physical/Motion intelligence (locomotion and manipulation skills); and (3) Mechanical/Hardware intelligence (how the robot actuates and senses). In this talk, I will present some recent works (H2O, OmniH2O, WoCoCo, ABS) that aim to unify semantic and physical intelligence for humanoid robots. In particular, H2O and OmniH2O provide a universal and dexterous interface that enables diverse human control (e.g., VR, RGB) and autonomy (e.g., using imitation learning or VLMs) methods for humanoids, WoCoCo provides an efficient framework for loco-manipulation skill learning without motion priors, and ABS provides safety guarantees for agile vision-based locomotion control. Finally, I will briefly discuss how to combine learning-based control approaches and traditional model-based control approaches to get the best of two worlds.
- Tags
-