Foundation models from machine learning have enabled rapid advances in perception, planning, and natural language understanding for robots. However, current systems lack rigorous assurances when they are required to generalize to novel scenarios. For example, perception systems can fail to identify or localize unfamiliar objects, and large language model (LLM)-based planners can hallucinate outputs that lead to unsafe outcomes when executed by robots. How can we rigorously quantify the uncertainty of machine learning components so that robots know when they don’t know and can act accordingly?
In this talk, I will present our group’s work on developing principled theoretical and algorithmic techniques for providing formal assurances for learning-enabled robots that act based on rich sensory inputs (e.g., vision) and natural language instructions. The key technical insight is to leverage and extend powerful methods from conformal prediction and generalization theory for rigorous uncertainty quantification in a way that complements and scales with the growing capabilities of foundation models. I will present experimental validation of our methods, which provide strong statistical guarantees for LLM planners that ask for help when they are uncertain, as well as for vision-based navigation and manipulation.
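To make the conformal prediction idea concrete, the following is a minimal illustrative sketch, not the implementation discussed in the talk: it shows how split conformal prediction can turn an LLM planner’s per-option confidences into a calibrated prediction set, with the robot asking for help whenever the set contains more than one action. The function names, the synthetic calibration scores, and the candidate actions are all assumptions introduced here for illustration.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Split conformal calibration: return the threshold q-hat as the
    ceil((n+1)(1-alpha))/n empirical quantile of the nonconformity scores,
    which yields at least 1 - alpha marginal coverage on new inputs."""
    n = len(cal_scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q_level, method="higher")

def prediction_set(option_confidences, qhat):
    """Keep every candidate action whose nonconformity score
    (here, 1 - confidence) is at or below the calibrated threshold."""
    return [opt for opt, conf in option_confidences.items() if 1.0 - conf <= qhat]

# Calibration scores: 1 - confidence assigned to the correct action on held-out
# scenarios (synthetic stand-in data for this sketch).
rng = np.random.default_rng(0)
cal_scores = 1.0 - rng.beta(8, 2, size=500)
qhat = conformal_quantile(cal_scores, alpha=0.1)  # target 90% coverage

# Deployment: hypothetical confidences the planner assigns to candidate next steps.
options = {"pick up the sponge": 0.55, "pick up the cup": 0.40, "do nothing": 0.05}
pred_set = prediction_set(options, qhat)

if len(pred_set) == 1:
    print("Execute:", pred_set[0])          # planner is calibrated and confident
else:
    print("Uncertain between", pred_set, "-> ask a human for help")
```

The design choice illustrated here is that the coverage guarantee comes from the calibration step alone, so the same recipe applies to any planner that can score its own candidate actions, regardless of the underlying model.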