LOS ALTOS, CA—Engineers at the Toyota Research Institute (TRI) here are using generative AI technology to quickly teach robots new, dexterous skills. It is a step toward building large behavior models for robots, analogous to the large language models that have recently revolutionized conversational AI.
“Our research in robotics is aimed at amplifying people rather than replacing them,” says Gill Pratt, Ph.D., CEO of TRI and chief scientist at Toyota Motor Corp. “This new teaching technique is both very efficient and produces very high-performing behaviors, enabling robots to much more effectively amplify people in many ways.”
Previous state-of-the-art techniques for teaching robots new behaviors were slow, inconsistent, inefficient, and often limited to narrowly defined tasks performed in highly constrained environments. Roboticists had to spend many hours writing sophisticated code or running numerous trial-and-error cycles to program behaviors.
TRI has already taught robots more than 60 difficult, dexterous skills using the new approach, including pouring liquids, using tools and manipulating deformable objects. These achievements were realized without writing a single line of new code. The only change was supplying the robot with new data.
According to Pratt, robots can be taught to function in new scenarios and perform a wide range of behaviors. These skills are not limited to “pick and place,” or simply picking up objects and putting them down in new locations. “[Our machines] can now interact with the world in varied and rich ways, which will one day allow robots to support people in everyday situations and unpredictable, ever-changing environments,” explains Pratt.
“The tasks that I’m watching these robots perform are simply amazing,” adds Russ Tedrake, Ph.D., vice president of robotics research at TRI. “Even one year ago, I would not have predicted that we were close to this level of diverse dexterity.”
“What is so exciting about this new approach is the rate and reliability with which we can add new skills,” notes Tedrake. “Because these skills work directly from camera images and tactile sensing, using only learned representations, they are able to perform well even on tasks that involve deformable objects, cloth and liquids—all of which have traditionally been extremely difficult for robots.”
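The data-driven teaching loop described above can be sketched, in highly simplified form, as supervised imitation learning: a policy is fit directly to demonstration data, so adding a skill means supplying new data rather than writing new code. The linear policy, dimensions, and synthetic demonstrations below are illustrative assumptions only, not TRI's implementation (which uses generative models over far richer observations):

```python
import numpy as np

# Minimal behavior-cloning sketch (illustrative, not TRI's system):
# learn a policy mapping sensor observations to actions purely from
# demonstration data, so a new skill = new data, not new code.

rng = np.random.default_rng(0)

# Synthetic "demonstrations": flattened camera/tactile observations
# paired with the actions a human demonstrator took. The 7-dim action
# (e.g., a 7-DoF arm) is an assumption for illustration.
n_demos, obs_dim, act_dim = 200, 32, 7
true_policy = rng.normal(size=(obs_dim, act_dim))
observations = rng.normal(size=(n_demos, obs_dim))
actions = observations @ true_policy + 0.01 * rng.normal(size=(n_demos, act_dim))

# Supervised imitation: fit the policy by least squares on (obs, action) pairs.
learned_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# The learned policy predicts actions for observations it has never seen.
new_obs = rng.normal(size=(1, obs_dim))
predicted_action = new_obs @ learned_policy
print(predicted_action.shape)  # (1, 7)
```

Teaching the same "robot" a different skill in this framing means refitting on a different demonstration dataset; the training code itself is untouched, which mirrors the article's point that only the data changes.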