Our new AI agent learns to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.
Robots are quickly becoming a part of our everyday lives, but they are often programmed only to perform specific tasks well. While harnessing recent advances in artificial intelligence could lead to robots that help in many more ways, progress in building general-purpose robots is slower, in part because of the time it takes to collect real-world training data.
Our new paper introduces RoboCat, a self-improving artificial intelligence agent for robotics that learns to perform a variety of tasks across different arms and then generates new training data to improve its technique.
Previous research has explored how to develop robots that can learn to multitask at scale and how to combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks, and to do so across different, real robots.
RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws on a large and diverse dataset. This capability will help accelerate robotics research by reducing the need for human-supervised training, and it is an important step toward creating a general-purpose robot.
How RoboCat improves itself
RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images and actions in both simulated and real environments. We combined Gato’s architecture with a large training dataset of sequences of images and actions of various robot arms solving hundreds of different tasks.
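The post describes this training data only at a high level, so the following is a minimal, hypothetical Python sketch of what one vision-based trajectory might look like. The field names and structures here are assumptions for illustration, not the actual RoboCat or Gato data format.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Step:
    """One timestep of a demonstration: what the robot saw and what it did."""
    image: np.ndarray   # camera observation, e.g. an HxWx3 uint8 array
    action: np.ndarray  # commanded joint/gripper values at that timestep


@dataclass
class Trajectory:
    """One episode of a robot arm attempting a task (hypothetical schema)."""
    robot_arm: str      # which embodiment collected it, e.g. "two-finger-arm"
    task: str           # e.g. "stack-blocks" or "pick-up-gear"
    steps: List[Step]   # the ordered sequence of images and actions
```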
After this first round of training, we launched RoboCat into a “self-improvement” training cycle with a set of tasks it had not seen before. Learning each new task followed five steps (a code sketch of the loop follows the list):
- Collect 100-1000 demos of a new task or robot, using a robotic arm controlled by a human.
- Fine-tune RoboCat on the new task/arm, creating a specialized spin-off agent.
- The spin-off agent practices this new task/arm an average of 10,000 times, generating more training data.
- Incorporate the demonstration data and self-generated data into RoboCat’s existing training dataset.
- Train a new version of RoboCat on the new training dataset.
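To make the loop concrete, here is a minimal Python sketch of one round of this cycle. Every function passed in (collect_demos, fine_tune, rollout, train) is a hypothetical placeholder for machinery the post does not expose; the sketch only mirrors the five steps above.

```python
from typing import Any, Callable, List, Sequence

Agent = Any        # placeholder: any fine-tunable policy
Trajectory = Any   # placeholder: one episode of images and actions


def self_improvement_cycle(
    robocat: Agent,
    dataset: List[Trajectory],
    collect_demos: Callable[[int], List[Trajectory]],
    fine_tune: Callable[[Agent, Sequence[Trajectory]], Agent],
    rollout: Callable[[Agent], Trajectory],
    train: Callable[[Agent, Sequence[Trajectory]], Agent],
    num_demos: int = 1000,
    num_practice_episodes: int = 10_000,
) -> Agent:
    """One round of the five-step self-improvement cycle for a new task/arm."""
    # 1. Collect 100-1,000 human-controlled demonstrations of the new task.
    demos = collect_demos(num_demos)

    # 2. Fine-tune RoboCat on the new task/arm, creating a spin-off agent.
    spin_off = fine_tune(robocat, demos)

    # 3. The spin-off agent practices the task ~10,000 times on average,
    #    generating additional training data.
    self_generated = [rollout(spin_off) for _ in range(num_practice_episodes)]

    # 4. Fold the demonstrations and self-generated data back into the
    #    existing RoboCat training dataset.
    dataset.extend(demos)
    dataset.extend(self_generated)

    # 5. Train a new version of RoboCat on the enlarged dataset.
    return train(robocat, dataset)
```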
RoboCat’s training cycle is enhanced by its ability to autonomously generate additional training data.
Combining all this training means the latest RoboCat is based on a dataset of millions of trajectories, from both real and simulated robotic arms, including self-generated data. We used four different types of robots and multiple robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.
RoboCat learns from a wide range of training data types and tasks: videos of a real robotic arm picking up gears, a simulated arm stacking blocks, and RoboCat using a robotic arm to pick up a cucumber.
Learning to operate new robotic arms and solving more complex tasks
With RoboCat’s varied training, it learned to operate different robotic arms within a few hours. Although it had been trained on arms with two-fingered grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
Left: A new robotic arm RoboCat has learned to control
Right: Video of RoboCat using the arm to pick up gears
After observing 1000 human-controlled demonstrations, collected in just a few hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time. With the same number of demonstrations, it could adapt to solve tasks that combined precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, behaviors necessary for more complex control.
Example tasks RoboCat can adapt to solving after 500-1000 demos.
The self-improving generalist
RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. The original version of RoboCat succeeded just 36% of the time on previously unseen tasks after learning from 500 demonstrations per task. But the latest RoboCat, which had been trained on a wider variety of tasks, more than doubled that success rate on the same tasks.
The large difference in performance between the original RoboCat (one round of training) and the final version (extensive and diverse training, including self-improvement), after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.
These improvements were due to RoboCat’s increasing range of experience, similar to how humans develop a more diverse range of skills as they deepen their learning in a given domain. RoboCat’s ability to independently learn skills and rapidly self-improve, especially when applied to different robotic devices, will help pave the way for a new generation of more useful general-purpose robotic agents.