Research
Using human and animal movements to teach robots to dribble a ball and simulating humanoid characters to carry boxes and play soccer
Humanoid character that learns to traverse an obstacle course through trial and error, which can lead to idiosyncratic solutions. Heess, et al. “Emergence of Locomotion Behaviours in Rich Environments” (2017).
Five years ago, we took on the challenge of teaching a fully articulated humanoid character to traverse obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but also highlighted two challenges in solving embodied intelligence:
- Reusing previously learned behaviors: A significant amount of data was needed before the agent made any progress. Without any initial knowledge of how much force to apply to each of its joints, the agent began by randomly twitching its body and quickly falling to the ground. This problem could be mitigated by reusing previously learned behaviors.
- Idiosyncratic behaviors: When the agent finally learned to navigate obstacle courses, it did so with unnatural (although entertaining) movement patterns that would be impractical for applications such as robotics.
Here, we describe a solution to both challenges, called neural probabilistic motor primitives (NPMP), which involves guided learning from human- and animal-derived movement patterns, and discuss how this approach is used in our humanoid soccer paper, published today in Science Robotics.
We also discuss how this same approach enables full-body humanoid manipulation by vision, such as a humanoid carrying an object, and real-world robotic control, such as a robot dribbling a ball.
Distilling data into controllable motor primitives using NPMP
The NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing movements of interest.
An agent learning to mimic a MoCap trajectory (shown in gray).
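One common way to score this kind of imitation is a tracking reward that decays with the agent's pose error relative to the reference. This is a hedged sketch, not the actual NPMP training objective; the function name, pose representation, and `scale` parameter are assumptions for illustration:

```python
import numpy as np

def tracking_reward(agent_pose, mocap_pose, scale=2.0):
    """Hypothetical per-step imitation reward: closer to 1 the more
    closely the agent's pose matches the reference MoCap pose."""
    error = np.linalg.norm(agent_pose - mocap_pose)
    return float(np.exp(-scale * error ** 2))

# A perfect match yields the maximum reward of 1.0; the reward
# falls off smoothly as the agent drifts from the reference.
pose = np.array([0.1, -0.3, 0.7])
assert tracking_reward(pose, pose) == 1.0
assert tracking_reward(pose, pose + 1.0) < tracking_reward(pose, pose + 0.1)
```

A smooth, dense reward like this gives the agent a learning signal at every timestep, in contrast to the sparse task rewards used once the skills are being reused.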
The model has two parts:
- An encoder that takes a future trajectory and compresses it into a motor intention.
- A low-level controller that produces the next action given the agent’s current state and this motor intention.
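The two components above can be sketched with hypothetical tiny networks. All dimensions, the `mlp` helper, and the single-layer architecture are illustrative assumptions, not the actual NPMP design:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """Stand-in single-layer network; a real implementation would be
    deeper and its weights would be learned, not random."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)

STATE_DIM, TRAJ_DIM, LATENT_DIM, ACTION_DIM = 32, 64, 8, 16  # illustrative sizes

# Encoder: compresses a snippet of future reference trajectory
# into a short-horizon motor intention z.
encoder = mlp(TRAJ_DIM, LATENT_DIM)
# Low-level controller: maps (current state, motor intention)
# to the next low-level action.
controller = mlp(STATE_DIM + LATENT_DIM, ACTION_DIM)

future_trajectory = rng.normal(size=TRAJ_DIM)
state = rng.normal(size=STATE_DIM)

z = encoder(future_trajectory)                   # motor intention
action = controller(np.concatenate([state, z]))  # low-level control signal
assert action.shape == (ACTION_DIM,)
```

The key design point is the bottleneck: every action must be produced through the low-dimensional intention `z`, which is what makes the controller reusable later.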
Our NPMP model first distills reference data into a low-level controller (left). This low-level controller can then be used as a plug-and-play motor control module on a new task (right).
After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to output motor intentions directly. This enables efficient exploration – since coherent behaviors are produced even when motor intentions are sampled at random – and constrains the final solution.
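This reuse can be sketched as follows. The shapes and both networks are hypothetical (in practice the low-level controller is the pretrained, frozen NPMP module, and only the small high-level policy is trained on the new task):

```python
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, LATENT_DIM, ACTION_DIM = 32, 8, 16  # illustrative sizes

# Stand-in for the pretrained NPMP controller; its weights stay
# fixed ("frozen") after distillation from MoCap data.
W_low = rng.normal(scale=0.1, size=(STATE_DIM + LATENT_DIM, ACTION_DIM))
def frozen_low_level(state, z):
    return np.tanh(np.concatenate([state, z]) @ W_low)

# Only this small policy is optimized on the new task; it outputs
# motor intentions rather than raw joint-level actions.
W_high = rng.normal(scale=0.1, size=(STATE_DIM, LATENT_DIM))
def high_level_policy(state):
    return np.tanh(state @ W_high)

state = rng.normal(size=STATE_DIM)

# Task-directed control: high-level intention decoded by the frozen controller.
action = frozen_low_level(state, high_level_policy(state))

# Exploration: even a randomly sampled intention decodes to a structured
# action, because the controller only produces movements in the style
# of the reference data it was distilled from.
random_action = frozen_low_level(state, rng.normal(size=LATENT_DIM))
assert action.shape == random_action.shape == (ACTION_DIM,)
```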
Emergent team coordination in humanoid soccer
Soccer has been a long-standing challenge for embodied intelligence research, as it requires both individual skills and coordinated team play. In our most recent work, we used an NPMP as a prior to guide the learning of movement skills.
The result was a team of players that progressed from learning ball-chasing skills to finally learning coordination. Previously, in a study with simple embodiments, we had shown that coordinated behavior can emerge in competing teams. The NPMP allowed us to observe a similar effect, but in a scenario that required significantly more advanced motor control.
The agents first imitate the movement of soccer players to learn an NPMP module (top). Using NPMP, agents then learn soccer-specific skills (below).
Our agents acquired skills such as agile locomotion, passing, and division of labor, as evidenced by a range of statistics, including metrics used in real-world sports analytics. The players demonstrate both fast, high-frequency motor control and long-term decision-making that involves anticipating the behavior of teammates, leading to coordinated team play.
An agent that learns to play soccer competitively using multi-agent RL.
Whole-body manipulation and cognitive tasks using vision
Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of box-interaction MoCap data, we are able to train an agent to carry a box from one location to another, using egocentric vision and only a sparse reward signal:
With a small amount of MoCap data (top), our NPMP approach can solve a box transport task (bottom).
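The sparse reward signal mentioned above can be illustrated with a minimal sketch; the function name, 2D pose representation, and `tolerance` value are assumptions for illustration, not the task's actual reward definition:

```python
import numpy as np

def sparse_box_reward(box_pos, target_pos, tolerance=0.1):
    """Hypothetical sparse reward: 1 only when the box is within
    `tolerance` of the target; 0 everywhere else. The agent gets no
    intermediate feedback, which is why a motor prior like NPMP is
    needed to make exploration tractable."""
    return 1.0 if np.linalg.norm(box_pos - target_pos) < tolerance else 0.0

assert sparse_box_reward(np.array([1.0, 2.0]), np.array([1.02, 2.0])) == 1.0
assert sparse_box_reward(np.array([0.0, 0.0]), np.array([1.0, 2.0])) == 0.0
```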
Similarly, we can teach the agent to catch and throw balls:
Simulating a humanoid catching and throwing a ball.
Using the NPMP, we can also address maze tasks that involve locomotion, perception, and memory:
Humanoid simulation collecting blue balls in a maze.
Safe and efficient real-world robot control
The NPMP can also help control real robots. Well-regularized behavior is critical for activities such as walking on rough terrain or handling fragile objects. Jerky movements can damage the robot itself or its surroundings, or at least drain its battery. As such, considerable effort is often invested in designing learning objectives that make a robot do what we want while behaving in a safe and efficient manner.
As an alternative, we investigated whether priors derived from biological motion can give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that are suitable for deployment on real-world robots.
Starting with MoCap data from humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that can then be deployed on real humanoid (OP3) and quadrupedal (ANYmal B) robots, respectively. This allowed the robots to be steered by a user via a joystick, or to dribble a ball to a target location, in a natural and robust manner.
Movement skills for the ANYmal robot are learned by imitating the MoCap dog.
Movement skills can then be reused for controlled walking and ball dribbling.
Benefits of using neural probabilistic motor primitives
In summary, we used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level movement skills in a reusable way, making it easier to learn useful behaviors that would be difficult to discover through unstructured trial and error. By using motion capture data as a source of prior information, it biases the learning of motor control toward naturalistic movements.
The NPMP allows embodied agents:
- To learn faster using RL.
- To learn more naturalistic behaviors.
- To learn safer, more efficient, and more stable behaviors suitable for real-world robotics.
- To combine whole-body motor control with broader cognitive skills, such as teamwork and coordination.
Learn more about our work: