Over millennia, humanity has discovered, evolved, and accumulated a wealth of cultural knowledge, from navigational routes to mathematics and from social norms to works of art. Cultural transmission, the efficient passing of information from one individual to another, is the process of inheritance that underlies this exponential growth in human capabilities.
Our agent, in blue, imitates and remembers demonstrations from both bots (left) and humans (right), shown in red.
For more videos of our agents in action, visit our website.
In this work, we use deep reinforcement learning to create artificial agents capable of cultural transmission at test time. Once trained, our agents can infer and recall navigational knowledge demonstrated by experts. This knowledge transfer happens in real time and generalizes to a vast space of previously unseen tasks. For example, our agents can quickly learn new behaviors by observing a single human demonstration, without ever being trained on human data.
Overview of our reinforcement learning environment. Tasks are navigational and serve as a proxy for a broad class of human skills that require specific sequences of strategic decisions, such as cooking, wayfinding, and problem solving.
We train and test our agents in 3D procedurally generated worlds containing colorful spherical goals embedded in noisy, obstacle-filled terrain. A player must enter the goals in the correct order, which changes randomly in each episode. Since the order is impossible to guess, a naive exploration strategy incurs a large penalty. As a source of culturally transmitted information, we provide a privileged "bot" that always enters the goals in the correct order.
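The task structure above can be sketched as a simple scoring rule: the goal order is resampled each episode, and the agent is rewarded only for entering the next goal in that hidden order. This is a minimal illustrative sketch, not the paper's actual environment code; the goal count and reward values are hypothetical.

```python
import random

def make_episode(num_goals=5, rng=random):
    """Sample a fresh, secret goal order for the episode.

    The order changes randomly every episode, so it cannot be guessed
    in advance -- an agent must either explore (and pay penalties) or
    imitate an expert who already knows the order.
    """
    order = list(range(num_goals))
    rng.shuffle(order)
    return order

def score_entry(correct_order, next_index, entered_goal):
    """Reward entering the next goal in the correct order; penalize
    any out-of-order entry. Returns (reward, updated next_index).
    The +1 / -1 values are illustrative, not the paper's."""
    if entered_goal == correct_order[next_index]:
        return 1.0, next_index + 1
    return -1.0, next_index
```

A naive explorer entering goals at random would, on average, accumulate far more penalties than rewards, which is what makes imitating the privileged bot valuable.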
Our MEDAL(-ADR) agent outperforms ablations on held-out tasks in unobstructed (top) and obstructed (bottom) worlds.
Through ablation, we identify a minimal sufficient "starter kit" of training components required for the emergence of cultural transmission, which we call MEDAL-ADR. These components are memory (M), expert dropout (ED), an attention loss towards the expert (AL), and automatic domain randomization (ADR). Our agent outperforms ablations, including the state-of-the-art ME-AL method, across a range of challenging tasks. Cultural transmission generalizes out of distribution surprisingly well, and the agent recalls demonstrations long after the expert has left. Examining the agent's brain, we find strikingly interpretable neurons responsible for encoding social information and goal states.
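Of these components, expert dropout is the easiest to sketch: during training, the expert is probabilistically hidden from the agent's observation, forcing the agent to act from its memory of the demonstration rather than by continuously shadowing the expert. The sketch below is a hypothetical illustration under assumed observation field names (`expert_visible`, `expert_position`), not the authors' implementation.

```python
import random

def apply_expert_dropout(observation, dropout_prob, rng=random):
    """Expert dropout (ED): with probability `dropout_prob`, remove the
    expert from the agent's observation for this step.

    When the expert is dropped, the agent must rely on its internal
    memory (M) of the demonstration to keep solving the task.
    Returns a new observation dict; the input is left unmodified.
    """
    obs = dict(observation)
    if rng.random() < dropout_prob:
        obs["expert_visible"] = False
        obs["expert_position"] = None
    return obs
```

Annealing `dropout_prob` upward over training would gradually wean the agent off the expert, one plausible way such a component could be scheduled.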
Our agent generalizes outside the training distribution (top) and has individual neurons encoding social information (bottom).
In summary, we provide a procedure for training an agent capable of flexible, high-recall, real-time cultural transmission without using human data in the training pipeline. This paves the way for cultural evolution as an algorithm for the development of more generally intelligent artificial agents.
These authors’ notes are based on joint work by the Cultural General Intelligence Team: Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Yanko Gitahy Oliveira, Julia Pawar, Miruna Pîslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, and Lei M. Zhang.
Read the full paper here.