To navigate the world, our brains must develop an intuitive understanding of the physical environment around us, which we then use to interpret incoming sensory information.
How does the brain develop this intuitive understanding? Many scientists believe it may use a process similar to what is known as “self-supervised learning.” This type of machine learning, originally developed as a way to create more efficient models for computer vision, allows computational models to learn about visual scenes based solely on the similarities and differences between them, without labels or other information.
A pair of studies by researchers at the K. Lisa Yang Center for Integrative Computational Neuroscience (ICoN) at MIT offers new evidence to support this hypothesis. The researchers found that when they trained models known as neural networks using a specific type of self-supervised learning, the resulting models produced patterns of activity similar to those seen in the brains of animals performing the same tasks as the models.
The findings show that these models are able to learn representations of the natural world that they can use to make accurate predictions about what will happen in that world, and that the mammalian brain may use the same strategy, the researchers say.
“The point of our work is that artificial intelligence designed to help build better robots also ends up being a framework for better understanding the brain in general,” says Aran Nayebi, a postdoc at the ICoN Center. “We can’t say yet whether it’s the whole brain, but across scales and different brain regions, our results seem to suggest an organizing principle.”
Nayebi is the lead author of one of the studies, along with Rishi Rajalingham, a former MIT postdoc now at Meta Reality Labs, and senior authors Mehrdad Jazayeri, associate professor of brain and cognitive sciences and a member of the McGovern Institute for Brain Research, and Robert Yang, assistant professor of brain and cognitive sciences and associate member of the McGovern Institute. Ila Fiete, director of the ICoN Center, professor of brain and cognitive sciences, and associate member of the McGovern Institute, is the senior author of the other study, which was co-led by Mikail Khona, an MIT graduate student, and Rylan Schaeffer, a former senior research fellow at MIT.
Both studies will be presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in December.
Modeling the natural world
Early computer vision models relied mainly on supervised learning. Using this approach, models are trained to classify images that are each labeled with a name — cat, car, etc. The resulting models work well, but this type of training requires a great deal of human-labeled data.
To create a more efficient alternative, in recent years researchers have turned to models built through a technique known as contrastive self-supervised learning. This type of learning allows an algorithm to learn to classify objects based on how similar they are to each other, with no external labels provided.
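To make the idea concrete, here is a minimal sketch of a contrastive objective in PyTorch — a generic InfoNCE-style loss, not the exact objective used in either study; the function name, temperature, and dimensions are illustrative. Embeddings of two views of the same image are pulled together, while embeddings of different images are pushed apart.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """Generic InfoNCE-style sketch: z1[i] and z2[i] are embeddings of
    two views of the same image; all mismatched pairs act as negatives.
    (An illustration, not the objective from the MIT studies.)"""
    z1 = F.normalize(z1, dim=1)          # unit-length embeddings
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature     # cosine similarities, scaled
    targets = torch.arange(z1.shape[0])  # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: a batch of 8 embedding pairs, 32 dimensions each
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
print(contrastive_loss(z1, z2))
```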
“This is a very powerful method because now you can take very large modern data sets, especially video, and really unlock their potential,” says Nayebi. “A lot of modern AI that you see now, especially in the last couple of years with ChatGPT and GPT-4, is the result of training a self-supervised objective function on a large-scale data set to get a very flexible representation.”
These types of models, also called neural networks, consist of thousands or millions of interconnected processing units. Each unit has connections of varying strengths to other units in the network. As the network analyzes vast amounts of data, the strengths of these connections change as the network learns to perform the desired task.
As the model performs a specific task, the activity patterns of different units within the network can be measured. The activity of each unit can be represented as a firing pattern, similar to the firing patterns of neurons in the brain. Previous work by Nayebi and others has shown that self-supervised models of vision produce activity similar to that seen in the visual processing system of the mammalian brain.
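Reading out those firing patterns is straightforward in practice. The sketch below, using a toy PyTorch network and a forward hook (the model and layer here are stand-ins, not the networks from the studies), records a hidden layer's activity for a batch of stimuli — the raw material for comparisons against recorded neural data.

```python
import torch
import torch.nn as nn

# A toy two-layer network standing in for a trained model
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))

activations = {}
def record(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # save the layer's response
    return hook

# Register a hook on the hidden layer (index 1 = the ReLU output)
model[1].register_forward_hook(record("hidden"))

stimuli = torch.randn(16, 100)   # a batch of 16 input "stimuli"
model(stimuli)

# One activation vector per stimulus, analogous to a population of
# firing rates that can be compared against neural recordings
print(activations["hidden"].shape)  # torch.Size([16, 64])
```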
In both new NeurIPS studies, the researchers set out to investigate whether self-supervised computational models of other cognitive functions might also show similarities to the mammalian brain. In the study led by Nayebi, the researchers trained self-supervised models to predict the future state of their environment on hundreds of thousands of naturalistic videos depicting everyday scenarios.
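In schematic form, such a model is trained to predict an embedding of what comes next from the frames it has seen so far. The sketch below assumes a simple encoder-plus-recurrent-predictor setup with a latent prediction loss; the architecture, dimensions, and objective in the actual study may differ.

```python
import torch
import torch.nn as nn

# Hypothetical components: an encoder that embeds each video frame and
# a recurrent predictor that rolls the latent state forward in time
encoder = nn.Linear(3 * 64 * 64, 256)        # flattened frame -> latent code
predictor = nn.GRU(256, 256, batch_first=True)

frames = torch.randn(8, 10, 3 * 64 * 64)     # batch of 10-frame clips
latents = encoder(frames)                    # (8, 10, 256)

# Predict each next latent from the sequence so far, then score the
# prediction against the latent of the frame that actually followed
predicted, _ = predictor(latents[:, :-1])    # predictions for steps 1..9
loss = nn.functional.mse_loss(predicted, latents[:, 1:].detach())
print(loss)
```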
“For the past decade or so, the dominant method for building neural network models in cognitive neuroscience has been to train these networks on single cognitive tasks. But models trained this way rarely generalize to other tasks,” says Yang. “Here we test whether we can build models for some aspect of cognition by first training on naturalistic data using self-supervised learning and then evaluating in laboratory settings.”
Once the model was trained, the researchers had it generalize to a task they call “Mental-Pong.” This task is similar to the video game Pong, in which a player moves a paddle to hit a ball traveling across the screen. In the Mental-Pong version, the ball disappears shortly before reaching the paddle, so the player must estimate its trajectory in order to intercept it.
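The task itself is easy to state concretely. The toy simulation below (all parameters invented for illustration) generates a bouncing-ball trajectory and hides its final stretch, which is the part the model must infer.

```python
import numpy as np

def mental_pong_trajectory(steps=50, occlude_after=35):
    """Simulate a ball in a unit box with wall bounces; positions after
    `occlude_after` are hidden and must be inferred by the player/model."""
    pos, vel = np.array([0.1, 0.5]), np.array([0.021, 0.013])
    visible, hidden = [], []
    for t in range(steps):
        pos = pos + vel
        for axis in range(2):            # toy bounce: flip velocity, clamp at wall
            if pos[axis] < 0 or pos[axis] > 1:
                vel[axis] = -vel[axis]
                pos[axis] = np.clip(pos[axis], 0, 1)
        (visible if t < occlude_after else hidden).append(pos.copy())
    return np.array(visible), np.array(hidden)

visible, hidden = mental_pong_trajectory()
# A model sees `visible` and must keep tracking the ball through `hidden`;
# the ground truth is used only to score its internal estimate.
print(visible.shape, hidden.shape)  # (35, 2) (15, 2)
```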
The researchers found that the model was able to track the trajectory of the hidden ball with an accuracy similar to that of neurons in the mammalian brain, which a previous study by Rajalingham and Jazayeri had shown to simulate the ball’s trajectory—a cognitive phenomenon known as “mental simulation.” Furthermore, the patterns of neural activation observed within the model were similar to those seen in the brains of animals as they played the game—specifically, in a part of the brain called the dorsomedial frontal cortex. No other class of computational model was able to match the biological data as closely, the researchers say.
“There are a lot of efforts in the machine learning community to build artificial intelligence,” says Jazayeri. “The relevance of these models to neurobiology depends on their ability to additionally capture the inner workings of the brain. The fact that Aran’s model predicts neural data is very important, as it suggests that we may be getting closer to building artificial systems that mimic natural intelligence.”
Navigating the world
The study led by Khona, Schaeffer, and Fiete focused on a type of specialized neurons known as grid cells. These cells, located in the entorhinal cortex, help animals navigate by working together with place cells located in the hippocampus.
While place cells are activated whenever an animal is at a particular location, grid cells are activated only when the animal is at one of the vertices of a triangular grid. Groups of grid cells create overlapping grids of different sizes, which allow them to encode a large number of positions using a relatively small number of cells.
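The efficiency of this modular scheme can be seen with a small worked example using made-up periods: modules that repeat every 3, 4, and 5 position units jointly distinguish lcm(3, 4, 5) = 60 locations, far more than any single module could on its own.

```python
from math import lcm

# Hypothetical module periods (in arbitrary position units)
periods = [3, 4, 5]

def grid_code(position, periods):
    """Each module reports position modulo its period - the analogue of
    which grid-cell phase is active within that module."""
    return tuple(position % p for p in periods)

# Distinct codes over a long track: the combined code repeats only
# at the least common multiple of the periods
codes = {grid_code(x, periods) for x in range(1000)}
print(len(codes), "distinct codes; lcm =", lcm(*periods))  # 60 distinct codes; lcm = 60
```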
In recent studies, researchers have trained supervised neural networks to mimic the operation of grid cells by predicting an animal’s next location based on its starting point and velocity, a task known as path integration. However, these models relied on access to privileged information about absolute space at all times—information that the animal does not have.
Inspired by the striking coding properties of the multiperiodic grid cell code for space, the MIT team trained a contrastive self-supervised model to both perform the same path integration task and represent space efficiently while doing so. For the training data, they used sequences of velocity inputs. The model learned to distinguish locations based on whether they were similar or different — nearby locations produced similar codes, while locations farther apart produced increasingly different codes.
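Schematically, the setup feeds velocity sequences to a recurrent network and applies a contrastive objective to its hidden states, so that states reached at nearby points along a trajectory are pulled together while others are pushed apart. The sketch below assumes dimensions and a loss of this generic form; the study’s actual architecture and objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Recurrent network that consumes velocity sequences; to satisfy the
# objective below, its hidden state must come to encode position
rnn = nn.GRU(input_size=2, hidden_size=128, batch_first=True)

velocities = torch.randn(32, 100, 2) * 0.01      # batch of 2-D velocity sequences
states, _ = rnn(velocities)                      # (32, 100, 128) hidden states

# Contrastive idea (illustrative): states at nearby time steps, and hence
# nearby positions, are positives; states from other sequences are negatives.
anchors = F.normalize(states[:, 50], dim=1)
positives = F.normalize(states[:, 52], dim=1)    # two steps later: nearby position
logits = anchors @ positives.T / 0.1             # similarity to all candidates
loss = F.cross_entropy(logits, torch.arange(32)) # pull matching pairs together
loss.backward()                                  # shapes the code toward position
print(loss)
```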
“It’s similar to training models on images, where if two images are both cat heads, their codes should be similar, but if one is a cat head and the other is a truck, then you want their codes to repel,” Khona says. “We’re taking the same idea, but applying it to spatial trajectories.”
Once the model was trained, the researchers found that the activation patterns of the nodes within the model formed many lattice patterns with different periods, very similar to those formed by grid cells in the brain.
“What excites me about this work is that it makes a connection between mathematical work on the striking information-theoretic properties of the grid cell code and the computation of path integration,” says Fiete. “While the math work was analytic — what properties does the grid cell code possess? — the approach of optimizing coding efficiency through self-supervised learning and obtaining grid-like tuning is synthetic: It shows what properties might be necessary and sufficient to explain why the brain has grid cells.”
The research was funded by the K. Lisa Yang ICoN Center, the National Institutes of Health, the Simons Foundation, the McKnight Foundation, the McGovern Institute, and the Helen Hay Whitney Foundation.