Computational models that mimic the structure and function of the human auditory system could help researchers design better hearing aids, cochlear implants, and brain-machine interfaces. A new study from MIT finds that modern computational models derived from machine learning are moving closer to that goal.
In the largest study to date of deep neural networks trained to perform auditory tasks, the MIT team showed that most of these models create internal representations that share the properties of the representations that appear in the human brain when people hear the same sounds.
The study also offers insights into how best to train this type of model: The researchers found that models trained on auditory input, including ambient noise, more closely mimic the activation patterns of the human auditory cortex.
“What sets this study apart is that it is the most comprehensive comparison of these types of models to the auditory system to date. The study suggests that machine learning-derived models are a step in the right direction and gives us some clues about what tends to make them better models of the brain,” says Josh McDermott, an associate professor of brain and cognitive sciences at MIT, a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines, and the study’s senior author.
MIT graduate student Greta Tuckute and Jenelle Feather PhD ’22 are the lead authors of the open-access paper, which appears today in PLOS Biology.
Hearing models
Deep neural networks are computational models consisting of many layers of information processing units that can be trained on massive amounts of data to perform specific tasks. This type of model has been widely used in many applications, and neuroscientists have begun to explore the possibility that these systems can also be used to describe how the human brain performs certain tasks.
“These models built with machine learning are able to mediate behaviors on a scale that really wasn’t possible with previous types of models, and this has led to interest in whether the representations in the models could capture things that happen in the brain,” says Tuckute.
When a neural network performs a task, its processing units produce activation patterns in response to each audio input it receives, such as a word or other type of sound. These model representations of the input can be compared to the activation patterns seen in fMRI brain scans of subjects listening to the same input.
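As a rough illustration of that comparison (a sketch, not the study’s actual analysis code), one common approach is a cross-validated regression that predicts each fMRI voxel’s response to a set of sounds from a model layer’s activations; all array names and sizes below are hypothetical placeholders:

```python
# Sketch: predict fMRI voxel responses from one model layer's activations,
# using cross-validated ridge regression. Data here are random placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sounds, n_units, n_voxels = 165, 512, 1000            # hypothetical sizes
layer_acts = rng.standard_normal((n_sounds, n_units))    # model layer activations
voxel_resp = rng.standard_normal((n_sounds, n_voxels))   # measured fMRI responses

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = np.zeros(n_voxels)
for train, test in kf.split(layer_acts):
    reg = RidgeCV(alphas=np.logspace(-2, 5, 8)).fit(layer_acts[train], voxel_resp[train])
    pred = reg.predict(layer_acts[test])
    # Per-voxel correlation between predicted and measured held-out responses
    for v in range(n_voxels):
        scores[v] += np.corrcoef(pred[:, v], voxel_resp[test, v])[0, 1]
scores /= kf.get_n_splits()

print(f"median voxel prediction r = {np.median(scores):.3f}")
```

A model whose layers yield higher held-out prediction scores is, by this kind of measure, a closer match to the brain’s representations.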
In 2018, McDermott and then-graduate student Alexander Kell reported that when they trained a neural network to perform auditory tasks (such as recognizing words from an audio signal), the internal representations created by the model showed similarity to those observed in fMRI scans of people hearing the same sounds.
Since then, these types of models have become widely used, so McDermott’s research team set out to evaluate a larger set of models, to see if the ability to approximate the neural representations seen in the human brain is a general characteristic of these models.
For this study, the researchers analyzed nine publicly available deep neural network models trained to perform auditory tasks and also created 14 of their own models, based on two different architectures. Most of these models were trained to perform a single task — word recognition, speaker recognition, environmental sound recognition, or music genre recognition — while two of them were trained to perform multiple tasks.
When the researchers presented these models with natural sounds that had been used as stimuli in human fMRI experiments, they found that the models’ internal representations tended to resemble those produced by the human brain. The models whose representations most resembled the brain’s were those trained on more than one task and on auditory input that included background noise.
“If you train models in noise, they give better predictions of the brain than if you don’t, which makes intuitive sense because a lot of real hearing involves listening in noise, and that’s arguably something the auditory system is adapted to,” Feather says.
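To make the idea concrete, here is a minimal sketch of the kind of noise augmentation the quote describes — mixing each training clip with background noise at a randomly chosen signal-to-noise ratio (SNR). The function and data are illustrative, not taken from the study’s training code:

```python
# Sketch: mix a training clip with background noise at a requested SNR.
import numpy as np

def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12            # avoid divide-by-zero
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + noise * np.sqrt(target_noise_power / noise_power)

rng = np.random.default_rng(0)
clip = rng.standard_normal(16000)     # 1 s of placeholder "speech" at 16 kHz
babble = rng.standard_normal(16000)   # placeholder background noise
noisy = mix_at_snr(clip, babble, snr_db=rng.uniform(-6.0, 12.0))
```

Drawing a fresh noise sample and SNR for each clip forces the model to extract task-relevant structure under the same kind of acoustic clutter listeners face in everyday settings.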
Hierarchical processing
The new study also supports the idea that the human auditory cortex has some degree of hierarchical organization, in which processing is divided into stages that support distinct computational functions. As in the 2018 study, the researchers found that representations created at earlier stages of the model are more similar to those seen in the primary auditory cortex, while representations created at later stages are more similar to those in brain areas beyond the primary auditory cortex.
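One way to probe such a hierarchy, sketched below on hypothetical data with a simple representational similarity analysis (the study’s actual metrics differ in detail), is to score every model stage against each brain region and see which stage matches best; under the hierarchy account, early stages should win for primary auditory cortex and later stages for non-primary areas:

```python
# Sketch: find which model stage best matches each brain region,
# using representational similarity analysis on placeholder data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_sounds = 165
stages = {f"stage_{i}": rng.standard_normal((n_sounds, 256)) for i in range(1, 7)}
regions = {"primary auditory cortex": rng.standard_normal((n_sounds, 300)),
           "non-primary auditory cortex": rng.standard_normal((n_sounds, 300))}

def rsa_score(acts: np.ndarray, brain: np.ndarray) -> float:
    # Correlate the pairwise sound-by-sound distance structure of the two spaces
    return spearmanr(pdist(acts, "correlation"), pdist(brain, "correlation"))[0]

for name, brain in regions.items():
    best = max(stages, key=lambda s: rsa_score(stages[s], brain))
    print(f"{name}: best-matching model stage = {best}")
```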
In addition, the researchers found that models trained on different tasks were better at reproducing different aspects of audition. For example, models trained on a speech-related task more closely resembled speech-selective regions of the brain.
“Even though the model has seen exactly the same training data and the architecture is the same, when you optimize for a specific task, you can see that it selectively explains specific tuning properties in the brain,” says Tuckute.
McDermott’s lab now plans to use its findings to try to develop models that are even more successful at replicating human brain responses. In addition to helping scientists learn more about how the brain is organized, such models could also be used to help develop better hearing aids, cochlear implants, and brain-machine interfaces.
“One goal of our field is to come up with a computer model that can predict brain responses and behavior. We think that if we succeed at that goal, it will open a lot of doors,” says McDermott.
The research was funded by the National Institutes of Health, an Amazon Fellowship from the Science Hub, an International Doctoral Fellowship from the American Association of University Women, an MIT Friends of the McGovern Institute Fellowship, a fellowship from the K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center at MIT, and a Department of Energy Computational Science Graduate Fellowship.