In July 2022, we published AlphaFold protein structure predictions for nearly all proteins known to science. Read the latest blog here.
Today, I am incredibly proud and excited to announce that DeepMind is making a significant contribution to humanity’s understanding of biology.
When we announced AlphaFold 2 last December, it was hailed as a solution to the 50-year-old protein folding problem. Last week, we published the scientific paper and source code explaining how we created this highly innovative system, and today we are sharing high-quality predictions for the shape of every single protein in the human body, as well as for the proteins of 20 additional organisms that scientists rely on for their research.
As researchers search for cures for diseases and pursue solutions to other major problems facing humanity – including antibiotic resistance, microplastic pollution and climate change – they will benefit from new insights into protein structure. Proteins are like tiny, exquisite biological machines. In the same way that the structure of a machine tells you what it does, the structure of a protein helps us understand its function. Today, we are sharing a trove of information that doubles humanity’s understanding of the human proteome and reveals the protein structures found in 20 other biologically important organisms, from E. coli to yeast, and from the fruit fly to the mouse.
This will be one of the most important datasets since mapping the human genome.
Ewan Birney, Deputy Director General of EMBL and Director of EMBL-EBI
We believe that AlphaFold, as a powerful tool supporting the efforts of researchers, is the most significant contribution AI has made to the advancement of scientific knowledge to date, and a great example of the benefits AI can bring to humanity. These insights will underpin many exciting future advances in our understanding of biology and medicine. Thanks to five years of tireless work and a lot of ingenuity from the AlphaFold team, and to the close collaboration in recent months with our partners at EMBL's European Bioinformatics Institute (EMBL-EBI), we are able to share this vast and valuable resource with the world.
Proteins are exquisite biological machines, their three-dimensional structures often aesthetically pleasing as well as functionally critical as building blocks of life.
This latest work builds on the announcement we made last December at the CASP14 conference, where DeepMind presented a radical new version of the AlphaFold system, which was recognized by the organizers of the assessment as a solution to the 50-year-old grand challenge of understanding the three-dimensional structure of proteins. Determining protein structures experimentally is a time-consuming and painstaking pursuit, but AlphaFold showed that AI could accurately predict the shape of a protein, at scale and in minutes, down to atomic accuracy. At CASP, we committed to sharing our methods and providing broad access to this body of knowledge.
Improvements in the median accuracy of predictions in the free modeling category for the best group in each CASP, measured as best-of-5 GDT.
This month, we completed a tremendous amount of hard work to deliver on that commitment. We published two peer-reviewed papers in Nature (1, 2) and open-sourced the AlphaFold code. Today, in collaboration with EMBL-EBI, we are incredibly proud to launch the AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date, more than doubling humanity’s accumulated knowledge of high-accuracy human protein structures.
In addition to the human proteome (all ~20,000 proteins expressed by the human genome), we are providing open access to the proteomes of 20 other biologically important organisms, totaling over 350,000 protein structures. These organisms have been the subject of countless research papers and numerous important discoveries, and studying them has led to a deeper understanding of life itself. In the coming months we plan to greatly expand the coverage to almost every sequenced protein known to science – over 100 million structures covering most of the UniProt reference database. It is a veritable protein almanac of the world. Both the system and the database will be updated periodically as we continue to invest in future improvements to AlphaFold.
Most excitingly, in the hands of scientists around the world, this new protein almanac will enable and accelerate research that will advance our understanding of these building blocks of life. Already, through our early collaborations, we have seen promising signals from researchers using AlphaFold in their work. For example, the Drugs for Neglected Diseases Initiative (DNDi) has advanced its research into life-saving treatments for diseases that disproportionately affect the poorest parts of the world, and the Center for Enzyme Innovation (CEI) at the University of Portsmouth is using AlphaFold to help engineer faster enzymes for recycling some of the most polluting single-use plastics. For scientists who rely on experimental protein structure determination, AlphaFold’s predictions have helped speed up their research. As another example, a team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California San Francisco has used them to increase its understanding of the biology of SARS-CoV-2. And this is just the beginning of what we hope will be a revolution in structural bioinformatics. With AlphaFold out in the world, there is a wealth of data now waiting to be turned into future advances.
AlphaFold opens up new research horizons, and it’s encouraging to see powerful cutting-edge AI enabling work on diseases that are concentrated almost exclusively in poor populations.
Ben Perry, Discovery Open Innovation Leader, Drugs for Neglected Diseases Initiative (DNDi)
For the AlphaFold team at DeepMind, this work represents the culmination of five years of tremendous effort, including creatively overcoming many challenging setbacks, resulting in a series of new sophisticated algorithmic innovations that were all necessary to finally solve the problem. It builds on the discoveries of generations of scientists, from the early pioneers of protein imaging and crystallography, to the thousands of prediction experts and structural biologists who have spent years experimenting with proteins since then. Our dream is that AlphaFold, by providing this fundamental understanding, will help countless more scientists in their work and open entirely new avenues of scientific discovery.
What took us months and years to do, AlphaFold was able to do in a weekend.
Professor John McGeehan, Professor of Structural Biology and Center Director, Center for Enzyme Innovation (CEI) at the University of Portsmouth
At DeepMind, our thesis has always been that AI can dramatically accelerate discoveries in many areas of science and in turn advance humanity. We built AlphaFold and the AlphaFold Protein Structure Database to support and elevate the efforts of scientists around the world in the important work they are doing. We believe that AI has the potential to revolutionize the way science is done in the 21st century, and we look forward to the discoveries that AlphaFold can help the scientific community unlock next.
To learn more, head over to Nature to read our scientific papers describing our complete method and the human proteome. You can read more about them in our technical blog. If you want to explore our system, here is the open source code for AlphaFold and a Colab notebook to run individual sequences. To explore our structures, EMBL-EBI, the world leader in biological data, hosts them in a searchable database that is open and free to all.
We’d love to hear your feedback and understand how useful AlphaFold has been in your research. Share your stories at alphafold@deepmind.com.