Note: This blog was first published on February 2, 2022. After the paper was published in Science on December 8, 2022, we made minor updates to the text to reflect this.
Solving new problems and setting a new benchmark in competitive programming
Creating solutions to unforeseen problems is second nature to human intelligence – the result of critical thinking informed by experience. The machine learning community has made tremendous progress in generating and understanding textual data, but advances in problem solving remain limited to relatively simple math and programming problems, or to retrieving and copying existing solutions.
As part of DeepMind’s mission to solve intelligence, we created a system called AlphaCode that writes computer programs at a competitive level. AlphaCode achieved an estimated ranking in the top 54% of programming competition entrants by solving novel problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding.
Published on the cover of Science, our paper describes AlphaCode, which uses transformer-based language models to generate code at an unprecedented scale, then intelligently filters it down to a small set of promising programs.
We validated our performance using contests hosted on Codeforces, a popular platform that hosts regular competitions attracting tens of thousands of participants from around the world who come to test their coding skills. We selected 10 recent competitions for evaluation, each newer than our training data. AlphaCode ranked roughly at the level of the average competitor, marking the first time an AI code generation system has reached a competitive level of performance in programming competitions.
To help others build on our results, we’ve published our dataset of competitive programming problems and solutions on GitHub, including extensive testing to ensure that programs that pass these tests are correct — a critical feature missing from current datasets. We hope this benchmark will lead to further innovations in problem solving and code generation.
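To make the testing component concrete, here is a minimal sketch of checking a candidate program against hidden input/output test pairs. It is an illustration only, not the dataset’s actual tooling: the file `candidate.py`, the `passes_tests` helper, and the in-memory test format are all assumptions made for this example.

```python
import subprocess

def passes_tests(source_path: str, tests: list[tuple[str, str]],
                 timeout_s: float = 2.0) -> bool:
    """Check a candidate Python program against (input, expected output) pairs.

    Hypothetical helper for illustration; the published CodeContests
    dataset stores problems, tests, and solutions in its own format.
    """
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                ["python3", source_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # exceeding the time limit fails, as in a real contest
        if result.returncode != 0:
            return False  # runtime error
        if result.stdout.strip() != expected.strip():
            return False  # wrong answer (whitespace-normalised comparison)
    return True

# Example: tests for a problem asking for the sum of two integers.
example_tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
print(passes_tests("candidate.py", example_tests))
```

Having many hidden tests per problem matters because a program that merely passes the few example tests in a problem statement can still be wrong; extensive tests make “passes” a much stronger signal of correctness.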
Competitive programming is a popular and challenging activity. Hundreds of thousands of developers participate in coding competitions to gain experience and demonstrate their skills in fun and collaborative ways. During competitions, participants are given a series of long problem descriptions and a few hours to write programs to solve them.
Typical problems include finding ways to place roads and buildings within certain constraints or creating strategies to win custom board games. Participants are then ranked primarily based on how many problems they solve. Companies use these contests as recruiting tools, and similar kinds of problems are common in software engineering hiring processes.
The problem-solving abilities required to excel in these competitions are beyond the capabilities of existing AI systems. However, by combining advances in large-scale transformer models (which have recently shown promising code generation capabilities) with large-scale sampling and filtering, we have made significant progress in the number of problems we can solve. We pre-train our model on selected public GitHub code and fine-tune it on our relatively small competitive programming dataset.
At evaluation time, we generate a huge number of C++ and Python programs for each problem, orders of magnitude more than previous work. We then filter, cluster, and rerank these solutions into a small set of 10 candidate programs that we submit for external evaluation. This automated system replaces competitors’ trial-and-error process of debugging, compiling, passing tests, and finally submitting.
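The sketch below captures the spirit of that filter–cluster–rerank step under simplifying assumptions: the `run_program(source, stdin) -> stdout` callable, the probe inputs, and all parameter names are hypothetical, and the real pipeline differs in many details (for instance, it handles compilation and runtime limits).

```python
from collections import defaultdict

def pick_submissions(candidates, example_tests, probe_inputs,
                     run_program, k=10):
    """Simplified filter-cluster-rerank selection over generated programs.

    `run_program(source, stdin) -> stdout` is a hypothetical callable;
    the actual AlphaCode pipeline differs in many details.
    """
    # 1. Filter: keep only programs that pass the example tests
    #    given in the problem statement.
    survivors = [
        src for src in candidates
        if all(run_program(src, inp).strip() == out.strip()
               for inp, out in example_tests)
    ]

    # 2. Cluster: group survivors by their behaviour on extra probe
    #    inputs, so semantically equivalent programs fall together.
    clusters = defaultdict(list)
    for src in survivors:
        behaviour = tuple(run_program(src, inp) for inp in probe_inputs)
        clusters[behaviour].append(src)

    # 3. Rerank: submit one representative from each of the k largest
    #    clusters, since behaviour shared by many samples is more
    #    likely to be correct.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]
```

Taking one representative per large behavioural cluster spends the 10-submission budget on distinct candidate behaviours rather than on near-duplicates of the same program.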
With permission from Codeforces, we evaluated AlphaCode by simulating participation in 10 recent competitions. The impressive work of the competitive programming community has created a domain where it is not possible to solve problems through shortcuts such as copying previous solutions or testing every potentially relevant algorithm. Instead, our model had to create novel and interesting solutions.
Overall, AlphaCode placed roughly on par with the average competitor. Although far from winning competitions, this result represents a significant leap in AI problem-solving capabilities, and we hope that our results will inspire the competitive programming community.
For AI to help humanity, our systems need to be able to develop problem-solving capabilities. AlphaCode ranked in the top 54% in real-world programming competitions, an advance that demonstrates the potential of deep learning models for tasks that require critical thinking. These models elegantly leverage modern machine learning to express solutions to problems as code, harkening back to the symbolic reasoning roots of AI from decades ago. And this is only a beginning.
Our exploration into code generation leaves vast room for improvement and hints at even more exciting ideas that could help developers improve their productivity and open up the field to people who do not currently write code. We will continue this exploration, and we hope that further research will lead to tools that enhance programming and bring us closer to a problem-solving AI.
See AlphaCode’s solutions and explore the model at alphacode.deepmind.com