Our approach to analyzing and mitigating future risks posed by advanced AI models
Google DeepMind has continually pushed the boundaries of artificial intelligence, developing models that have changed our understanding of what is possible. We believe that AI technology on the horizon will provide society with invaluable tools to help address critical global challenges such as climate change, drug discovery and economic productivity. At the same time, we recognize that as we continue to push the boundaries of AI capabilities, these breakthroughs may ultimately come with new risks beyond those posed by current models.
Today, we are introducing our Frontier Safety Framework – a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks resulting from powerful capabilities at the model level, such as exceptional agency or sophisticated cyber capabilities. It is designed to complement our alignment research, which trains models to act in accordance with human values and societal goals, and Google's existing suite of AI responsibility and safety practices.
The Framework is exploratory and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and collaborate with industry, academia and government. Though these risks are beyond the reach of present-day models, we hope that implementing and improving the Framework will help us prepare to address them. We aim to have this initial framework fully implemented by early 2025.
The Framework
The first version of the Framework announced today builds on our research on evaluating critical capabilities in frontier models, and follows the emerging approach of Responsible Capability Scaling. The Framework has three key components:
- Identifying capabilities a model may have with potential for severe harm. To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these “Critical Capability Levels” (CCLs), and they guide our evaluation and mitigation approach.
- Evaluating our frontier models periodically to detect when they reach these Critical Capability Levels. To do this, we will develop suites of model evaluations, called “early warning evaluations,” that will alert us when a model is approaching a CCL, and run them frequently enough that we have notice before that threshold is reached.
- Applying a mitigation plan when a model passes our early warning evaluations. This should take into account the overall balance of benefits and risks, and the intended deployment contexts. These mitigations will focus primarily on security (preventing the exfiltration of model weights) and deployment (preventing misuse of critical capabilities). A minimal sketch of how these components might fit together follows below.
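To make these components concrete, here is a purely illustrative sketch of how early warning evaluations could gate a mitigation review. All names, thresholds, and structures below are hypothetical assumptions for illustration; they are not drawn from the Framework itself.

```python
# Illustrative sketch only. The names, thresholds, and structure below are
# hypothetical assumptions; they do not describe the actual Framework.
from dataclasses import dataclass


@dataclass
class EvalResult:
    capability: str  # e.g. "autonomy", "biosecurity", "cybersecurity", "ml_rnd"
    score: float     # aggregate score on an early warning evaluation suite


# Hypothetical alert thresholds, set below the CCL itself so that crossing
# them leaves a safety buffer before the critical capability is reached.
ALERT_THRESHOLDS = {"autonomy": 0.6, "biosecurity": 0.4,
                    "cybersecurity": 0.5, "ml_rnd": 0.5}


def fired_warnings(results: list[EvalResult]) -> list[str]:
    """Return the risk domains whose early warning evaluations fired."""
    return [r.capability for r in results
            if r.score >= ALERT_THRESHOLDS[r.capability]]


def gate_model(results: list[EvalResult]) -> None:
    flagged = fired_warnings(results)
    if flagged:
        # A fired early warning triggers a mitigation review: security
        # mitigations (protecting model weights) and deployment mitigations
        # (managing access to the capability) are weighed against benefits.
        print(f"Early warning fired for {flagged}; applying mitigation plan.")
    else:
        print("No CCL early warnings; continuing under standard practices.")


if __name__ == "__main__":
    gate_model([EvalResult("autonomy", 0.3), EvalResult("cybersecurity", 0.55)])
```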
Risk areas and mitigation levels
Our initial set of Critical Capability Levels is based on investigation of four domains: autonomy, biosecurity, cybersecurity, and machine learning research and development (R&D). Our initial research suggests that the capabilities of future foundation models are most likely to pose severe risks in these domains.
On autonomy, cybersecurity, and biosecurity, our primary goal is to assess the degree to which threat actors could use a model with advanced capabilities to carry out harmful activities with severe consequences. For machine learning R&D, the focus is on whether models with such capabilities would enable the proliferation of models with other critical capabilities, or enable rapid and unmanageable escalation of AI capabilities. As we conduct further research into these and other risk domains, we expect these CCLs to evolve, and for CCLs at higher levels or in other risk domains to be added.
To allow us to tailor the strength of the mitigations to each CCL, we have also outlined a set of security and deployment mitigations. Higher-level security mitigations result in stronger protection against the exfiltration of model weights, and higher-level deployment mitigations enable tighter management of critical capabilities. These measures, however, may also slow the rate of innovation and reduce the broad accessibility of capabilities. Striking the optimal balance between mitigating risks and fostering access and innovation is paramount to the responsible development of AI. By weighing the overall benefits against the risks, and taking into account the context of model development and deployment, we aim to ensure responsible AI progress that unlocks transformative potential while guarding against unintended consequences.
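As a purely hypothetical illustration of how such tiered mitigations might be represented, the sketch below pairs invented security and deployment levels with invented CCL identifiers; none of these tiers or mappings come from the Framework.

```python
# Invented tiers for illustration only; the Framework defines its own
# security and deployment mitigation levels.
SECURITY_LEVELS = {
    0: "industry-standard access controls on model weights",
    1: "hardened storage and restricted internal access to weights",
    2: "controls designed to resist well-resourced exfiltration attempts",
}
DEPLOYMENT_LEVELS = {
    0: "standard safety filters and monitoring",
    1: "restricted access to the capability, with enhanced monitoring",
    2: "capability withheld from deployment pending further safeguards",
}

# Hypothetical mapping from a CCL to a (security, deployment) pair. Stronger
# mitigations trade off against accessibility and the pace of innovation.
CCL_MITIGATIONS = {"cyber_uplift_1": (1, 1), "ml_rnd_acceleration_1": (2, 2)}


def mitigation_plan(ccl: str) -> tuple[str, str]:
    security, deployment = CCL_MITIGATIONS[ccl]
    return SECURITY_LEVELS[security], DEPLOYMENT_LEVELS[deployment]


print(mitigation_plan("ml_rnd_acceleration_1"))
```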
Investment in science
The research underlying the Framework is nascent and progressing quickly. We have invested significantly in our Frontier Safety Team, which coordinated the cross-functional effort behind our Framework. Their remit is to progress the science of frontier risk assessment, and to refine our Framework based on our improved knowledge.
The team developed an evaluation suite to assess risks from critical capabilities, with particular emphasis on autonomous LLM agents, and road-tested it on our state-of-the-art models. Their recent paper describing these evaluations also explores mechanisms that could form a future “early warning system”. It describes technical approaches for assessing how close a model is to succeeding at a task it currently fails to do, and also includes predictions about future capabilities from a team of expert forecasters.
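One simple way such a closeness measure could be framed is sketched below. The milestone-based partial-credit scheme is our own illustrative construction, assumed for the sake of the example rather than taken from the paper's methodology.

```python
# Illustrative only: score an agent task by the fraction of intermediate
# milestones completed, so that progress is visible before full success.
def closeness_score(milestones_passed: list[bool]) -> float:
    """Partial-credit score in [0, 1]; 1.0 means the full task succeeded."""
    if not milestones_passed:
        return 0.0
    return sum(milestones_passed) / len(milestones_passed)


# Tracking this score across successive model generations gives an early
# trend toward a capability threshold, even while end-to-end success is zero.
history = {
    "model_v1": [True, False, False, False],
    "model_v2": [True, True, True, False],
}
for model, milestones in history.items():
    print(model, closeness_score(milestones))  # 0.25, then 0.75
```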
Staying true to our AI Principles
We will review and evolve the Framework periodically. In particular, as we pilot the Framework and deepen our understanding of risk domains, CCLs, and deployment contexts, we will continue our work on calibrating specific mitigations to CCLs.
At the heart of our work are Google's AI Principles, which commit us to pursuing widespread benefit while mitigating risks. As our systems improve and their capabilities increase, measures like the Frontier Safety Framework will ensure our practices continue to meet these commitments.
We look forward to working with others across industry, academia and government to develop and refine the Framework. We hope that sharing our approaches will facilitate work with others to agree on standards and best practices for evaluating the safety of future generations of AI models.