Microsoft details 'Skeleton Key' AI jailbreak.

Microsoft has show up a new type of AI jailbreak attack called ‘Skeleton Key’, which can bypass responsible AI guardrails on multiple AI models. This technique, capable of subverting most security measures built into AI systems, highlights the critical need for strong security measures at all levels of the AI stack.

The Skeleton Key jailbreak uses a multi-turn strategy to trick an AI model into ignoring its built-in safeguards. Once successful, the model cannot distinguish between malicious or unauthorized requests and legitimate requests, effectively giving attackers complete control over what the AI produces.

Microsoft’s research team has successfully tested the Skeleton Key technique on many prominent AI models, such as Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus and Cohere Commander R Plus.

All affected models fully complied with requests for various risk categories, including explosives, biological weapons, political content, self-harm, racism, drugs, performance sex, and violence.

The attack works by instructing the model to increase its behavior instructions, persuading it to respond to any request for information or content while providing a warning if the result may be considered offensive, harmful, or illegal. This approach, known as “Explicit: mandatory command following”, has been shown to be effective in multiple artificial intelligence systems.

“By bypassing safeguards, the Skeleton Key allows the user to cause the model to produce normally prohibited behaviors, which could range from producing harmful content to bypassing normal decision-making rules,” Microsoft explained.

In response to this discovery, Microsoft has implemented several safeguards in its AI offerings, including Copilot AI assistants.

Microsoft says it has also shared its findings with other AI providers through responsible disclosure processes and updated its models managed by Azure AI to detect and block this type of attack using Prompt Shields.

To mitigate the risks associated with Skeleton Key and similar jailbreak techniques, Microsoft recommends a multi-layered approach for AI system designers:

Input filtering to detect and block potentially harmful or malicious inputs
Careful direct engineering system messages to reinforce appropriate behavior
Output filtering to prevent the creation of content that violates security criteria
Abuse monitoring systems trained in counterexamples to identify and moderate repetitive problematic content or behaviors

Microsoft has also updated its own PyRIT (Python Risk Identification Toolkit) to include Skeleton Key, allowing developers and security teams to test their AI systems against this new threat.

The discovery of the Skeleton Key jailbreak technique highlights the ongoing challenges for the security of AI systems as they become more widespread in various applications.

(Photo by Matt Artz)

See also: Think tank calls for AI incident reporting system

Want to learn more about AI and big data from industry leaders? Checkout AI & Big Data Expo takes place in Amsterdam, California and London. The comprehensive event is co-located with other top events including; Intelligent Automation Conference, BlockX, Digital Transformation Weekand Cyber Security & Cloud Expo.

Explore other upcoming corporate tech events and webinars powered by TechForge here.

Labels: ai, artificial intelligence, cyber security, cybersecurity, exploit, jailbreak, microsoft, direct engineering, security, skeleton key, vulnerability

Why harmonize bank statements? Explain the importance and benefits

Que sont les règles métier ? : The wizard is not complete

Training AI music models is about to get very expensive

DataRobot: A Leader in the 2024 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms

ERP in procurement and how to use it effectively

Understanding YOLOv5 Loss: A Comprehensive Analysis

Master Advanced Prompt Engineering with LangChain for Context-Aware Language Models

Arduino vs Raspberry Pi: What’s the difference?

Top 20 Generative AI Applications/ Use Cases Across Industries

Top 35+ Finance Interview Questions And Answers

Microsoft details ‘Skeleton Key’ AI jailbreak.

Navigating the Challenges of Big Data Security in the Current Landscape

Training AI music models is about to get very expensive

Nudeitnow Features, Pricing, Details, Alternatives

Productivity vs security: How CIOs and CISOs can see eye to eye

What you need to know about this new Chinese text-to-video AI model

Foundational Generative AI Models Are Like Operating Systems

Understanding the visual knowledge of language models | MIT News

How AI-Driven Innovations Can Help Optimize PCB Materials

The dangers of voice fraud: We can’t detect what we can’t see

Our Picks

Understanding the visual knowledge of language models | MIT News

How AI-Driven Innovations Can Help Optimize PCB Materials

The dangers of voice fraud: We can’t detect what we can’t see

Subscribe to Updates

Microsoft details ‘Skeleton Key’ AI jailbreak.

Related Posts