Microsoft researchers established a new type of “jailbreak” attack known as the “Skeleton Key” that exploits generative AI systems and delivers risky or sensitive information. The attack process is to input text to an AI model that instructs it to enhance the encoded security attributes.
One of the cases involved an AI model declining to suggest the recipe for a Molotov Cocktail because it is against the rules to do so. But if the Skeleton Key attack was used, the model recognized that it was enriching its actions, and after that, it provided the working recipe. Similar information can be obtained through the use of search engines; nonetheless, this type of attack would be extremely dangerous if applied to data containing personally identifiable and financial data.

What is an AI jailbreak?
According to the Microsoft blog, “An AI jailbreak is a technique that can cause the failure of guardrails (mitigations). The resulting harm comes from whatever guardrail was circumvented: for example, causing the system to violate its operators’ policies, make decisions unduly influenced by one user, or execute malicious instructions. This technique may be associated with additional attack techniques such as prompt injection, evasion, and model manipulation.”

Vulnerability in LLMs
Skeleton Key attack is effective on most of the currently prevalent generative AI models, including GPT-3. 5, GPT 4-0, Claude 3, Gemini Pro, and Meta Llama 3 70B. Some of the Large Language Models (LLM) like Google’s Gemini, Microsoft’s CoPilot, and OpenAI’s ChatGPT are trained on ‘internet-sized data.’ As such, the latter may contain over a trillion data points, and it may often include people’s names, phone numbers, addresses, account numbers, and personal IDs among various other sensitive information.
Risks of Organization Using AI
One of the blogs stated that “In bypassing safeguards, Skeleton Key allows the user to cause the model to produce ordinarily forbidden behaviours, which could range from the production of harmful content to overriding its usual decision-making rules.”
This paper identified that organizations using AI models are exposed to Skeleton Key attacks if they solely depend on current security mechanisms to block the output of sensitive data. For instance, in the case of the bank that links the chatbot to customers’ details, an attacker can exploit a Skeleton Key attack and penetrate deeper into the bank’s systems by mimicking the mentioned points.
Thus, to avoid such a situation, Microsoft, for example, has proposed to use hard-coded input/output (I/O) filtering and secure monitoring for its prevention and thus exclude the possibility of advancing in the construction of prompts for very dangerous operations which is beyond safe for the system’s configuration. When different AI models are applied to different industries, it is very important to ensure the security of those different models so that any form of attack cannot compromise the data stored in the models.
What are the key differences between large language models (LLMs) and generative AI?