Microsoft researchers have disclosed a new "Skeleton Key" jailbreak attack that bypasses the guardrails of generative AI systems and coaxes them into revealing risky or sensitive information. The finding poses significant risks for organizations using AI models and highlights the need for robust security strategies.
Microsoft Unveils Skeleton Key Attack Exploiting Generative AI Systems
Microsoft researchers have disclosed a new type of “jailbreak” attack, known as the “Skeleton Key,” that exploits generative AI systems into delivering risky or sensitive information. The attack works by feeding the model text that instructs it to augment, rather than enforce, its built-in safety guidelines, so it responds to requests it would normally refuse.
In one example, an AI model declined to provide the recipe for a Molotov cocktail because doing so violated its rules. Once the Skeleton Key attack was applied, the model accepted the modified guidelines and supplied a working recipe. Similar information can be found through an ordinary search engine; the same technique, however, becomes far more dangerous when aimed at systems holding personally identifiable or financial data.
According to the Microsoft blog, “An AI jailbreak is a technique that can cause the failure of guardrails (mitigations). The resulting harm comes from whatever guardrail was circumvented: for example, causing the system to violate its operators’ policies, make decisions unduly influenced by one user, or execute malicious instructions. This technique may be associated with additional attack techniques such as prompt injection, evasion, and model manipulation.”
The Skeleton Key attack is effective against most of the currently prevalent generative AI models, including GPT-3.5, GPT-4o, Claude 3, Gemini Pro, and Meta Llama 3 70B. Large language models (LLMs) such as Google’s Gemini, Microsoft’s Copilot, and OpenAI’s ChatGPT are trained on ‘internet-sized data.’ Such training sets may contain over a trillion data points and often include people’s names, phone numbers, addresses, account numbers, and personal IDs, among other sensitive information.
The blog also stated that “In bypassing safeguards, Skeleton Key allows the user to cause the model to produce ordinarily forbidden behaviours, which could range from the production of harmful content to overriding its usual decision-making rules.”
Microsoft’s researchers note that organizations using AI models are exposed to Skeleton Key attacks if they rely solely on the models’ built-in safeguards to block the output of sensitive data. For instance, if a bank connects a chatbot to customers’ account details, an attacker could use a Skeleton Key prompt to bypass the chatbot’s restrictions and pull data from deeper within the bank’s systems.
To reduce this risk, Microsoft recommends hard-coded input/output (I/O) filtering and security monitoring, so that attackers cannot escalate their prompts toward operations that fall outside the system’s safe configuration. As AI models are deployed across different industries, securing each of those models is essential so that no form of attack can compromise the data they hold.
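To illustrate the idea of I/O filtering, the sketch below wraps a model call with a simple input check (to catch prompts that try to rewrite the model’s safety guidelines) and an output check (to catch sensitive data before it reaches the user). This is a minimal, assumed example, not Microsoft’s implementation: `call_model` is a placeholder for whatever function invokes the underlying LLM, and the regex deny-lists stand in for the trained classifiers a real deployment would use.

```python
import re

# Illustrative deny-list patterns (assumptions, not a production rule set).
INPUT_DENY_PATTERNS = [
    r"update your (behavior|behaviour) guidelines",
    r"ignore (all )?previous instructions",
]
OUTPUT_DENY_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",        # US SSN-like number
    r"\b(?:\d[ -]*?){13,16}\b",      # card-number-like digit run
]


def violates(text: str, patterns: list[str]) -> bool:
    """Return True if any deny-list pattern matches the text."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def guarded_chat(prompt: str, call_model) -> str:
    """Wrap a model call with input and output filtering.

    `call_model` is a hypothetical callable that sends the prompt to the
    underlying LLM and returns its reply as a string.
    """
    if violates(prompt, INPUT_DENY_PATTERNS):
        return "Request blocked: the prompt attempts to alter safety guidelines."

    reply = call_model(prompt)

    if violates(reply, OUTPUT_DENY_PATTERNS):
        # In practice this event would also be logged for security monitoring.
        return "Response withheld: the output appeared to contain sensitive data."
    return reply
```

In practice the pattern checks would be replaced with purpose-built classifiers and the blocked events fed into a monitoring pipeline, but the wrapper structure, filtering both what goes into the model and what comes out, is the core of the mitigation described above.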