Confirmed Gmail Security Vulnerability Remains Unpatched by Google

Staff
By Staff 6 Min Read

The integration of AI, specifically Google’s Gemini, into Workspace apps like Gmail has introduced significant usability improvements but also raised security concerns. Researchers have demonstrated how Gemini is vulnerable to indirect prompt injection attacks, where malicious prompts embedded in documents, emails, or websites can manipulate the AI into producing unintended responses. This vulnerability allows third parties to potentially compromise the integrity of Gemini’s outputs across platforms like Gmail, Google Slides, and Google Drive, opening avenues for phishing attempts and manipulation. Despite researchers reporting these vulnerabilities, Google has classified them as “Intended Behavior” and opted not to address them as security issues, raising questions about the company’s approach to AI security. This stance has prompted some users to consider disabling smart features or opting out of AI interaction with their emails due to privacy and security concerns.

The indirect prompt injection vulnerability stems from the nature of large language models like Gemini, which can be manipulated to generate misleading or unintended outputs. The “indirect” nature of the attack lies in the delivery mechanism; the malicious prompts are not directly inputted by the user but are instead embedded within seemingly innocuous channels like emails or documents. When a user interacts with these infected files or emails, the embedded prompts trigger Gemini to produce compromised responses, effectively allowing attackers to control the AI’s output without the user’s awareness. Proof-of-concept examples have demonstrated successful phishing attempts via Gemini in Gmail, data tampering in Google Slides, and data poisoning in Google Drive, both locally and within shared documents. These successful attacks highlight the vulnerability of the Workspace suite and raise serious concerns about the integrity of data processed by Gemini-integrated apps.

Beyond indirect prompt injections, a novel attack called the “link trap” further exposes vulnerabilities in LLMs. This attack doesn’t rely on granting extensive permissions to the AI; instead, it manipulates the AI to generate a seemingly harmless link that, when clicked, leaks sensitive data to the attacker. The link trap prompt injection works by embedding malicious instructions within a seemingly benign user query. The AI, following the injected instructions, returns a link controlled by the attacker. This link, often disguised with reassuring text like “reference,” can be designed to exfiltrate sensitive data when clicked. The attack bypasses traditional permission-based mitigations, as the data leakage occurs due to the user’s inherent permissions, not the AI’s. The link trap’s insidious nature lies in its ability to extract data even from private AI instances with limited external connectivity, highlighting the need for increased user awareness and preventative measures.

Furthermore, the “Bad Likert Judge” attack demonstrates another novel method for bypassing LLM safety guardrails. This multi-turn jailbreaking technique exploits the AI’s ability to evaluate the harmfulness of responses. The attacker prompts the LLM to judge responses based on the Likert scale, a rating system for agreement or disagreement. Subsequently, the LLM is prompted to generate responses corresponding to different Likert scale values. This process effectively guides the LLM to produce increasingly harmful content, as the responses with higher Likert scores are more likely to contain the malicious content sought by the attacker. This technique has demonstrated a significant increase in attack success rates compared to traditional prompt injection methods, raising concerns about the efficacy of current LLM safety mechanisms.

The Bad Likert Judge attack unfolds in a three-step process. First, the “Evaluator Prompt” establishes the LLM as a judge of response harmfulness. Second, the “Asking For Harmful Content Generation” stage prompts the LLM to produce responses corresponding to different levels of harm, as measured by the Likert scale. Finally, the “Following Up For The Ultimate AI Jailbreak” stage refines the most harmful response by prompting the LLM to elaborate or add details, further increasing the likelihood of generating malicious content. This multi-step approach effectively manipulates the LLM into bypassing its safety guardrails and producing potentially harmful output. While the researchers anonymized the specific LLMs tested, the Bad Likert Judge technique underscores the need for robust content filtering mechanisms and responsible AI operation.

Google, in response to these concerns, maintains that it has implemented robust defenses against prompt injection attacks and harmful responses. They emphasize that these vulnerabilities are not unique to Gemini and are prevalent across the LLM industry. Google asserts that it conducts rigorous internal and external security testing, including red-teaming exercises and a Vulnerability Rewards Program with specific criteria for AI bug reports. They also highlight existing safeguards like spam filters and input sanitization within Gmail and Drive as mitigating factors against malicious code injection. However, the continued classification of indirect prompt injection as “Intended Behavior” raises questions about the effectiveness of these measures and the potential risks faced by users who rely on Gemini-integrated applications. The ongoing debate highlights the complex interplay between AI usability, security, and user privacy in the evolving landscape of AI-powered tools.

Share This Article
Leave a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *