AI Security

Google Workspace's Layered Defense Against Indirect Prompt Injection

Google's GenAI Security Team has published how it defends Gemini inside Workspace from indirect prompt injection — treating it as a standing threat class rather than a bug to patch once.

PyramidLedger Research2 July 20264 min read

Indirect prompt injection (IPI) hides malicious instructions inside content an AI reads — emails, docs, web pages — rather than in the user's own prompt.
Google's defense for Workspace's Gemini integrations layers deterministic controls (confirmations, URL sanitization, tool-chaining limits) with ML classifiers and model-level hardening.
Google says it scaled adversarial training-data generation for its classifiers by roughly 75% using an internal tool called Simula, feeding continuous red-team and vulnerability-reward findings.
Google explicitly frames IPI as unsolved and ongoing — the goal is raising attacker cost and complexity, not eliminating the risk.

On 2 April 2026, Google's GenAI Security Team published a detailed account, written by Adam Gavish, of how Workspace defends against indirect prompt injection (IPI) — a technique where the attacker never messages the AI directly, but instead plants instructions inside content the AI is likely to read: an email, a shared Doc, a calendar invite. Once Gemini ingests that content while completing a legitimate user task, the injected text can be interpreted as a command rather than as data. The post is notable for its framing: Google describes IPI as a threat class it manages continuously, not a bug it expects to close out.

Why indirect prompt injection is different

Direct prompt injection requires the attacker to control the conversation. IPI needs only a foothold in data or tools an AI agent touches — a poisoned message in an inbox, a hidden instruction in a document, a malicious page a Gemini-powered agent is asked to summarize. As Workspace's Gemini integrations increasingly chain read, search, and write actions across Gmail, Docs, Drive, and Calendar, a single successful injection can reach further than a one-off chatbot jailbreak, which is why Google's post avoids any language of a final fix.

The layered defense stack

Deterministic controls — mandatory user confirmation before sensitive actions, URL/link sanitization, and policies restricting how model-invoked tools can be chained together.
ML-based classifiers trained to detect injected instructions inside untrusted content such as emails and files, retrained on synthetic adversarial data.
Model-level hardening of Gemini itself, improving its ability to recognize and disregard instructions arriving via retrieved data rather than the user.
A centralized policy engine that lets Google push defense updates across Workspace surfaces quickly, without waiting on a full model retrain for every new technique.

Google says it increased the volume of adversarial training data it can generate for these classifiers by roughly 75%, using an internal synthetic-data tool referred to as Simula — a scale increase aimed at keeping the ML defenses ahead of novel injection patterns rather than reacting to known ones after the fact.

Finding attacks before attackers do

Google describes a mix of discovery methods feeding its defenses: human red teams running adversarial simulations against Workspace and Gemini, automated ML-driven red-teaming pipelines, external submissions through Google's AI Vulnerability Reward Program, and monitoring of publicly disclosed techniques — all consolidated into an internal catalog of generative-AI vulnerabilities. Defense efficacy is checked with end-to-end simulations against real Workspace apps such as Gmail and Docs, comparing attack success rates before and after a given mitigation ships.

This approach tracks Google's earlier work in this area: a prior post, Mitigating prompt injection attacks with a layered defense strategy, and DeepMind's Advancing Gemini's security safeguards, both describe defense-in-depth combining model hardening, classifiers, and system-level guardrails — and both stop short of claiming the problem is solved.

What it means for security teams

For organisations running Gemini inside Workspace, the practical takeaway is that IPI defense is layered and probabilistic, not a single switch to flip. Confirmation prompts before sensitive actions — sending mail, sharing a file, running a script — remain a meaningful control precisely because classifiers and model hardening aren't expected to catch everything alone. Teams building or extending Workspace add-ons and agentic automations should treat any AI component with read access to untrusted content — inbound email, shared documents, fetched web pages — with the same 'assume the input is hostile' mindset used for parsing untrusted files, and test their own integrations against injected instructions rather than assuming platform-level defenses cover custom code.

The honest limits of layered defense

Neither Google nor DeepMind claims these layers stop every attack. The stated goal — raising the cost, complexity, and failure rate an attacker faces — is a realistic bar for a threat that depends on distinguishing instructions from data inside natural-language content, something LLMs are not structurally built to do with certainty.

Frequently Asked Questions

What is indirect prompt injection (IPI)?

IPI is a technique where an attacker embeds malicious instructions in data or tools an AI system uses — such as an email, document, or web page — rather than in the user's own prompt. When the AI processes that content to complete a legitimate task, it can interpret the embedded text as a command, sometimes without any direct input from the user.

Does Google's layered defense fully stop indirect prompt injection?

No. Google's post and DeepMind's related work both frame IPI as an ongoing threat class rather than a solved problem. The layered approach — deterministic controls, ML classifiers, and model hardening — aims to raise the cost and complexity of successful attacks, not eliminate the risk entirely.

What should enterprises using Gemini in Workspace do beyond relying on Google's built-in defenses?

Keep confirmation prompts enabled for sensitive actions, treat any AI agent with access to untrusted inbound content (email, shared files, fetched web pages) as a potential injection vector, and specifically test custom Workspace add-ons or automations for prompt-injection resilience rather than assuming platform defenses extend to them.

Sources

1Google Workspace's continuous approach to mitigating indirect prompt injections — Google Online Security Blog
2Mitigating prompt injection attacks with a layered defense strategy — Google
3Advancing Gemini's security safeguards — Google DeepMind