Table of Contents
Introduction
Last year saw progress in agentic AI systems unimaginable only a short while ago. Along with these advances, however, have come new classes of security vulnerabilities, and examples of exploits in the wild continue to grow by the day. As the new year kicks off, let’s review what the core issue is, how it’s been exploited, and what you should be doing to protect yourself.
Prompt Injections – The Core Issue
The core issue remains the same as it has since the first LLMs – models have no ability to reliably distinguish between instructions and data. There is no notion of untrusted content – any content they process is subject to being interpreted as an instruction. To give you a sense of the magnitude of this problem and why it’s not going away anytime soon, OpenAI themselves even published a post recently admitting as much and calling it a frontier security challenge. Their research attempting to solve it goes back several years now, so it’s unlikely this problem goes away anytime soon.
The Lethal Trifecta
The problem was only magnified in 2025 as we saw more complex agentic systems being put into production that have access to tools like web search. Any ingested content can be interpreted by the model as an instruction. Simon Willison coined the term The Lethal Trifecta to describe the issue:
- Access to private data — The agent can read your emails, documents, and databases
- Exposure to untrusted tokens — The agent processes input from external sources (emails, shared docs, web content)
- Exfiltration vector — The agent can make external requests (render images, call APIs, generate links)
If your agentic system has all three, it’s vulnerable. Period.
2025's Notable Attacks
Two of the more high-profile attacks of 2025 fell within this framework.
EchoLeak — Microsoft 365 Copilot
The first major zero-click agentic vulnerability to hit a production enterprise system. An attacker sends a crafted email to anyone in your organization. When any user later asks Copilot a question, it retrieves the poisoned email, executes the embedded instructions, and exfiltrates sensitive data via an image URL—all without a single click.
How it worked:
- Attacker sends an email with a hidden prompt injection
- Victim asks Copilot an unrelated question
- Copilot’s RAG system retrieves the malicious email as context
- Embedded instructions tell Copilot to search for sensitive data
- Results are encoded in an image URL request to the attacker’s server
- Browser “loads the image,” sending data to the attacker
GeminiJack — Google Gemini Enterprise
An attacker shares a Google Doc, sends a calendar invite, or emails someone in your organization. Hidden instructions get indexed by Gemini Enterprise’s RAG system. When any employee runs a routine search, the agent executes those instructions, searches across Gmail/Calendar/Docs for sensitive data, and exfiltrates via—you guessed it—an image URL. This was essentially the same attack as EchoLeak, but applied to Google’s stack instead of Microsoft’s.
Looking Ahead – A Framework for 2026
Agentic systems aren’t going away, so what can you do to protect yourself and your organization?
1. Map Your Blast Radius
Before you can secure agentic systems, you need to know:
- What data sources can your agents access?
- What’s the maximum damage if one is compromised?
- Who can send content that gets indexed into RAG systems?
2. Implement the Principle of Least Privilege
Your agent probably doesn’t need access to all of Gmail, all of SharePoint, all of Slack, and all your databases simultaneously. Segment access:
- Limit data sources to what’s actually needed
- Implement per-user or per-role permissions
- Avoid “helpful” defaults that grant broad access
3. Control Exfiltration Vectors
The “lethal” part of the lethal trifecta is often the easiest to address:
- Block or heavily restrict external image loading in AI-generated responses
- Implement Content Security Policy (CSP) controls
- Monitor for unusual patterns of external requests
- Consider sandboxing AI-generated output before rendering
4. Treat Agentic Systems Like Privileged Infrastructure
Agents with data access are effectively privileged users in your environment. Apply the same rigor you’d use for service accounts:
- Audit access patterns
- Log all queries and responses
- Alert on anomalous behavior
- Regular security assessments
5. Monitor the MCP Ecosystem
If you’re using MCP-connected tools:
- Audit which MCP servers you’re connecting to
- Never expose MCP servers to untrusted networks
- Keep mcp-remote and related tooling updated
- Review tool descriptions for hidden instructions (tool poisoning)
Final Thoughts
2025 established that AI security isn’t a theoretical concern—it’s an operational reality. 2026 will bring more of the same, plus new attack surfaces as agentic AI systems gain more autonomy, more tool access, and more integration into critical workflows. Organizations that understand the fundamental patterns—rather than chasing individual branded vulnerabilities—will be better positioned to defend against whatever comes next.