Skip to Content
Home » Blog » AI » AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend
January 6, 2026

AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend

Erich Stuntebeck
AI Security in 2026: Prompt Injection, the Lethal Trifecta, and How to Defend

Introduction

Last year saw progress in agentic AI systems unimaginable only a short while ago. Along with these advances, however, have come new classes of security vulnerabilities, and examples of exploits in the wild continue to grow by the day. As the new year kicks off, let’s review what the core issue is, how it’s been exploited, and what you should be doing to protect yourself. 

Prompt Injections – The Core Issue

The core issue remains the same as it has since the first LLMs – models have no ability to reliably distinguish between instructions and data. There is no notion of untrusted content – any content they process is subject to being interpreted as an instruction. To give you a sense of the magnitude of this problem and why it’s not going away anytime soon, OpenAI themselves even published a post recently admitting as much and calling it a frontier security challenge. Their research attempting to solve it goes back several years now, so it’s unlikely this problem goes away anytime soon. 

The Lethal Trifecta

The problem was only magnified in 2025 as we saw more complex agentic systems being put into production that have access to tools like web search. Any ingested content can be interpreted by the model as an instruction. Simon Willison coined the term The Lethal Trifecta to describe the issue: 

  1. Access to private data — The agent can read your emails, documents, and databases 
  2. Exposure to untrusted tokens — The agent processes input from external sources (emails, shared docs, web content) 
  3. Exfiltration vector — The agent can make external requests (render images, call APIs, generate links) 

If your agentic system has all three, it’s vulnerable. Period. 

2025's Notable Attacks

Two of the more high-profile attacks of 2025 fell within this framework. 

EchoLeak — Microsoft 365 Copilot

The first major zero-click agentic vulnerability to hit a production enterprise system. An attacker sends a crafted email to anyone in your organization. When any user later asks Copilot a question, it retrieves the poisoned email, executes the embedded instructions, and exfiltrates sensitive data via an image URL—all without a single click. 

How it worked: 

  1. Attacker sends an email with a hidden prompt injection 
  2. Victim asks Copilot an unrelated question 
  3. Copilot’s RAG system retrieves the malicious email as context 
  4. Embedded instructions tell Copilot to search for sensitive data 
  5. Results are encoded in an image URL request to the attacker’s server 
  6. Browser “loads the image,” sending data to the attacker 

GeminiJack — Google Gemini Enterprise

An attacker shares a Google Doc, sends a calendar invite, or emails someone in your organization. Hidden instructions get indexed by Gemini Enterprise’s RAG system. When any employee runs a routine search, the agent executes those instructions, searches across Gmail/Calendar/Docs for sensitive data, and exfiltrates via—you guessed it—an image URL. This was essentially the same attack as EchoLeak, but applied to Google’s stack instead of Microsoft’s. 

Looking Ahead – A Framework for 2026

Agentic systems aren’t going away, so what can you do to protect yourself and your organization? 

1. Map Your Blast Radius

Before you can secure agentic systems, you need to know:

  • What data sources can your agents access? 
  • What’s the maximum damage if one is compromised? 
  • Who can send content that gets indexed into RAG systems? 

2. Implement the Principle of Least Privilege

Your agent probably doesn’t need access to all of Gmail, all of SharePoint, all of Slack, and all your databases simultaneously. Segment access:

  • Limit data sources to what’s actually needed 
  • Implement per-user or per-role permissions 
  • Avoid “helpful” defaults that grant broad access 

3. Control Exfiltration Vectors

The “lethal” part of the lethal trifecta is often the easiest to address: 

  • Block or heavily restrict external image loading in AI-generated responses 
  • Implement Content Security Policy (CSP) controls 
  • Monitor for unusual patterns of external requests 
  • Consider sandboxing AI-generated output before rendering 

4. Treat Agentic Systems Like Privileged Infrastructure

Agents with data access are effectively privileged users in your environment. Apply the same rigor you’d use for service accounts: 

  • Audit access patterns 
  • Log all queries and responses 
  • Alert on anomalous behavior 
  • Regular security assessments 

5. Monitor the MCP Ecosystem

If you’re using MCP-connected tools: 

  • Audit which MCP servers you’re connecting to 
  • Never expose MCP servers to untrusted networks 
  • Keep mcp-remote and related tooling updated 
  • Review tool descriptions for hidden instructions (tool poisoning) 

Final Thoughts

2025 established that AI security isn’t a theoretical concern—it’s an operational reality. 2026 will bring more of the same, plus new attack surfaces as agentic AI systems gain more autonomy, more tool access, and more integration into critical workflows. Organizations that understand the fundamental patterns—rather than chasing individual branded vulnerabilities—will be better positioned to defend against whatever comes next.