Contributing Authors
Table of Contents
Here is something most enterprise security teams have not considered: when a developer sends a message through an AI coding assistant, neither the request nor the response is cryptographically signed. OpenAI does not sign its API responses. Anthropic signs extended thinking blocks but not text or tool-use blocks [1]. There is no mechanism, no HMAC, no digital signature, no content attestation, for the client to verify that a streaming response actually came from the model rather than being fabricated by something sitting between the developer and the API.
That “something” is usually an AI API gateway: the proxy that manages keys, enforces policies, and abstracts away provider differences. Eighty-four percent of developers now use AI coding tools, and 51% use them daily [2]. Gartner projects 90% adoption by 2028 [3]. As adoption has scaled, organizations have funneled this traffic through centralized gateways, creating a single point of interception that holds every API key, touches every request, and can modify every response.
We built a gateway that does exactly that.
Over the past several weeks, our team constructed a proof-of-concept attack chain that begins with a compromised AI API gateway and ends with a persistent remote shell on a developer’s machine. The AI model never sees the attack. The developer sees what appears to be a normal response. The gateway fabricates a tool call, the coding agent executes it silently, and within seconds the attacker has a browser-based terminal to the victim’s workstation.
To our knowledge, this is the first public demonstration of this specific attack class: protocol-level tool call fabrication at the gateway layer, achieving remote code execution on a coding agent’s host without any prompt injection involved. The model is never tricked. The model is never consulted. The attack operates entirely at the transport layer.
This Is Not Prompt Injection
The security community has spent considerable effort on prompt injection — tricking a model into doing something it shouldn’t. NVIDIA demonstrated it achieving RCE on developer machines at Black Hat USA 2025 [4]. The paper “Your AI, My Shell” tested 314 attack payloads against Cursor and GitHub Copilot, achieving 83.4% success rates in auto-approve mode [5]. A
meta-analysis of 78 studies found that attack success rates against state-of-the-art defenses exceed 85% with adaptive strategies [6].
Our attack is different in kind, not degree. Prompt injection manipulates the model’s input to influence its output. Our attack never touches the model. The gateway intercepts the API response stream and replaces it with fabricated SSE events containing a tool-use block that the coding agent parses and executes as if it came from the model. The agent has no way to distinguish a legitimate tool_use block generated by Claude from one injected by a proxy —
because both arrive as identical JSON over the same TLS connection.
This is the confused deputy problem in its most direct form [7]. The agent isn’t confused because the model was manipulated — it’s confused because the protocol offers no provenance guarantee. A recent academic survey frames this as the shift “from prompt injections to protocol exploits,” identifying transport-layer attacks as a distinct and underexplored threat class [8]. The OWASP Top 10 for LLM Applications covers prompt injection (#1), supply chain vulnerabilities (#3), and excessive agency (#6) — but has no entry
for gateway compromise or response fabrication [9]. As one researcher noted after mapping eight distinct AI gateway attack vectors: “Every major security framework missed it. OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF — none of them cover the gateway infrastructure layer.” [10]
Why LiteLLM
We chose LiteLLM as our gateway engine because it is among the most widely deployed open-source AI proxies, averaging 3.6 million PyPI downloads per day [11]. It also demonstrates, in concentrated form, what happens when critical infrastructure accumulates vulnerabilities faster than the community can remediate them.
LiteLLM has disclosed 14 CVEs since April 2024 — two Critical (CVSS 9.8), nine High — spanning SSRF, RCE, SQL injection, server-side template injection, and API key leakage [12]:
CVE-2024-6587 (CVSS 7.5): The api_base parameter forwards requests — with API keys in the Authorization header — to the attacker-controlled servers.
CVE-2024-6825 (CVSS 8.8): the post_call_rules config accepts os.systems as a callback, enabling arbitrary command execution on every response. A bypass via pty.spawn() remained unpatched through version 1.81.0, with the vendor declining to acknowledge it [13].
CVE-2024-9606 (CVSS 7.5): API key masking that only redacts the first five characters, leaking nearly the entire key in logs.
Then, five days ago, it got worse. On March 24, 2026, the threat actor group TeamPCP published backdoored versions of LiteLLM (1.82.7 and 1.82.8) to PyPI using credentials stolen through a poisoned GitHub Action in LiteLLM’s own CI/CD pipeline — part of a coordinated campaign spanning five ecosystems in a single week [14].
The payload was a three-stage credential stealer: SSH keys, cloud credentials, Kubernetes secrets, cryptocurrency wallets, and shell history — AES-256-CBC encrypted and exfiltrated to a domain registered one day prior. Version 1.82.8 installed a .pth file that executed on every Python interpreter startup, not just when LiteLLM was imported. The packages were live for approximately five hours. Over 600 public GitHub projects had unpinned dependencies, and
major AI frameworks issued emergency patches the same day. LiteLLM engaged Google’s Mandiant for forensic analysis; the full scope of credential theft remains unknown.
As Kaspersky’s analysis put it: “LLM proxies occupy an extraordinarily privileged position in the AI stack as they sit between applications and model providers, routing all traffic and holding credentials for multiple providers simultaneously.” [15].
What Our Attack Chain Does
A Pattern, Not an Anomaly
Every major AI-integrated development tool has been compromised in the past twelve months:
– Microsoft 365 Copilot (CVE-2025-32711): A zero-click prompt injection enabled a single crafted email to exfiltrate OneDrive files, SharePoint content, and Teams messages with no user interaction.
– GitHub Copilot (CVE-2025-53773): Prompt injection wrote autoApprove: true to VS Code settings, silently enabling unrestricted command execution — with potential for self-propagating “AI worms” across repositories.
– Claude Code: Check Point Research demonstrated RCE and API key exfiltration through malicious project configuration files.
– Ray AI framework: Over 230,000 publicly exposed clusters compromised, with attackers deploying cryptominers and data theft malware.
IBM’s 2025 Cost of a Data Breach Report quantified the consequences: breaches involving shadow AI — unsanctioned AI tools adopted without IT oversight — cost $4.63 million on average, $670,000 more than standard incidents. One in five organizations reported such a breach; 97% of those breached lacked proper AI access controls [17]. Gartner predicts that by
2028, 25% of enterprise breaches will be traced to AI agent abuse [18].
As OpenAI’s CISO Dane Stuckey acknowledged: “Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make [AI agents] fall for these attacks.” [19] Our work shows that adversaries don’t even need to solve prompt injection. They just need to sit in the middle.
What Enterprises Must Do
NIST AI RMF — do not specifically address gateway-layer threats, and every published shared responsibility model places the gateway squarely in the customer’s domain [21].
Treat your AI gateway as critical infrastructure. It holds every API key and touches every request. It deserves the same security scrutiny as your identity provider or CI/CD pipeline.
Closing
We built this proof of concept to demonstrate something the security community has not yet fully reckoned with: the most dangerous attack against an AI coding agent doesn’t require tricking the model at all. It requires controlling the wire. No prompt engineering, no jailbreak, no adversarial suffix — just a proxy that rewrites JSON.
The question is no longer whether your AI stack can be compromised. Given 14 CVEs, a supply chain attack, 42,000 exposed instances, 800 malicious skills, and unsigned API responses — all in production systems, all in the past year — the question is whether you’d know if it already had been.
References
1. Anthropic signs extended thinking block content for multi-turn verification but does not sign text or tool_use blocks. See Anthropic Extended Thinking Documentation. OpenAI signs outbound Operator requests (Castle.io) but not API responses. No AI provider offers client-verifiable response integrity for standard API calls.
2. Stack Overflow 2025 Developer Survey, via Fortune
3. Gartner Press Release, April 2024
4. NVIDIA: From Prompts to Pwns — Black Hat USA 2025
5. Liu et al., “Your AI, My Shell,” arXiv:2509.22040 (2025)
6. Maloyan & Namiot, “SoK: Prompt Injection Attacks on Agentic Coding Assistants,” arXiv:2601.17548 (2025)
7. Quarkslab, “Agentic AI: The Confused Deputy Problem” (January 2026)
8. “From Prompt Injections to Protocol Exploits”, arXiv:2506.23260 (December 2025)
9. OWASP Top 10 for LLM Applications 2025
10. Kai Aizen, “AI Gateway Threat Model: 8 Attack Vectors” (February 2026)
11. CyberInsider
12. SecUtils LiteLLM CVE Tracker; CVEDetails
13. NSIDE-SA-2026-002: LiteLLM RCE Fix Bypass
14. Snyk; Sonatype; Datadog; Wiz; Arctic Wolf
15. Kaspersky/Securelist: “Why is the LiteLLM AI Gateway Compromise So Dangerous?”
16. MITRE ATLAS OpenClaw Investigation (February 2026)
17. IBM 2025 Cost of a Data Breach Report; VentureBeat
18. Gartner: 25% of Enterprise Breaches from AI Agent Abuse by 2028
21. Microsoft Azure AI Shared Responsibility Model; CSA AI Controls Matrix
22. AEGIS: A Pre-Execution Firewall for AI Agents, arXiv:2603.12621 (March 2026)