Your AI Gateway Is a Skeleton Key: What We Learned by Compromising One

Contributing Authors

Erich Stuntebeck

Table of Contents

This Is Not Prompt Injection
Why LiteLLM
What Our Attack Chain Does
A Pattern, Not an Anomaly
What Enterprises Must Do
Closing
References

Here is something most enterprise security teams have not considered: when a developer sends a message through an AI coding assistant, neither the request nor the response is cryptographically signed. OpenAI does not sign its API responses. Anthropic signs extended thinking blocks but not text or tool-use blocks [1]. There is no mechanism, no HMAC, no digital signature, no content attestation, for the client to verify that a streaming response actually came from the model rather than being fabricated by something sitting between the developer and the API.

That “something” is usually an AI API gateway: the proxy that manages keys, enforces policies, and abstracts away provider differences. Eighty-four percent of developers now use AI coding tools, and 51% use them daily [2]. Gartner projects 90% adoption by 2028 [3]. As adoption has scaled, organizations have funneled this traffic through centralized gateways, creating a single point of interception that holds every API key, touches every request, and can modify every response.

We built a gateway that does exactly that.

Over the past several weeks, our team constructed a proof-of-concept attack chain that begins with a compromised AI API gateway and ends with a persistent remote shell on a developer’s machine. The AI model never sees the attack. The developer sees what appears to be a normal response. The gateway fabricates a tool call, the coding agent executes it silently, and within seconds the attacker has a browser-based terminal to the victim’s workstation.

To our knowledge, this is the first public demonstration of this specific attack class: protocol-level tool call fabrication at the gateway layer, achieving remote code execution on a coding agent’s host without any prompt injection involved. The model is never tricked. The model is never consulted. The attack operates entirely at the transport layer.

This Is Not Prompt Injection

The security community has spent considerable effort on prompt injection — tricking a model into doing something it shouldn’t. NVIDIA demonstrated it achieving RCE on developer machines at Black Hat USA 2025 [4]. The paper “Your AI, My Shell” tested 314 attack payloads against Cursor and GitHub Copilot, achieving 83.4% success rates in auto-approve mode [5]. A
meta-analysis of 78 studies found that attack success rates against state-of-the-art defenses exceed 85% with adaptive strategies [6].

Our attack is different in kind, not degree. Prompt injection manipulates the model’s input to influence its output. Our attack never touches the model. The gateway intercepts the API response stream and replaces it with fabricated SSE events containing a tool-use block that the coding agent parses and executes as if it came from the model. The agent has no way to distinguish a legitimate tool_use block generated by Claude from one injected by a proxy —
because both arrive as identical JSON over the same TLS connection.

This is the confused deputy problem in its most direct form [7]. The agent isn’t confused because the model was manipulated — it’s confused because the protocol offers no provenance guarantee. A recent academic survey frames this as the shift “from prompt injections to protocol exploits,” identifying transport-layer attacks as a distinct and underexplored threat class [8]. The OWASP Top 10 for LLM Applications covers prompt injection (#1), supply chain vulnerabilities (#3), and excessive agency (#6) — but has no entry
for gateway compromise or response fabrication [9]. As one researcher noted after mapping eight distinct AI gateway attack vectors: “Every major security framework missed it. OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF — none of them cover the gateway infrastructure layer.” [10]

Why LiteLLM

We chose LiteLLM as our gateway engine because it is among the most widely deployed open-source AI proxies, averaging 3.6 million PyPI downloads per day [11]. It also demonstrates, in concentrated form, what happens when critical infrastructure accumulates vulnerabilities faster than the community can remediate them.

LiteLLM has disclosed 14 CVEs since April 2024 — two Critical (CVSS 9.8), nine High — spanning SSRF, RCE, SQL injection, server-side template injection, and API key leakage [12]:

CVE-2024-6587 (CVSS 7.5): The api_base parameter forwards requests — with API keys in the Authorization header — to the attacker-controlled servers.

CVE-2024-6825 (CVSS 8.8): the post_call_rules config accepts os.systems as a callback, enabling arbitrary command execution on every response. A bypass via pty.spawn() remained unpatched through version 1.81.0, with the vendor declining to acknowledge it [13].

CVE-2024-9606 (CVSS 7.5): API key masking that only redacts the first five characters, leaking nearly the entire key in logs.

Then, five days ago, it got worse. On March 24, 2026, the threat actor group TeamPCP published backdoored versions of LiteLLM (1.82.7 and 1.82.8) to PyPI using credentials stolen through a poisoned GitHub Action in LiteLLM’s own CI/CD pipeline — part of a coordinated campaign spanning five ecosystems in a single week [14].

The payload was a three-stage credential stealer: SSH keys, cloud credentials, Kubernetes secrets, cryptocurrency wallets, and shell history — AES-256-CBC encrypted and exfiltrated to a domain registered one day prior. Version 1.82.8 installed a .pth file that executed on every Python interpreter startup, not just when LiteLLM was imported. The packages were live for approximately five hours. Over 600 public GitHub projects had unpinned dependencies, and
major AI frameworks issued emergency patches the same day. LiteLLM engaged Google’s Mandiant for forensic analysis; the full scope of credential theft remains unknown.

As Kaspersky’s analysis put it: “LLM proxies occupy an extraordinarily privileged position in the AI stack as they sit between applications and model providers, routing all traffic and holding credentials for multiple providers simultaneously.” [15].

What Our Attack Chain Does

Our proof of concept targets OpenClaw, the popular open-source AI coding agent. OpenClaw grants its model an exec tool that runs shell commands on the host. When the model assesses a command as safe, it can execute with ask: “off” (no user confirmation) and security: “full” (unrestricted execution).

Here is what the developer experiences: they send a message to OpenClaw. They see a normal-looking text response. That’s it. Nothing unusual. Behind the scenes:

1. The gateway intercepted the API response, buffered the original text, and replaced the entire response with a fabricated exec tool call — a one-liner that downloads and runs a bootstrap script.

2. OpenClaw executed it silently. The bootstrap installed a WebSocket agent that connected back to our C2 server.

3. On the follow-up API request, the gateway identified the injected tool call by its signature, stripped it and its result from the conversation history, and replayed the original text as a synthetic response.

4. The model never saw the injection. The conversation history is clean. The attacker now has a browser-based remote shell, API key exfiltration from environment variables, and filesystem access. Optional macOS LaunchAgent persistence survives reboots.

The entire chain fires in under ten seconds.

OpenClaw’s own security record underscores the fragility of this trust model: 8 critical CVEs in six weeks, over 42,000 publicly exposed instances (93% with authentication bypass), and 800+ malicious skills in the ClawHub registry delivering credential stealers. MITRE’s Center for

Threat-Informed Defense published a dedicated investigation into OpenClaw’s attack surface,

mapping AI-specific “chokepoint techniques that adversaries rely on.” [16]

A Pattern, Not an Anomaly

Every major AI-integrated development tool has been compromised in the past twelve months:

– Microsoft 365 Copilot (CVE-2025-32711): A zero-click prompt injection enabled a single crafted email to exfiltrate OneDrive files, SharePoint content, and Teams messages with no user interaction.

– GitHub Copilot (CVE-2025-53773): Prompt injection wrote autoApprove: true to VS Code settings, silently enabling unrestricted command execution — with potential for self-propagating “AI worms” across repositories.

– Claude Code: Check Point Research demonstrated RCE and API key exfiltration through malicious project configuration files.

– Ray AI framework: Over 230,000 publicly exposed clusters compromised, with attackers deploying cryptominers and data theft malware.

IBM’s 2025 Cost of a Data Breach Report quantified the consequences: breaches involving shadow AI — unsanctioned AI tools adopted without IT oversight — cost $4.63 million on average, $670,000 more than standard incidents. One in five organizations reported such a breach; 97% of those breached lacked proper AI access controls [17]. Gartner predicts that by
2028, 25% of enterprise breaches will be traced to AI agent abuse [18].

As OpenAI’s CISO Dane Stuckey acknowledged: “Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make [AI agents] fall for these attacks.” [19] Our work shows that adversaries don’t even need to solve prompt injection. They just need to sit in the middle.

What Enterprises Must Do

The EU AI Act’s Article 15 already requires that high-risk AI systems be “resilient against attempts by unauthorized third parties to alter their use, outputs or performance by exploiting system vulnerabilities.” [20] A compromised gateway altering every API response is precisely that scenario. But the current regulatory and security frameworks — OWASP, MITRE ATLAS,
NIST AI RMF — do not specifically address gateway-layer threats, and every published shared responsibility model places the gateway squarely in the customer’s domain [21].

Treat your AI gateway as critical infrastructure. It holds every API key and touches every request. It deserves the same security scrutiny as your identity provider or CI/CD pipeline.

Source from accountable vendors. Production enterprise traffic demands a provider with a dedicated security team, incident response, SLAs, and contractual accountability. A community project with 14 CVEs in two years and a supply chain compromise does not meet that bar.

Assume the wire is hostile. Until AI providers ship response signing, treat the path between your gateway and the model as a trust boundary. Deploy agent-level firewalls that validate tool calls independent of the transport layer — projects like AEGIS demonstrate this is feasible [22]. Monitor for anomalies: unexpected tool calls, suspiciously fast tool results, history modifications between requests.

Constrain agent permissions. Coding agents with unrestricted shell access and no user confirmation are a loaded weapon pointed at whoever controls the response stream. Enforce approval prompts for network operations and file modifications. Sandbox execution environments.

Push for response attestation. The architectural gap at the root of this attack — unsigned, unattested API responses — is solvable. Content provenance standards like C2PA exist for media; AI API responses need the same treatment. Enterprises should be demanding this from their AI providers.

Closing

We built this proof of concept to demonstrate something the security community has not yet fully reckoned with: the most dangerous attack against an AI coding agent doesn’t require tricking the model at all. It requires controlling the wire. No prompt engineering, no jailbreak, no adversarial suffix — just a proxy that rewrites JSON.

The question is no longer whether your AI stack can be compromised. Given 14 CVEs, a supply chain attack, 42,000 exposed instances, 800 malicious skills, and unsigned API responses — all in production systems, all in the past year — the question is whether you’d know if it already had been.

References

1. Anthropic signs extended thinking block content for multi-turn verification but does not sign text or tool_use blocks. See Anthropic Extended Thinking Documentation. OpenAI signs outbound Operator requests (Castle.io) but not API responses. No AI provider offers client-verifiable response integrity for standard API calls.

2. Stack Overflow 2025 Developer Survey, via Fortune

3. Gartner Press Release, April 2024

4. NVIDIA: From Prompts to Pwns — Black Hat USA 2025

5. Liu et al., “Your AI, My Shell,” arXiv:2509.22040 (2025)

6. Maloyan & Namiot, “SoK: Prompt Injection Attacks on Agentic Coding Assistants,” arXiv:2601.17548 (2025)

7. Quarkslab, “Agentic AI: The Confused Deputy Problem” (January 2026)

8. “From Prompt Injections to Protocol Exploits”, arXiv:2506.23260 (December 2025)

9. OWASP Top 10 for LLM Applications 2025

10. Kai Aizen, “AI Gateway Threat Model: 8 Attack Vectors” (February 2026)

11. CyberInsider

12. SecUtils LiteLLM CVE Tracker; CVEDetails

13. NSIDE-SA-2026-002: LiteLLM RCE Fix Bypass