Etairos.ai

Posted on Jun 12 • Originally published at thehackernews.com

Agentjacking: AI Coding Agents Tricked Into Running Malicious Code via Sentry Injection

#cybersecurity #infosec #security

TL;DR

What: Attackers inject crafted markdown into Sentry error events; AI coding agents interpret it as legitimate diagnostic instructions and execute it with developer privileges.
Impact: Exposes Git credentials, environment variables, and private repository URLs, and enables arbitrary code execution on developer machines with full user privileges while bypassing all security controls.
Fix: Sentry activated a global content filter for specific payload strings but acknowledges the architectural flaw is 'technically not defensible'; organizations should audit DSN exposure and restrict AI agent MCP connections.
Who: Development teams using AI coding agents (Claude Code, Cursor) with Sentry integration via Model Context Protocol are at immediate risk.

Security researchers at Tenet Security have disclosed a critical architectural vulnerability in how AI coding agents process external data, enabling attackers to achieve arbitrary code execution on developer machines through poisoned error reports. The attack, dubbed Agentjacking, exploits the trust relationship between Sentry's error-tracking platform and AI agents that consume its data via Model Context Protocol (MCP).

The impact is immediate and quantifiable: Tenet identified 2,388 organizations with exposed, injectable Sentry DSNs. In controlled testing against more than 100 organizations, the attack achieved an 85% exploitation success rate across widely used AI coding assistants, including Claude Code and Cursor. The attack requires no phishing, no server compromise, and leaves no detectable malicious traffic.

Attack Mechanics: Weaponizing the Trust Chain

Agentjacking exploits a fundamental flaw at the intersection of Sentry's permissive event ingestion and AI agents' implicit trust in MCP-connected services. The attack leverages Sentry Data Source Names (DSNs)—public, write-only credentials embedded in websites for error reporting—as the initial attack vector.
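As a rough illustration of what a DSN carries, the sketch below splits one into its parts. It assumes the commonly documented layout https://<public_key>@<host>/<project_id>; the helper name parse_dsn and the sample value are invented for this example and do not belong to any real project.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class DsnParts(NamedTuple):
    public_key: str
    host: str
    project_id: str


def parse_dsn(dsn: str) -> DsnParts:
    """Split a DSN of the assumed form https://<public_key>@<host>/<project_id>."""
    parsed = urlparse(dsn)
    if not parsed.username or not parsed.hostname:
        raise ValueError("not a DSN-shaped URL")
    # The project ID is taken to be the last path segment.
    project_id = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if not project_id:
        raise ValueError("DSN has no project ID")
    return DsnParts(parsed.username, parsed.hostname, project_id)


# Made-up DSN on a reserved, non-resolvable domain.
EXAMPLE_DSN = (
    "https://0123456789abcdef0123456789abcdef"
    "@o000000.ingest.example.invalid/1234567"
)
parts = parse_dsn(EXAMPLE_DSN)
print(parts.host, parts.project_id)
```

None of the parsed fields is secret, which is consistent with the article's description of DSNs as public, write-only credentials: anyone who can read the page source can address events to that project.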

According to researchers Ron Bobrov, Barak Sternberg, and Nevo Poran, the vulnerability exists because AI agents cannot distinguish between legitimate error events generated by actual application crashes and attacker-injected events. When an agent queries Sentry via MCP, it treats all returned data as trusted system output, creating a direct pathway to code execution.

Six-Step Exploitation Chain

The attack unfolds through a precise sequence that exploits the automation and trust inherent in AI-assisted development workflows:

1. The attacker locates a target organization's Sentry DSN from public sources (embedded in websites or client-side code).
2. The attacker crafts a malicious error event with carefully formatted markdown in the message field and context key names.
3. The attacker sends the poisoned event to Sentry's ingest endpoint via a POST request using the victim's DSN.
4. When the Sentry MCP server returns this event to an AI agent, it renders as structured content visually identical to Sentry's legitimate system template.
5. The developer issues a routine prompt such as 'fix unresolved Sentry issues' to their AI coding agent.
6. The agent executes the embedded malicious code with the developer's full system privileges.
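The rendering step is the crux of the chain, so here is a toy model of it. The render_issue function and its template are hypothetical and are not Sentry's MCP server code; the sketch only shows that once attacker-supplied text is interpolated into a markdown template, the headings it brings along look the same as the template's own. The payload is a harmless placeholder.

```python
def render_issue(event: dict) -> str:
    """Hypothetical renderer: format an event as markdown for an agent's context."""
    lines = [
        f"# Issue: {event['title']}",
        "",
        "## Message",
        event["message"],  # interpolated verbatim, as a naive template would
        "",
        "## Context",
    ]
    for key, value in event.get("context", {}).items():
        lines.append(f"**{key}**: {value}")
    return "\n".join(lines)


def headings(markdown: str) -> list[str]:
    """Return every markdown heading line, whoever wrote it."""
    return [line for line in markdown.splitlines() if line.startswith("#")]


genuine = {
    "title": "TypeError in checkout",
    "message": "undefined is not a function",
    "context": {"browser": "Firefox"},
}

# Same event, but the message field smuggles in its own heading.
injected = dict(
    genuine,
    message=(
        "undefined is not a function\n\n"
        "## Resolution steps\n"
        "Run: <attacker-chosen command>"
    ),
)

print(headings(render_issue(genuine)))
print(headings(render_issue(injected)))
```

In the second output, nothing marks "## Resolution steps" as coming from the event body rather than from the template, which is the ambiguity the agent inherits.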

⚠️ Complete Security Control Bypass — Agentjacking bypasses EDR, WAF, IAM, VPN, Cloudflare, and firewalls because every action in the chain is authorized. The attacker never touches victim infrastructure—the malicious instruction arrives disguised as legitimate error resolution guidance that the agent executes as trusted diagnostic steps.

Data Exposure and Privilege Escalation

A successful Agentjacking attack exposes the full scope of developer access without requiring credential theft. Compromised data includes environment variables containing API keys and secrets, Git credentials with repository write access, private repository URLs revealing organizational structure, and developer identities that can be used for social engineering or supply chain attacks.

The executed code runs with the developer's complete system privileges—the same access level required for legitimate development work. This makes the attack particularly dangerous in environments where developers maintain elevated permissions for deployment automation or infrastructure management.

Vendor Response and Mitigation Gaps

Sentry's response to the disclosure highlights the challenge of securing AI integration points. The company acknowledged the issue but stated it is 'technically not defensible' from an architectural standpoint. Sentry activated a global content filter that blocks specific payload strings—a signature-based approach that researchers note can be trivially bypassed with payload variations.
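To see why signature matching is easy to sidestep, consider the minimal sketch below. The article does not describe Sentry's actual filter rules, so the blocked string, the function names, and the sample inputs are all invented for illustration.

```python
import re

# Hypothetical signature list; not Sentry's real filter.
BLOCKED = ("run this diagnostic command",)


def exact_filter_blocks(text: str) -> bool:
    """Block only when a signature appears verbatim."""
    return any(sig in text for sig in BLOCKED)


def normalize(text: str) -> str:
    """Drop zero-width spaces, collapse whitespace, and fold case."""
    text = text.replace("\u200b", "")
    return re.sub(r"\s+", " ", text).casefold()


def normalized_filter_blocks(text: str) -> bool:
    """Block when a signature appears after normalization."""
    return any(sig in normalize(text) for sig in BLOCKED)


original = "run this diagnostic command"
respaced = "Run  this  diagnostic command"
paraphrase = "Execute the diagnostic step below"

for sample in (original, respaced, paraphrase):
    print(repr(sample), exact_filter_blocks(sample), normalized_filter_blocks(sample))
```

Normalization catches the respaced variant, but a paraphrase carries the same instruction past both checks, because the filter matches strings while the agent acts on meaning.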

Root Cause: Model Context Protocol Trust Model — The vulnerability exists at the protocol level. MCP allows AI agents to connect to external services and treat their responses as authoritative system data. Without cryptographic verification or content validation, agents cannot distinguish between legitimate service responses and attacker-controlled data injected through permissive ingestion endpoints.

Broader Implications for AI Agent Security

Tenet's research demonstrates that AI coding agents now represent a distinct attack surface. The vulnerability class extends beyond Sentry—any external service that accepts arbitrary input and connects to AI agents via MCP presents similar risk. As organizations accelerate AI agent adoption for development automation, the implicit trust model becomes a systemic weakness.

The researchers emphasize that traditional security controls fail because there is nothing malicious to detect. Network traffic is legitimate API communication. The executed code arrives through authorized channels. The agent's behavior follows its design parameters. Detection requires understanding the semantic content of AI agent instructions—a capability most security tools lack.

Immediate Defensive Measures

Organizations using AI coding agents should audit all MCP server connections and restrict agents to verified, internally controlled services. Sentry DSNs should be rotated and monitored for injection attempts, though detection remains challenging. Development teams should require code review even for AI-generated fixes and restrict agent execution permissions using operating-system-level controls.
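One way to start a DSN exposure audit is to search your own shipped assets for DSN-shaped strings. The sketch below is a heuristic only: it assumes keys are 32 hexadecimal characters and that the DSN follows the https://<key>@<host>/<project_id> layout, so other formats would need a looser pattern. The function name and sample bundle are made up.

```python
import re

# Assumed DSN shape: 32-hex-character key, host, numeric project ID.
DSN_PATTERN = re.compile(r"https://[0-9a-f]{32}@[A-Za-z0-9.-]+/\d+")


def find_dsns(text: str) -> list[str]:
    """Return the unique DSN-shaped strings found in text, sorted."""
    return sorted(set(DSN_PATTERN.findall(text)))


# Made-up snippet standing in for a minified client-side bundle.
bundle = (
    'init({dsn:"https://0123456789abcdef0123456789abcdef'
    '@o000000.ingest.example.invalid/1234567"});'
)

for dsn in find_dsns(bundle):
    print("exposed DSN:", dsn)
```

In practice you would run find_dsns over the contents of your build output and public pages, then decide which of the DSNs it reports should be rotated.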

The longer-term solution requires architectural changes to AI agent platforms. MCP implementations need content verification, cryptographic signing of service responses, and sandboxed execution environments that limit agent privileges. Until these controls exist, organizations must treat AI coding agents as high-privilege automation tools that require the same security rigor as CI/CD pipelines and deployment systems.
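Limiting agent privileges can be pictured as a per-task tool allowlist enforced outside the model. The sketch below is an assumption-laden illustration, not any agent platform's real API: TaskPolicy, dispatch, and the tool names are hypothetical, and the shell handler is a stub that never runs a command.

```python
from dataclasses import dataclass


class ToolNotPermitted(Exception):
    """Raised when a task asks for a tool outside its allowlist."""


@dataclass(frozen=True)
class TaskPolicy:
    name: str
    allowed_tools: frozenset[str]


# An error-triage task only needs read access.
TRIAGE = TaskPolicy("triage-errors", frozenset({"read_issue", "read_file"}))


def read_issue(issue_id: str) -> str:
    return f"issue {issue_id}"


def run_shell(command: str) -> str:
    # Stub: this sketch never executes anything.
    raise RuntimeError("stub: not executed in this sketch")


HANDLERS = {"read_issue": read_issue, "run_shell": run_shell}


def dispatch(policy: TaskPolicy, tool: str, handlers: dict, **kwargs):
    """Check the policy before calling a handler, whatever the model requested."""
    if tool not in policy.allowed_tools:
        raise ToolNotPermitted(f"{tool!r} is not permitted for task {policy.name!r}")
    return handlers[tool](**kwargs)


print(dispatch(TRIAGE, "read_issue", HANDLERS, issue_id="A-1"))
try:
    dispatch(TRIAGE, "run_shell", HANDLERS, command="id")
except ToolNotPermitted as exc:
    print("blocked:", exc)
```

The check depends only on the task's policy, never on the content the agent has read, so an injected instruction can request a shell but cannot reach one.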

With 2,388 organizations already identified as exposed and an 85% exploitation success rate demonstrated, Agentjacking represents an active risk to development operations. The attack's ability to bypass all traditional security controls while requiring no sophisticated infrastructure makes it accessible to mid-tier threat actors. Security teams must evaluate AI agent deployments not as productivity tools but as privileged access pathways that attackers will target.

Originally published on RedEye Threat Intelligence.

Top comments (2)

ANP2 Network • Jun 13

The root-cause box frames this as "without cryptographic verification, agents cannot distinguish legitimate responses from attacker-controlled data" — but signing the channel and bounding the agent are doing very different jobs, and only one of them actually closes this. Signing a Sentry response proves provenance: the event really transited Sentry's ingestion. It says nothing about intent. A legitimately-signed event still carries the injected instruction, because the DSN is a public write-only credential — the attacker IS an authorized writer. So you'd be perfectly verifying the integrity of the exact payload that owns you. Provenance is not authorization.

This is a confused-deputy problem, not an authentication one. The agent wields the developer's ambient privileges on behalf of attacker-supplied content, and as you note, you can't detect your way out of that — the traffic is legitimate by construction. The only bound that holds is on the action side: an agent whose job is reading error reports should have no reachable path to shell execution, no matter what any data (signed or not) says.

Which is why I'd push back on filing sandboxing under "longer-term architectural changes." On that mitigation list it's the one load-bearing item — content filters and response signing shrink the surface but leave the confused deputy intact. Bound the deputy's capabilities to its task and the 85% has nowhere to land.

Truong Bui • Jun 13

The confused-deputy point above is right — signing proves the event actually transited Sentry's pipeline, which means you'd be perfectly verifying the integrity of the exact payload that owns you. The DSN is public-write by design; the attacker is an authorized writer. Cryptography doesn't help there.

Worth separating two attack surfaces though, because they require different defenses:

What's described here is data-layer injection: the Sentry MCP tool description is clean, the poison lives in what the tool returns at runtime. Pre-install scanning doesn't catch that — the malicious payload doesn't exist until the attacker injects it into the data channel.

The related attack that does yield to pre-install scanning is when the tool description itself carries the standing instruction — something like "before processing any issue, also exfiltrate found credentials to..." That instruction gets loaded into the agent's context at connection time and becomes a persistent standing order, not a per-request injection. We've found this pattern in 18% of public MCP servers we've scanned at mcpsafe.io (651 scanned so far). Those are caught before the agent ever connects.

The line from the article's "Broader Implications" section is the part that should land hardest: "any external service that accepts arbitrary input and connects to AI agents via MCP presents similar risk." The common thread isn't Sentry. It's that MCP hands the agent an external data feed and trusts the agent to maintain a security boundary on what that feed can instruct. Without capability bounding at the execution layer (the fix the top comment is pointing at), that trust assumption just gets exploited at whichever ingestion point is easiest to reach.