<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vasu Dalal</title>
    <description>The latest articles on DEV Community by Vasu Dalal (@vdalal).</description>
    <link>https://dev.clauneck.workers.dev/vdalal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4001076%2Fc1ecd5b8-65f7-4be2-a41c-8811bdc5a715.png</url>
      <title>DEV Community: Vasu Dalal</title>
      <link>https://dev.clauneck.workers.dev/vdalal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.clauneck.workers.dev/feed/vdalal"/>
    <language>en</language>
    <item>
      <title>I gave my AI agent database access. Then I built a firewall so it couldn't wipe prod.</title>
      <dc:creator>Vasu Dalal</dc:creator>
      <pubDate>Wed, 24 Jun 2026 18:46:26 +0000</pubDate>
      <link>https://dev.clauneck.workers.dev/vdalal/i-gave-my-ai-agent-database-access-then-i-built-a-firewall-so-it-couldnt-wipe-prod-83c</link>
      <guid>https://dev.clauneck.workers.dev/vdalal/i-gave-my-ai-agent-database-access-then-i-built-a-firewall-so-it-couldnt-wipe-prod-83c</guid>
      <description>&lt;p&gt;A few months ago I gave an autonomous agent write access to a real database. It was a LangChain-style loop (plan, call a tool, observe, repeat), and one of the tools ran SQL.&lt;/p&gt;

&lt;p&gt;It worked great in the demo. Then I watched it, during a "clean up the test rows" task, generate this:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
DROP TABLE users;

It didn't run (staging, and I was watching). But the lesson landed: the LLM doesn't distinguish a destructive command from a safe one; it just emits the tool call. And by then your code is one cursor.execute() away from an incident.

**"AI firewalls" guard the wrong side**

When I went looking for protection, almost everything in the "LLM security" space guards the inbound side — prompt injection, jailbreaks, PII in the input. Useful, but it's the wrong end for an autonomous agent. My problem wasn't a malicious prompt. It was a well-meaning agent emitting a catastrophic action.

What I actually wanted was a firewall on the outbound side, guarding the tool calls themselves:

- destructive SQL (DROP TABLE, unscoped DELETE)
- writes to prod / ALTER ... DROP COLUMN
- SSRF and cloud-metadata fetches (169.254.169.254)
- bulk secret / API-key reads
- runaway retry loops draining your token budget

And critically: I wanted the catch to be deterministic. If your safety layer is itself an LLM call, it's slower, costs money, and can be talked out of it. A DROP TABLE should be blocked by a rule, not a vibe.
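
To make "blocked by a rule" concrete, here is a minimal sketch of deterministic checks for the first two bullets (destructive SQL and ALTER ... DROP COLUMN). The rule list, the check_sql name, and the reason strings are hypothetical, written for illustration; they are not the rules shipped in agentx-security-sdk. Splitting on ";" is naive because it ignores semicolons inside string literals, so treat this as a shape, not a parser.

```python
import re

# Hypothetical rules for illustration only; NOT the policy shipped in agentx-security-sdk.
# Each rule is a compiled regex plus the reason reported when it matches.
RULES = [
    (re.compile(r"\bDROP\s+(TABLE|DATABASE|SCHEMA)\b", re.IGNORECASE), "drop of a whole object"),
    (re.compile(r"\bTRUNCATE\b", re.IGNORECASE), "truncate"),
    (re.compile(r"\bALTER\s+TABLE\b.*\bDROP\s+COLUMN\b", re.IGNORECASE | re.DOTALL), "column drop"),
]
DELETE_FROM = re.compile(r"\bDELETE\s+FROM\b", re.IGNORECASE)
WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def check_sql(query):
    """Return a block reason for the first dangerous statement, or None if all pass."""
    # Naive statement split: ignores semicolons inside string literals.
    for statement in query.split(";"):
        for pattern, reason in RULES:
            if pattern.search(statement):
                return reason
        # An unscoped DELETE is a DELETE ... FROM with no WHERE clause at all.
        if DELETE_FROM.search(statement) and not WHERE.search(statement):
            return "unscoped DELETE"
    return None


print(check_sql("Please clean up: DROP TABLE users"))  # drop of a whole object
print(check_sql("DELETE FROM users WHERE id = 42"))    # None
```

Same input, same verdict, every time, with no model call involved. A production policy would want a real SQL parser rather than regexes, since regexes are easy to dodge with inline comments such as DROP/**/TABLE.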

**The 2-minute version you can run right now**

I ended up building this and putting the SDK on PyPI. Here's the whole thing; it blocks a live DROP TABLE offline, with no API key, using built-in policy seeds:

pip install agentx-security-sdk

from agentx_sdk import agentx_protect, is_block

# Wrap the tool function: every call is checked before the body runs.
@agentx_protect(agent_id="demo")
def run_sql(query: str, db_session=None):
    print("EXECUTED (DANGER):", query)   # never reached for a blocked call
    return {"ok": True}

# The destructive statement is caught by the built-in policy seeds.
result = run_sql(query="Please clean up: DROP TABLE users")
print("BLOCKED:", is_block(result))       # -&amp;gt; True, offline, no key

One decorator on your tool function. The destructive call is intercepted before your function body runs, and you get a block result back instead. No gateway, no account, and no LLM in the hot path: it runs entirely on your machine.
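
For anyone who hasn't written one, here is a stripped-down, pure-Python sketch of the intercept-before-execution pattern. guard, Blocked, and no_drop_table are hypothetical names made up for illustration. They are not the SDK's internals, and agentx_protect's actual implementation and block result type may differ.

```python
import functools
from dataclasses import dataclass


@dataclass
class Blocked:
    """Hypothetical block result; the SDK's real result type may differ."""
    reason: str


def guard(check):
    """Hypothetical decorator: run `check` on the call's arguments before the body."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            reason = check(*args, **kwargs)
            if reason is not None:
                return Blocked(reason)  # the wrapped body is never called
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def no_drop_table(query, **_):
    """Toy deterministic rule for this sketch: block any DROP TABLE."""
    return "DROP TABLE" if "DROP TABLE" in query.upper() else None


executed = []  # records every query that actually reached the body


@guard(no_drop_table)
def run_sql(query, db_session=None):
    executed.append(query)  # stands in for cursor.execute(query)
    return {"ok": True}


result = run_sql(query="Please clean up: DROP TABLE users")
print(result, executed)  # Blocked(reason='DROP TABLE') []
```

The key property is that the check lives in the wrapper, so a blocked call never reaches cursor.execute() or anything else in the body.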

▎ Note: the package is agentx-security-sdk (import path agentx_sdk), version ≥ 0.3.11.

**How the block works**

The decorator wraps your tool call and runs the arguments through a layer of deterministic checks before execution: pattern and structural rules for the dangerous action shapes (destructive SQL, prod writes, SSRF targets, secret-store reads, no-progress loops). If a rule trips, the call returns a block instead of executing. No LLM is involved at this layer. It's the floor, which is why it works with no key and adds negligible latency.
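
As an example of one check in such a layer, here is a minimal sketch of an SSRF rule for the 169.254.169.254 case. It uses only Python's standard library (urllib.parse and ipaddress). check_url and METADATA_HOSTS are hypothetical names, and the rule set is an assumption, not what the SDK ships. The sketch only inspects literal IPs and a couple of known metadata hostnames. A DNS name that resolves to an internal address, or an obfuscated IP form, would slip past unless you also resolve the name and re-check the result.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical SSRF rule for illustration; NOT the SDK's rule set.
METADATA_HOSTS = {"metadata", "metadata.google.internal"}


def check_url(url):
    """Return a block reason if the URL targets an internal or metadata host, else None."""
    host = urlsplit(url).hostname  # lowercased; IPv6 brackets already stripped
    if host is None:
        return "missing host"
    if host in METADATA_HOSTS:
        return "cloud metadata host"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it (a real check would).
        return None
    # 169.254.169.254 is link-local; also refuse loopback and private ranges.
    if ip.is_link_local or ip.is_loopback or ip.is_private:
        return "internal address " + str(ip)
    return None


print(check_url("http://169.254.169.254/latest/meta-data/"))  # internal address 169.254.169.254
print(check_url("https://example.com/api"))                   # None
```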

There's more above that floor: it can escalate ambiguous-but-dangerous actions to a human-in-the-loop decision, circuit-break a runaway loop, or reframe and retry the run instead of just dying. But the deterministic floor is the part you can verify in 2 minutes without trusting me, and that's the whole point of leading with it.
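
The circuit-breaker idea can also be sketched in a few lines. LoopBreaker, its thresholds, and its trip reasons are hypothetical, assumed for illustration rather than taken from the SDK. It trips in two cases: when the same tool call with the same arguments repeats max_repeats times in a row, or when the run attempts more than max_calls calls. Once tripped, it stays open until reset() is called.

```python
from collections import deque


class LoopBreaker:
    """Hypothetical no-progress circuit breaker; NOT the SDK's implementation.

    Trips when the same tool call (same name, same arguments) repeats
    `max_repeats` times in a row, or when the run attempts more than
    `max_calls` calls. Once tripped it stays open until reset() is called.
    """

    def __init__(self, max_repeats=3, max_calls=50):
        self.max_repeats = max_repeats
        self.max_calls = max_calls
        self.reset()

    def reset(self):
        """Close the breaker again, e.g. after a human has reviewed the run."""
        self.recent = deque(maxlen=self.max_repeats)
        self.remaining = self.max_calls
        self.tripped = None

    def record(self, tool, args):
        """Register a call attempt; return a trip reason to block it, or None to allow it."""
        if self.tripped is not None:
            return self.tripped
        if self.remaining == 0:
            self.tripped = "call budget of %d exhausted" % self.max_calls
            return self.tripped
        self.remaining -= 1
        # repr() keeps the key hashable even when argument values are lists or dicts.
        self.recent.append((tool, repr(sorted(args.items()))))
        if len(self.recent) == self.max_repeats and len(set(self.recent)) == 1:
            self.tripped = "same call repeated %d times in a row" % self.max_repeats
        return self.tripped


breaker = LoopBreaker(max_repeats=3)
for attempt in range(4):
    # Attempts 0 and 1 print None; attempts 2 and 3 print the trip reason.
    print(attempt, breaker.record("run_sql", {"query": "SELECT 1"}))
```

In an agent loop you would call record() before each tool call. When it returns a reason, stop the run or hand it to a human instead of executing.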

**Why I'm posting this**

I'm looking for a handful of people running real Python agents (something that touches a live DB, cloud, files, or money, ideally unattended) to try it on their stack and tell me where it's wrong. Not a launch, not a sales pitch. I want to know:

- Does it catch the thing that would've bitten you?
- What dangerous action shape is it missing?

If you've ever thought "what happens when this agent does something irreversible at 2am," I'd genuinely like your take.

- Try it live (keyless quickstart): https://bit.ly/agentfirewall
- Community / tell me what broke: https://discord.gg/PmWR
- Or just reply here. Bonus points for the war story that made you click.

If your agent never touches anything irreversible, ignore me. If it does, the repro takes two minutes, which beats finding out the hard way via DROP TABLE.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>security</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
