Marcos Henrique for AWS Community Builders

Posted on Jun 24

Every Alarm Is a Crime Scene: Meet Poirot, the Read-Only Incident Detective

#aws #ai #typescript #cdk

What if the thing that gets paged at 2am wasn't a person, but a detective that investigates the incident strictly read-only and leaves a grounded report in your inbox? Built on top of Danielle Heberling's headless-Claude-on-shared-compute pattern, pointed at incident response, written in TypeScript.

The 2am page

The alarm fires on an error spike. You wake up, open a blank terminal, and start doing the same thing you always do: pull the error signatures, find the last deploy, size the blast radius, form a guess. The first thirty minutes of every investigation are the same ritual, and you do them half-asleep.

That ritual is read-heavy, bounded, and repetitive. Which is just another way of saying it's exactly the kind of work an agent should be doing instead of you.

So I built one. Its name is Poirot.

What Poirot is

In one breath: Poirot is Claude Code running headless on shared AWS compute. A CloudWatch alarm fires on a log-error spike, Poirot gets dispatched automatically, investigates the incident under a read-only lock, and writes a grounded root-cause report to your inbox. It diagnoses. It never touches production. No human in the loop until the report lands.

Credit where it's due, and it's due up front: this is built squarely on Danielle Heberling's pattern, the headless coding agent on shared cloud compute that a whole team can trigger, instead of Claude Code living on one person's laptop. Her version, headless-claude-on-aws, investigates failed CloudFormation deploys. I borrowed the shape, pointed it at general incident response, and rewrote it in TypeScript with CDK. Same DNA, different case file.

The case, in one picture

CloudWatch alarm  (error-spike metric filter)
       │  alarm action
       ▼
     SNS · AlarmTopic
       │
       ▼
 Trigger Lambda ──StartBuild──▶  CodeBuild · "poirot-investigator"
                                       │
                                       │  installs Claude Code, runs the TS runner
                                       ▼
                              claude -p  (headless, stream-json)
                                       │  Bash → AWS CLI
                                       ▼  ── under the READ-ONLY investigator role ──
                       CloudWatch Logs Insights · metrics · deploy history
                                       │
                                       ▼
                           Root-cause report ──▶ build log + SNS · ReportsTopic ──▶ 📧

An alarm becomes an investigation. An investigation becomes a report. That's the whole plot.
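To make the first hop of the picture concrete, here is a minimal sketch of what the Trigger Lambda's core logic could look like. It only builds the request; the actual CodeBuild StartBuild call is left out so the snippet stays dependency-free. The function name `toStartBuildInput` and the environment variable names are assumptions for illustration, not taken from the repo; the alarm fields are the ones CloudWatch puts in the SNS message body.

```typescript
// Subset of the JSON that CloudWatch writes into the SNS "Message" field.
interface AlarmMessage {
  AlarmName: string;
  NewStateValue: "ALARM" | "OK" | "INSUFFICIENT_DATA";
  NewStateReason: string;
  StateChangeTime: string;
}

// Subset of the CodeBuild StartBuild request that this sketch fills in.
interface StartBuildInput {
  projectName: string;
  environmentVariablesOverride: { name: string; value: string; type: "PLAINTEXT" }[];
}

// Project name as shown in the diagram.
const PROJECT_NAME = "poirot-investigator";

// Returns a StartBuild request for an ALARM transition, or null when the
// notification is a recovery (OK) or INSUFFICIENT_DATA, so those never
// start an investigation.
function toStartBuildInput(snsMessage: string): StartBuildInput | null {
  const alarm = JSON.parse(snsMessage) as AlarmMessage;
  if (alarm.NewStateValue !== "ALARM") return null;
  return {
    projectName: PROJECT_NAME,
    environmentVariablesOverride: [
      // Hypothetical variable names the runner would read inside the build.
      { name: "ALARM_NAME", value: alarm.AlarmName, type: "PLAINTEXT" },
      { name: "ALARM_REASON", value: alarm.NewStateReason, type: "PLAINTEXT" },
      { name: "ALARM_TIME", value: alarm.StateChangeTime, type: "PLAINTEXT" },
    ],
  };
}
```

In a real handler you would loop over the SNS records, pass each message body through this function, and send any non-null result to CodeBuild.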

Three choices that do the heavy lifting

1. Poirot literally cannot wreck your production

Poirot runs under a dedicated read-only IAM role, separate from the role that launches it. Even if the model is confused, plain wrong, or prompt-injected by a malicious line in a log it's reading, it cannot create, modify, restart, scale, or delete anything. The credentials don't allow it. Safety here isn't the model promising to behave. It's IAM refusing to let it misbehave.
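Here is a rough sketch of how the runner might hand the investigator role's temporary credentials to the agent process, assuming the build role has already called STS AssumeRole. The function name `investigatorEnv` is hypothetical and not from the repo. Dropping `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` (the variable CodeBuild containers use to reach the build role's credentials) is my own assumption about a sensible extra step, not something the repo is confirmed to do.

```typescript
// Subset of the Credentials object returned by STS AssumeRole.
interface AssumedCredentials {
  AccessKeyId: string;
  SecretAccessKey: string;
  SessionToken: string;
}

// Builds the environment for the agent subprocess so that the AWS CLI it
// runs resolves to the investigator role, not the build role.
function investigatorEnv(
  parentEnv: Record<string, string | undefined>,
  creds: AssumedCredentials,
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of Object.keys(parentEnv)) {
    const value = parentEnv[key];
    if (value === undefined) continue;
    // Assumption: do not pass on the pointer to the build role's
    // container credentials, so the child has no obvious fallback.
    if (key === "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI") continue;
    env[key] = value;
  }
  // Standard variable names read by the AWS CLI and SDKs.
  env["AWS_ACCESS_KEY_ID"] = creds.AccessKeyId;
  env["AWS_SECRET_ACCESS_KEY"] = creds.SecretAccessKey;
  env["AWS_SESSION_TOKEN"] = creds.SessionToken;
  return env;
}
```

The returned object is what you would pass as the `env` option when spawning the agent, so every AWS CLI call it makes runs as the read-only role.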

But read-only on its own isn't the whole story, and this is the part I actually want you to take home. Poirot reads untrusted log content, and a log line is an attacker-controlled string. Then it publishes a report somewhere a human will read. That's a prompt-injection surface with an exfiltration path bolted onto it: a crafted log line could try to talk the agent into reading a secret and writing it into the report.

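So the investigator role is ReadOnlyAccess plus an explicit deny on the high-value reads: secrets, SSM parameters, KMS decrypt, S3 object contents, DynamoDB data. In IAM an explicit deny always overrides an allow, so those reads stay blocked even where ReadOnlyAccess would otherwise grant them.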

build role        →  can ONLY: assume the investigator role, publish a report,
                      read the Claude token            (least privilege)
investigator role →  ReadOnlyAccess, what Claude's AWS CLI calls actually run as
                      ...plus an explicit DENY on secrets, SSM params, KMS
                      decrypt, S3 object reads, and DynamoDB data

Poirot can investigate your infrastructure (logs, metrics, deploys, config) but it cannot read your data, no matter how nicely a log line asks. The mental model is the whole point: you don't trust the agent, you trust the blast-radius wall you built around it. Prompts are not a security boundary. IAM is.
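To show why the explicit deny wins, here is a toy sketch of the policy shape and of the "deny beats allow" rule. Everything in it is an assumption for illustration: the deny list is my guess at actions covering the categories above (the repo's exact list may differ), `readOnlyStandIn` is a tiny stand-in and not the real ReadOnlyAccess managed policy, and `evaluate` ignores resources and conditions, which real IAM evaluation does not.

```typescript
type Effect = "Allow" | "Deny";

interface Statement {
  Effect: Effect;
  Action: string[];
}

// Hypothetical stand-in for a few ReadOnlyAccess grants (NOT the real policy).
const readOnlyStandIn: Statement = {
  Effect: "Allow",
  Action: ["logs:StartQuery", "logs:GetQueryResults", "cloudwatch:GetMetricData", "s3:Get*", "dynamodb:Query"],
};

// Assumed deny list for secrets, SSM params, KMS decrypt, S3 objects, DynamoDB data.
const denyDataReads: Statement = {
  Effect: "Deny",
  Action: [
    "secretsmanager:GetSecretValue",
    "ssm:GetParameter",
    "ssm:GetParameters",
    "ssm:GetParametersByPath",
    "kms:Decrypt",
    "s3:GetObject",
    "dynamodb:GetItem",
    "dynamodb:BatchGetItem",
    "dynamodb:Query",
    "dynamodb:Scan",
  ],
};

// True when an IAM-style action pattern such as "s3:Get*" matches an action.
function matchesAction(pattern: string, action: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$", "i").test(action);
}

// Simplified evaluation: any matching Deny wins, then any matching Allow,
// otherwise the request falls through to the implicit deny.
function evaluate(action: string, statements: Statement[]): "Allow" | "Deny" | "ImplicitDeny" {
  const hits = statements.filter((s) => s.Action.some((p) => matchesAction(p, action)));
  if (hits.some((s) => s.Effect === "Deny")) return "Deny";
  if (hits.some((s) => s.Effect === "Allow")) return "Allow";
  return "ImplicitDeny";
}
```

With both statements attached, `s3:GetObject` comes back as a deny even though `s3:Get*` allows it, while `logs:StartQuery` still goes through. That asymmetry is the wall.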

2. It bills against a subscription, not per token

Claude Code authenticates with a Claude Pro/Max subscription token, not a per-token API key. Investigations draw down your plan at a flat cost. The night a flapping dependency fires fifty alarms, you get fifty investigations and the same bill at the end of the month. If you'd rather meter per token, swapping in an API key is a one-line secret change.
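As a rough illustration of that swap, the runner only has to decide which environment variable carries the secret. This sketch assumes Claude Code reads a subscription token from `CLAUDE_CODE_OAUTH_TOKEN` and an API key from `ANTHROPIC_API_KEY`; check both names against the current Claude Code docs before relying on them. The `authEnv` helper and the `BillingMode` type are hypothetical.

```typescript
type BillingMode = "subscription" | "api-key";

// Maps the secret fetched by the build role to the variable the CLI is
// assumed to read for the chosen billing mode.
function authEnv(mode: BillingMode, secretValue: string): Record<string, string> {
  return mode === "subscription"
    ? { CLAUDE_CODE_OAUTH_TOKEN: secretValue } // assumed name for the Pro/Max token
    : { ANTHROPIC_API_KEY: secretValue }; // assumed name for per-token billing
}
```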

3. Incident response is a near-perfect agent task

Read-heavy, bounded, repetitive: the exact profile that burns out on-call humans, and the exact profile where an agent earns its keep. Pull the error signatures, correlate with the last deploy, size the blast radius, write it up. Poirot does the first thirty minutes so a human starts from a hypothesis instead of a blank terminal. It isn't replacing the engineer. It's deleting the boring part of the engineer's night.

What the detective hands you

Poirot always closes the case with one self-contained report, its structure pinned by the system prompt:

## Incident summary
checkout-api 5xx rate jumped from ~0.1% to 18% at 14:02 UTC and is ongoing.

## Root cause
Deploy `d-AB12CD` (14:01 UTC) shipped a config change that points the service at
a connection pool of 5; under normal traffic it exhausts immediately, surfacing
as "FATAL: remaining connection slots are reserved".

## Evidence
- Logs Insights: 9,412 × "remaining connection slots are reserved", first seen 14:02:11, zero before 14:02.
- CodeDeploy: deployment d-AB12CD completed 14:01:48, one minute before onset.
- CloudWatch: DatabaseConnections flatlined at the new ceiling from 14:02.

## Blast radius
All checkout traffic in us-east-1; ~18% of requests failing. Read paths unaffected.

## Confidence
high (deploy timestamp, new error signature, and connection ceiling all line up).

## Recommended next steps
1. Roll back d-AB12CD or raise the pool size.
2. Add a pre-deploy check on the pool-size config.

The format matters less than the rule behind it: every claim is tied to something Poirot actually retrieved, a log line, a metric, a deploy event. No vibes. If it isn't in the evidence, it doesn't go in the report.
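If you want the runner to enforce that shape before anything is published, a structural check is cheap. The sketch below is an assumption about how that could be done, not something the repo is confirmed to do: `missingSections` and `evidenceCount` are hypothetical helpers, and the section names are copied from the sample report above. It checks that the report has the right sections and at least some evidence bullets; it cannot tell whether the evidence is true.

```typescript
// Section headings copied from the sample report.
const REQUIRED_SECTIONS: string[] = [
  "Incident summary",
  "Root cause",
  "Evidence",
  "Blast radius",
  "Confidence",
  "Recommended next steps",
];

// Returns the required "## " headings that the report does not contain.
function missingSections(report: string): string[] {
  const headings = report
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
  return REQUIRED_SECTIONS.filter((section) => headings.indexOf(section) === -1);
}

// Counts "- " bullets that appear under the "## Evidence" heading.
function evidenceCount(report: string): number {
  let inEvidence = false;
  let count = 0;
  for (const line of report.split("\n")) {
    if (line.startsWith("## ")) {
      inEvidence = line.slice(3).trim() === "Evidence";
    } else if (inEvidence && line.startsWith("- ")) {
      count += 1;
    }
  }
  return count;
}
```

A runner could refuse to publish, or flag the email, when `missingSections` is non-empty or `evidenceCount` is zero.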

And like any good detective, it's allowed to say "I'm not sure." The method is fixed: establish the facts, read the actual error lines, correlate with recent deploys, size the blast radius, then form a hypothesis and try to disprove it before committing. A ranked shortlist of honest maybes beats one confident wrong answer, a principle I lifted straight from Danielle's version.

The deliberately boring choices

A few decisions that aren't "best practice," on purpose:

AWS CLI over an MCP server. Claude Code's Bash tool running the plain AWS CLI. Fewer moving parts, and every read maps cleanly to an IAM action I can scope or deny. MCP is a fine upgrade path later; it wasn't worth the wiring now.

CodeBuild over Lambda. Investigations are long, bursty, and want a real shell with the CLI and the Node toolchain. CodeBuild gives that with no idle cost. Danielle landed on CodeBuild too, for the same boring-is-good reasons.

Rotate-when-it-expires over zero-touch tokens. The subscription token lasts months. A fully automatic refresh-back-to-Secrets-Manager variant exists, but it needs write access on the build role and serialized builds to dodge refresh-token races. Not worth the extra blast radius for a credential I touch twice a year.
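For the CLI-over-MCP choice, the headless invocation stays small. The sketch below is my assumption of what the runner could pass to `claude` and how it could pull the final report out of the stream-json output; `claudeArgs` and `extractReport` are hypothetical names, and the exact flags and event shape should be checked against the Claude Code version you install. It assumes the stream ends with a JSON line whose `type` is `"result"` and whose `result` field holds the final text.

```typescript
// Arguments for a headless run restricted to the Bash tool.
function claudeArgs(prompt: string, systemPrompt: string): string[] {
  return [
    "-p", prompt, // print mode: run once, no interactive session
    "--output-format", "stream-json", // one JSON event per line
    "--verbose", // assumed to be required alongside stream-json in print mode
    "--allowedTools", "Bash", // the AWS CLI is reached through Bash only
    "--append-system-prompt", systemPrompt,
  ];
}

// Returns the text of the last "result" event in the stream, or null if
// the run never produced one. Lines that are not JSON are skipped.
function extractReport(streamOutput: string): string | null {
  let report: string | null = null;
  for (const line of streamOutput.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: { type?: unknown; result?: unknown } | null;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue;
    }
    if (
      event !== null &&
      typeof event === "object" &&
      event.type === "result" &&
      typeof event.result === "string"
    ) {
      report = event.result;
    }
  }
  return report;
}
```

The runner would spawn `claude` with these arguments under the investigator role's credentials, collect stdout, and publish whatever `extractReport` returns.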

This is the ethos Danielle put better than I will: we've gotten a little precious about reference architectures. The clean, fully-managed, perfectly-scoped version often doesn't exist yet, or isn't mature, or would take three times as long to ship. Meanwhile the slightly messy version you actually understand and can keep running yourself is sitting right there, solving the real problem today. Poirot is that: boring on purpose, mine to maintain, working now.

Code is cheap, show me the screenshots

After a few minutes, the investigation build succeeded.

Afterwards, I got an email with the findings (yeah, it's not fancy... yet).

The little grey cells, on shared compute

Here's what I keep coming back to. The hard part of on-call was never producing the commands. It's holding a model of a live system in your head and interrogating it. That's the scarce skill, and it's the one AI makes more valuable, not less, because the cheap part, typing the CLI calls, is exactly what you can now hand off. Poirot doesn't close the case. It does the legwork, lays the evidence on the table, names a suspect with a confidence level, and hands you a starting point instead of a blank terminal. You still decide.

You don't trust the agent. You trust the wall around it, the evidence under it, and your own judgment on top. That's the whole trick.

Repo's here if you want to fork it: github.com/wakeupmh/poirot-agent. And go read Danielle's post, because Poirot wouldn't exist without it.

Mon ami, may your pages be few and your reports be grounded. 🕵️

Top comments (1)

Mike Czerwinski • Jun 24

The "you don't trust the agent, you trust the wall around it, the evidence under it, and your own judgment on top" line is the production formulation of an axis I keep watching peers converge on under different vocabularies. Adam's Conduit gateway (this week, plugin discipline post) lands on the same three layers from the API edge. Shudipto Trafder's hallucination measurement framework (Medium, April) calls it claim-level faithfulness. A thread under Alexander Tyutin's agent-escalation post earlier this week landed on "the cage cannot live inside the thing being caged." Different domains, identical architectural commitment. Reading your version through that frame is what made me want to comment.

The "no vibes, every claim tied to something Poirot actually retrieved" rule is the part that does the most work for me. That is a story gate built into the report format, not as a model promise but as a structural constraint on what counts as a claim. Shipping that discipline at the report layer is the move most "AI summary" tools quietly skip.

Honest stage marker on this side: I work adjacent (verification engineering and decision audit on dev.to, no production incident-response stack), so this is the perspective from the floor above. One concrete extension worth pushing on, less for Poirot specifically and more for the shape: a planted-fault test on the report itself, not just the IAM wall. Feed Poirot a synthetic incident where the available evidence supports a plausible-but-wrong cause, watch whether the "honest maybes" hedge fires or whether the report commits to the wrong cause at high confidence. Read-only protects production. It does not protect the on-call human from being misled by a well-formatted wrong report. The next floor is whether the confidence label in the report is itself audited under planted faults the same way the IAM wall is audited by, well, IAM.

Credit on the line worth stealing forward: "a ranked shortlist of honest maybes beats one confident wrong answer." And credit upstream to Danielle for the headless pattern that makes any of this shippable.