The Claude CLI writes errors to stdout, not stderr

#ai #claude #agents #productivity

My engagement bot was dying every morning for three days and the log told me nothing. Fourteen calls to the Claude CLI, fourteen rc=1, fourteen empty strings where the error message should have been. By 08:06 the bot had run a partial cycle, fallen back to generic search queries, built a knowledge-base index of zero characters, and quietly exited without posting a single reply. No traceback. No stderr output. Just rc=1 and silence.

I noticed only because the log-doctor agent watches log file size and posts a ticket when a daily file is suspiciously small. On healthy mornings the bot writes a few thousand lines: discovery, scoring, drafting, posting, the Waiting 14 min heartbeat between cycles. On the three bad mornings the daily log was a hundred lines of claude CLI failed (rc=1): followed by nothing. Once the log-doctor put a ticket on the board I went looking.

I checked the obvious things. claude --version worked. claude auth status was green. which claude returned the right path. The ANTHROPIC_API_KEY env var was set. The bot ran fine when I started it by hand at 15:14 the same afternoon: KB index built with 4865 characters, fact-checking pipeline green, three replies posted. Whatever was killing the morning runs left no fingerprints I knew how to look for.

The answer turned out to be a one-line correction to a subprocess wrapper. The Claude CLI writes its structured error output to stdout, not stderr. My code was logging the wrong stream.

What the Bot Does

I run an engagement bot that wraps the Claude CLI. Three jobs: generate search queries for a handful of topical themes, index what it finds into a small knowledge base, draft replies grounded in that knowledge base. The Claude calls go through agent/claude_cli.py, a thin subprocess.run shim that shells out to:

claude -p "$PROMPT" --output-format json --model sonnet --tools "" --no-session-persistence

It parses the JSON response and returns the result field. That is the entire wrapper.

The wrapper handles failure the way every Python subprocess tutorial handles failure:

try:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=120, ...)
except subprocess.TimeoutExpired:
    log.warning("claude CLI timed out")
    return ""
except FileNotFoundError:
    log.error("claude CLI not found — is it installed and in PATH?")
    return ""

if result.returncode != 0:
    log.warning("claude CLI failed (rc=%d): %s", result.returncode, result.stderr)
    return ""

Catch FileNotFoundError and TimeoutExpired, check returncode, log stderr on non-zero exit. That convention is so deep in muscle memory I have written it the same way for fifteen years. stdout is for the answer, stderr is for the diagnostics. Any Unix tool that violates that contract is broken.

Except the Claude CLI doesn't violate it so much as renegotiate it. With --output-format json, the entire transport is JSON on stdout. Success and failure both. On a 200 you get {"type": "result", "is_error": false, "result": "...the model's text..."}. On a 429 you get {"type": "result", "is_error": true, "api_error_status": 429, "result": "Rate limit hit"}. The process exits 1, but the diagnostic detail lives on stdout next to where the answer would have been. stderr stays empty. My code never read stdout on failure, so I never saw the 429.

The Log That Told Me Nothing

This is what the morning of 22 June looked like in bot-2026-06-22.log:

08:02:42 agent.claude_cli WARNING claude CLI failed (rc=1):
08:02:42 agent.knowledge_base INFO KB index built: 0 chars
08:03:24 agent.claude_cli WARNING claude CLI failed (rc=1):
08:03:28 agent.claude_cli WARNING claude CLI failed (rc=1):
08:03:33 agent.claude_cli WARNING claude CLI failed (rc=1):
08:03:38 agent.claude_cli WARNING claude CLI failed (rc=1):

Note the empty string after rc=1):. That is my warning format, "claude CLI failed (rc=%d): %s" with result.stderr interpolated. Fourteen failures that morning, fourteen the day before, eleven the day before that. Same pattern starting Saturday 20 June. Every entry identical: rc=1, nothing after the colon, KB index built with 0 chars, every theme's query generator falling back to a generic query, bot dead by 08:06.

The INFO KB index built: 0 chars line is the tell. The bot did not crash. It built an empty knowledge base out of nothing, completed the rest of its loop on empty inputs, and exited without logging its usual Waiting X min heartbeat. Silent degradation. The worst kind.

Wrong Hypotheses

I spent more time than I want to admit chasing the standard subprocess failure modes:

Auth. Reran claude auth status. Token valid, scope correct. The bot's user is the same user that runs interactively. No drift.
PATH. Compared the service-unit PATH with my shell PATH. Identical.
Env vars. ANTHROPIC_API_KEY was set in both contexts. The Claude CLI doesn't actually use that variable when claude auth is set up, but it's a common red herring so I checked anyway.
Timeout. The wrapper has a 120 second timeout. The failures were instantaneous, not slow. TimeoutExpired would have been caught by a separate except branch. It wasn't.
Encoding. PYTHONIOENCODING=utf-8 was already in the subprocess env. Not it.

None of these worked because the failure was not local. It was a 429 from the Claude API itself, returned through the CLI. My wrapper had no way to see it, because the wrapper read the wrong file descriptor on failure.

The One-Line Fix That Became a Fifteen-Test Fix

The actual change in agent/claude_cli.py is small. On returncode != 0, parse result.stdout as JSON, pull api_error_status and result out, log those:

if result.returncode != 0:
    error_msg = ""
    api_error_status = None
    try:
        err_data = json.loads(result.stdout)
        error_msg = err_data.get("result", "")
        api_error_status = err_data.get("api_error_status")
    except (json.JSONDecodeError, AttributeError):
        error_msg = result.stdout[:200] or result.stderr[:200]

    log.warning(
        "claude CLI failed (rc=%d, status=%s): %s",
        result.returncode, api_error_status, error_msg[:200],
    )

Once I could see the actual error code I could decide what to do about it. Rate limits and 5xx are transient, so the wrapper now retries once after a 30 second sleep on {429, 500, 502, 503} and fails immediately on {401, 403, 404}:

_RETRYABLE_STATUSES = frozenset({429, 500, 502, 503})
_RETRY_DELAY_SECONDS = 30

# inside the loop:
if attempt == 0 and api_error_status in _RETRYABLE_STATUSES:
    log.info("Retrying after %ds (status=%s)...", _RETRY_DELAY_SECONDS, api_error_status)
    time.sleep(_RETRY_DELAY_SECONDS)
    continue
return ""

The behavior change is small. The test surface around it is not. I added fifteen tests to tests/test_claude_cli.py, and reading the names back is the whole story:

test_rc1_logs_stdout_error_not_stderr
test_rc1_no_retry_for_401
test_rc1_no_retry_for_403
test_rc1_retries_once_on_429
test_rc1_retries_once_on_500
test_rc1_retry_sleeps_30s
test_rc1_no_second_retry_on_repeated_429

Every name there is a contract I was previously assuming and not enforcing. The first one is the one I most wish I had written on day one.

The Second Bug the First One Was Hiding

While I was in main.py I also found why the bot was exiting cleanly with no error: an uncaught exception from _find_one_candidate() was propagating past except KeyboardInterrupt and ending the process. The candidate search uses Playwright, and Playwright occasionally raises during page.close() on a slow page. That exception had nowhere to land. The fix is the one you have written a hundred times:

try:
    candidate = self._find_one_candidate(...)
except Exception:
    log.error("candidate search crashed", exc_info=True)
    time.sleep(60)
    continue

Without that wrapper, a Playwright crash during a context teardown looks identical to a clean exit. The exception unwinds through the bot's outer while True: and lands at the top-level except KeyboardInterrupt as a normal program end. Combined with the silent-stderr issue, two failures stacked on top of each other to produce one symptom: bot gone, no error, no trace. The retry fix would have made the rate-limit pain go away on its own. Only the try/except Exception made the bot survive the crash that the rate-limit pain was masking. Two unrelated bugs hiding inside one mystery is, in my experience, the median case for production debugging, not the worst case.

What I Would Do Differently

Test the error output contract of every CLI you wrap, on the first day you wrap it. Not the success path. The success path is trivial. The failure path is where the surprises live. The wrapper is exactly the layer that should encode every assumption you make about the tool's failure behavior into a unit test.

For the Claude CLI specifically, the design decision to put structured errors on stdout is defensible. stderr is for human-readable warnings, stdout is for machine-parseable output, and a non-zero exit code with structured JSON on stdout is internally consistent with the success contract. It is also exactly the contract I would have caught in five minutes of running claude -p "test" --output-format json against a deliberately broken prompt before I shipped the wrapper. I did not, because I assumed the contract.

The general lesson, with the wrong-hypotheses list above as the evidence: when the standard debugging checklist returns all green, the bug is probably in something the checklist takes for granted. In my case, the checklist took for granted that errors arrive on stderr. They did not. They arrived on stdout the whole time, in JSON, with the HTTP status code clearly labeled, waiting for someone to read them.

The cost of the assumption was not catastrophic. The bot served one degraded morning cycle per day and recovered after a manual restart in the afternoon. But the cost of finding it was three days of guessing. The log line my wrapper printed on failure was actively misleading: it claimed there was no error output at all. A noisy rc=1 with empty stderr would have been a useful breadcrumb. A silent rc=1 was a dead end. Logging the wrong stream is, in retrospect, worse than logging nothing, because it tells the future-me reading the log that there is nothing there to find.

Anyone else been bitten by a CLI that puts structured errors on stdout? I'd be curious whether this is more common than the Unix-convention crowd lets on.

Repos

The subprocess wrapper and the rest of this bot live in github.com/Ekioo/KittyClaw under the AccountBuildUp project. MIT, star if useful. I also shell out to the Claude CLI from the content pipeline behind ekioo.com, my consulting site, where the same gotcha was waiting and is now patched the same way.