Uber Burned Through Its Entire AI Coding Budget in 4 Months. Here's What Smart Teams Do Instead.

#ai #programming #devops #claude

The AI coding bill just became everyone's problem. In the last two weeks alone:

Uber blew through its entire 2026 Claude Code budget by April and capped employees at $1,500/month
Gartner reported that 23% of tech leaders now spend $200-500 per developer per month on AI coding tokens alone
GitHub flipped Copilot to usage-based billing, turning a predictable $19/seat into an open-ended credit drain
Ramp's AI Index shows the top 1% of firms spending $7,500/employee/month on AI — $90K/year per head, up 14.1% in a single month

The pattern is clear: agentic workflows burn tokens faster than any flat budget anticipated. And single-vendor lock-in makes it worse — when your only option is Opus 4.8 at $75/M output tokens, every wasted thinking loop is expensive.

The Real Problem: Not All Tasks Need the Best Model

Here's what I learned after watching my own AI coding spend hit $10K/month earlier this year.

I was sending everything to Claude Opus. Code planning? Opus. Writing unit tests? Opus. Formatting a config file? Opus. Renaming a variable across three files? Opus.

That's like hiring a senior architect to move furniture. The work gets done, but you're massively overpaying.

When I actually profiled my usage, the breakdown looked like this:

~15% of tasks genuinely needed frontier reasoning (complex architecture decisions, subtle bug diagnosis, multi-file refactors with tricky dependencies)
~25% of tasks needed solid mid-tier capability (implementing features from clear specs, writing meaningful tests, code review)
~60% of tasks were mechanical (formatting, renaming, boilerplate generation, simple file operations, documentation updates)

That 60% was burning frontier-tier tokens on work that Haiku, Gemini Flash, or even a local model could handle just as well.

Task-Level Routing: The Boring Fix That Saves 60-70%

The concept is simple: instead of routing every request to one model, classify each task and send it to the cheapest model that can handle it well.

Planning phase → Frontier model (Opus, GPT-5). This is where reasoning depth matters. You want the model that catches edge cases your spec missed.

Implementation → Mid-tier model (Sonnet, GPT-4.1). Given a clear plan, most code generation doesn't need maximum intelligence — it needs reliable instruction-following.

Tests, formatting, docs → Fast/cheap model (Haiku, Gemini Flash). These tasks have objectively verifiable outputs: either the test passes or it doesn't. You don't need 200 IQ for assertEqual.

Debug/diagnosis → Frontier model again. When something breaks in a non-obvious way, you want the best reasoning available.
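Here's a minimal sketch of that phase-to-model mapping as a lookup table. The phase names, the model identifiers, and the mid-tier fallback are all placeholders I made up for illustration; swap in whatever your provider and workflow actually use.

```python
from enum import Enum


class Tier(Enum):
    FRONTIER = "frontier"
    MID = "mid"
    FAST = "fast"


# Phase -> tier, following the four phases described above.
# The phase labels themselves are hypothetical.
PHASE_TIER = {
    "planning": Tier.FRONTIER,
    "implementation": Tier.MID,
    "tests": Tier.FAST,
    "formatting": Tier.FAST,
    "docs": Tier.FAST,
    "debugging": Tier.FRONTIER,
}

# Placeholder model identifiers, not real API model names.
TIER_MODEL = {
    Tier.FRONTIER: "frontier-model",
    Tier.MID: "mid-tier-model",
    Tier.FAST: "fast-model",
}


def model_for_phase(phase: str) -> str:
    """Return the model name for a workflow phase.

    Unknown phases fall back to the mid tier (an assumption of this sketch).
    """
    tier = PHASE_TIER.get(phase.lower(), Tier.MID)
    return TIER_MODEL[tier]
```

Calling `model_for_phase("planning")` picks the frontier placeholder, while `model_for_phase("tests")` picks the fast one. Keeping the two tables separate means you can change vendors without touching the phase mapping.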

After implementing this approach, my monthly spend dropped from ~$10K to ~$3K, roughly a 70% reduction. Same output quality. Same velocity. I just stopped overpaying for routine work.

How to Actually Do This

You don't need custom infrastructure. Here's the practical version:

1. Audit Your Token Usage

Before optimizing, know where your tokens go. Log the actual prompts hitting the API for a week. You'll probably find:

Context bloat (frameworks serializing full state into every call)
Unnecessary thinking loops (model "reasoning" about trivial operations)
Repeated system prompts eating 10K+ tokens per call
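A rough sketch of what that audit could look like, assuming you can export one record per API call with a task label and token counts. The record format, the field names, and the sample numbers are all invented for illustration; your logs will look different.

```python
from collections import defaultdict


def tokens_by_task(records):
    """Sum input and output tokens per task label, largest consumers first.

    Returns a list of (task, total_tokens, share_of_all_tokens) tuples.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec["task"]] += rec["input_tokens"] + rec["output_tokens"]
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero on empty logs
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(task, tokens, tokens / grand_total) for task, tokens in ranked]


# Made-up sample log, only to show the shape of the input.
sample_log = [
    {"task": "rename", "input_tokens": 12_000, "output_tokens": 300},
    {"task": "architecture", "input_tokens": 8_000, "output_tokens": 4_000},
    {"task": "rename", "input_tokens": 11_500, "output_tokens": 200},
]

for task, tokens, share in tokens_by_task(sample_log):
    print(f"{task:<14}{tokens:>8}  {share:.0%}")
```

In this toy log the rename calls send huge inputs for tiny outputs, which is the kind of context bloat and repeated system prompt overhead the audit is meant to surface.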

2. Create Task Categories

Start simple — three tiers is enough:

Tier 1 (Frontier): Architecture, complex debugging, security-sensitive code
Tier 2 (Mid): Feature implementation, test writing, code review
Tier 3 (Fast): Formatting, documentation, boilerplate, simple edits
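One crude way to start assigning tiers is keyword matching on the task description. This is only a sketch: the keyword lists are my own guesses, and plain substring matching will misfire (for example, "format" also matches "information"), so treat it as a starting point rather than a real classifier.

```python
# Hypothetical keyword lists; tune these against your own task descriptions.
TIER_KEYWORDS = {
    1: ("architecture", "design", "debug", "security"),
    3: ("format", "rename", "docs", "documentation", "boilerplate"),
}


def classify(description: str) -> int:
    """Return 1 (frontier), 2 (mid), or 3 (fast) for a task description."""
    text = description.lower()
    # Check Tier 1 first so security-sensitive or debugging work is never
    # downgraded just because it also mentions docs or formatting.
    for tier in (1, 3):
        if any(word in text for word in TIER_KEYWORDS[tier]):
            return tier
    # Anything unrecognized defaults to the mid tier.
    return 2
```

With these lists, "Rename variable across three files" lands in Tier 3, "Implement feature from spec" falls through to Tier 2, and "Update documentation for security module" is kept in Tier 1 because the Tier 1 keywords are checked first.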

3. Route Based on the Task, Not the Session

The key insight: routing should happen at the task level, not the session level. A single coding session might need Opus for the initial design, Sonnet for implementation, and Haiku for writing tests — all within the same workflow.

Most teams I've talked to start with manual routing (just switching models themselves) and then automate it once they see the pattern.
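To see why the task level matters, here's a small sketch comparing one session billed entirely at the frontier tier against the same session routed per task. The $75/M frontier price is the figure quoted earlier in this post; the mid and fast prices and all token counts are placeholders I picked for illustration, and the sketch only counts output tokens.

```python
# USD per million output tokens. Only the frontier figure comes from this
# post; the mid and fast prices are illustrative placeholders.
PRICE_PER_M = {"frontier": 75.0, "mid": 15.0, "fast": 4.0}

# One hypothetical session: (task, tier it needs, output tokens used).
session = [
    ("design", "frontier", 40_000),
    ("implement", "mid", 200_000),
    ("tests", "fast", 160_000),
]


def cost(tokens: int, tier: str) -> float:
    """Cost in USD of `tokens` output tokens at the given tier's price."""
    return tokens / 1_000_000 * PRICE_PER_M[tier]


def session_level_cost(tasks, tier: str = "frontier") -> float:
    """Every task in the session is billed at one tier."""
    return sum(cost(tokens, tier) for _, _, tokens in tasks)


def task_level_cost(tasks) -> float:
    """Each task is billed at the tier it actually needs."""
    return sum(cost(tokens, tier) for _, tier, tokens in tasks)


print(f"session-level: ${session_level_cost(session):.2f}")
print(f"task-level:    ${task_level_cost(session):.2f}")
```

The exact gap depends entirely on your own prices and task mix, so plug in numbers from your audit before drawing conclusions.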

4. Monitor and Adjust

Track cost-per-task, not just total spend. When you see a Tier 3 task consuming $2 worth of tokens on a frontier model, that's a routing failure. When a Tier 1 task fails on a cheap model, that's also a routing failure. The sweet spot is the cheapest tier that still completes the task reliably.
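Both kinds of routing failure can be flagged from a per-task log. A minimal sketch, assuming each run records the tier the task needed, the tier of the model that actually ran it, its cost, and whether it succeeded. The `TaskRun` fields are hypothetical, and the $2 threshold is just the example figure from above.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    name: str
    tier: int          # tier the task needs: 1 = frontier, 2 = mid, 3 = fast
    model_tier: int    # tier of the model that actually handled it
    cost_usd: float
    succeeded: bool


def routing_failures(runs, overspend_usd: float = 2.0):
    """Return (task name, reason) pairs for runs that look misrouted."""
    flagged = []
    for run in runs:
        if run.tier == 3 and run.model_tier == 1 and run.cost_usd >= overspend_usd:
            # Mechanical work that ran up a real bill on a frontier model.
            flagged.append((run.name, "overpaid"))
        elif run.tier == 1 and run.model_tier > 1 and not run.succeeded:
            # Hard work that was sent to a cheaper model and failed.
            flagged.append((run.name, "under-powered"))
    return flagged
```

Reviewing the flagged list weekly is probably enough to start with; the point is to notice patterns, not to police individual calls.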

The Bigger Picture

Ramp's data tells an interesting story: the companies spending the most on AI aren't the ones in trouble. The ones in trouble are companies locked into a single vendor with no ability to route.

"The top 1% of firms tend to mix and match, bouncing between multiple frontier models and platforms that give them access to cheaper models." — Ramp AI Index

This isn't about spending less on AI. It's about spending smarter. The teams that figure out task-level routing now will have a structural cost advantage as agentic workflows become the default.

The $10K/month developer AI bill is already here. The question is whether you're paying it because you need to, or because you never bothered to check which tasks actually require the expensive model.

I've been building apps with AI coding tools for the past year and tracking the economics obsessively. Happy to share specific numbers or discuss routing strategies in the comments.