Generative AI on Kubernetes — Book Review
There's a specific kind of frustration that comes from reading a book that's half about what you actually needed. Most resources on running AI in production either assume you're a data scientist who p

From integer counting to structured resources: how Dynamic Resource Allocation and the AI Cluster Readiness framework finally make GPU infrastructure manageable at scale.

Your vLLM cluster has a problem you probably don't know about. It's not a bug. Nothing is crashing. The metrics dashboard looks fine. But right now, every time a request hits your load balancer, there

There's a class of production incident that doesn't page anyone. No error rate spikes. No latency alert fires. The cluster health dashboard shows green. GPU nodes are online. Pods are running. And yet

Travel has been relentless lately. Back-to-back weeks, airports blurring into each other, calendar looking like a game of Tetris someone is losing badly. But AWS Community Day Pune was non-negotiable.

I watched Jensen's keynote live. Three hours. I had tea going cold next to me and a notepad filling up fast. I'm not going to recap every announcement. There are enough of those. What I want to do is

There's a GPU utilization chart that haunts every platform engineer running LLM inference in production. The x-axis is time, the y-axis is GPU utilization, and the line does something uncomfortable: i

There's a meeting that happens at every organization the moment their AI ambitions outgrow their GPU budget. It usually involves three teams talking past each other. The HPC team says: "We need 32 GPU

Some books explain AI infrastructure with clean diagrams and tidy abstractions. This one pulls you into the engine room and shows you what actually happens between a prompt and a response — the memory
