Spend the error budget wisely
Infrastructure notes on running systems that can fail but must recover. Production patterns for AI workloads, VMware deployments, and audit-ready operations — from an architect operating regulated fintech systems for 10+ years.
Free · Operators only · No spam
The premise
Every production system has an error budget — the amount of failure it can absorb before SLAs break. The job is not zero downtime. It is spending that budget deliberately: on planned maintenance, controlled rollouts, and calculated risk — not on surprises at 3am.
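As a rough illustration of the arithmetic behind an error budget, the sketch below converts an availability target into minutes of allowed downtime and tracks how much of that has been spent. The 99.9% target, the 30-day window, and the function names are assumptions made for the example, not figures from these notes.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability target over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the budget still unspent (negative once the target is breached)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Hypothetical target: 99.9% availability over 30 days.
    print(f"budget: {error_budget_minutes(0.999):.1f} min")
    # Hypothetical spend: one planned maintenance window of 20 minutes.
    print(f"remaining: {budget_remaining(0.999, 20.0):.0%}")
```

Under these assumptions the budget is roughly 43 minutes per 30 days, so a single 20-minute planned window uses a little under half of it. What is left is the room available for rollouts and calculated risk.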
What I write about
Latest note
Five backup architecture patterns for fintech: a decision framework from production
Choosing backup architecture for regulated financial infrastructure is harder than vendor pitches suggest. Five patterns we have operated, their trade-offs, and the decision framework that helps match pattern to workload.
More notes
What auditors asked when we deployed AI: questions, answers, and what we learned
Real audit questions when AI infrastructure entered our regulated environment. PCI DSS, ISO 27001, and regulatory inspection patterns. The answers that passed, the answers that didn't, and how to prepare evidence that scales.
The AI memory crunch: how DRAM and NAND price shocks reshape infrastructure budgets
DDR5 prices up 3-4x. Enterprise SSDs up 470%. Memory manufacturers redirecting capacity to AI customers. Notes from infrastructure operators navigating the worst memory market in a decade, and the procurement strategies that work.
Bandwidth contention at peak: backup vs traffic vs telemetry
At peak, four streams fight for one network: live user traffic, near-real-time backup replication, log shipping, and metrics. Here's a quantified worked example of the saturation, why load tests miss it, and a tiered must-have / should-have / nice-to-have fix list.
Security-first infrastructure for payments: isolation, key management, and PCI scope reduction
How payment infrastructure is architected security-first: PCI scope reduction, HSM-backed key management, tokenization, and the segmentation that keeps the highest-risk data in the smallest possible blast radius.
vSAN for mixed workloads: policy design, AI patterns, and the OSA-to-ESA transition
Operating vSAN clusters that host both regulated banking workloads and AI training. Storage policy design for mixed workload classes, OSA and ESA architecture trade-offs, and lessons from running both in production.
the error budget
One deep technical note every Friday. Production patterns, audit-ready configurations, and lessons from operating mission-critical infrastructure. Written by an architect, for architects.
Free. Unsubscribe anytime. No spam, ever.
10+ years fintech infrastructure
10 VMware clusters in production
Anonymous · No vendor influence