-
An AI Feature Has No “Tests Pass” Moment. So I Write the Eval First.
A reader on chapter 3 must never get spoilers from chapter 30. That one rule taught me why, for AI features, the eval is the spec — and why I write it before the feature.
-
The AI Engineer Interview Is a Backend Interview: 18 Real Questions, Answered for .NET Developers
Open any “AI engineer interview prep” guide and you’ll find the same thing: prompting tricks, model trivia, and Python code. Here’s what the guides miss. When you look at what companies actually ask in 2026 — design a RAG system, debug retrieval failures, keep latency under 800ms, build LLM-as-judge evals — these are not prompt…
-
From a Number to a Gate: Evals in CI and Production
Part 5, the finale, of a series on building production AI on .NET. We’ve built the pieces — what evals are, error analysis, golden datasets, and a trustworthy judge. Now we make them earn their keep. By now you can produce a defensible quality score for an AI feature. But a score you only look…
-
LLM-as-Judge, Done Right
Part 4 of a series on building production AI on .NET. We’ve covered what evals are, error analysis, and golden datasets. Now: how do you turn a paragraph into a number you can trust? You have a golden dataset and your feature’s real output for each case. Now you need a score. But you can’t…
-
Golden Datasets That Don’t Lie
Part 3 of a series on building production AI on .NET. Part 1 was the overview; Part 2 was error analysis. Now we turn the failure taxonomy you built into something you can measure against — without quietly fooling yourself. A golden dataset is a set of representative inputs, each paired with a reference answer…
-
Error Analysis: The Unglamorous Superpower Behind Good Evals
Part 2 of a series on building production AI on .NET. Part 1 covered what evals are and the Analyze → Measure → Improve lifecycle. This post is about the step everyone wants to skip: Analyze. When a team decides to “take evals seriously,” the first thing they usually do is wrong. They open a…
-
AI Evals, Explained: How We Actually Know Our AI Is Any Good
Everyone says evals are the most important skill in AI engineering. Few show the unglamorous parts: a golden set that doesn’t lie, a judge you can trust, and a regression gate that won’t fire on noise. The whole thing — in C#, on a live product.
-
Four Hidden Gates Between Your Expo Build and Google Play in 2026
Real time from eas build to my first tester on Google Play: four hours and seven builds. Google rolled out Android Developer Verification ahead of its September 2026 mandate, and the path from a fresh EAS-built AAB to an Internal Testing release no longer looks like the tutorials. Below is the map I wish I’d…
-
I put Ollama on a 4 GB mobile GPU and got 2.5× — here’s the VRAM math
TL;DR. Same prompt, same model, same box. The only thing that changed was whether Ollama was allowed to touch the GPU. On CPU alone the model ran at 17 tokens per second and took about five and a half seconds per call. With the GPU enabled, Ollama put almost the whole transformer stack on the…
-
Open-source licenses 101: which one to actually pick
Sooner or later, every developer runs into The License Question. You shipped something to GitHub, GitHub asked you to pick a license, and you scrolled the dropdown — MIT, Apache, GPL, AGPL, BUSL, MPL, ISC, Unlicense, “Other” — and picked whatever sounded least scary. That’s how I did it. That’s also how I ended up…