DEV Community: speed engineer

How Japan Just Beat Claude Mythos

speed engineer — Fri, 26 Jun 2026 04:56:39 +0000

A founder I know forwarded me the Sakana launch tweet with one line on top: "should we switch?"
Look, the tweet worked on him. Fugu Ultra stands shoulder-to-shoulder with Fable 5 and Mythos preview, no export controls attached, and that was all it took. My honest first reaction was a small stomach drop, because if it were true it'd reset half my stack.
Then I read the actual post. Fugu isn't a model. It's an orchestration layer wearing a model's coat.
And the part that should bother you isn't the marketing. It's that a lot of senior people saw one benchmark beat Fable and stopped reading right there.
I've had three of these models in production this year, so I read launch posts like this one with a particular kind of suspicion. This one earned it.

Why this matters before you forward it to your CTO

The export-controls line isn't a benchmark claim. It's a procurement claim. Fable and Mythos access got suspended under a directive, so a vendor shows up saying you can have that frontier output anyway, through them, no paperwork.
And honestly, if legal just killed your Mythos access mid-project, that hurts. A wrapper that routes around it sounds like oxygen.
But you're not buying a model. You're buying a middleman who rents the same models you could rent yourself, then charges you to pick between them. Somebody should say that out loud before the contract gets signed.

The number everyone misread

The standard instinct is that a higher LiveCodeBench score means a smarter system, and usually that instinct is fair. Most of us live and die by SWE-bench and terminal-bench numbers when we pick a coding model.
That breaks the moment the thing on the leaderboard isn't a model.
Fugu Ultra scored 93.2 on LiveCodeBench. Fable scored 89.3. But Fable is inside Fugu Ultra. So is GPT. We don't actually know the full roster underneath, and that right there is part of the problem.
You didn't watch Japan beat Anthropic. You watched Anthropic's model, plus a few other models, plus a router, beat Anthropic's model running alone.
A collection of models beating one of its own members isn't a discovery. We've known that since boosting. Four of you post a faster total than Usain Bolt, and somehow that's "we beat Bolt."

What they actually built is a router, not a model

The framing is junk. The engineering underneath is actually interesting.
OpenRouter already shipped this shape with their Fusion idea, where one prompt fans out to several models at once and a judge model reads all the answers and stitches them into a single reply. Same path on every request, nothing learned about when to do what.
Sakana's twist is that the orchestrator is itself a small trained LLM. It doesn't fire every model on every request. It's trained to decide which models to call, when to delegate, and how to stitch the result. They ship two tiers. Fugu handles low-latency work; Fugu Ultra pulls in a deeper pool of agents for the hard multi-step stuff.
So the thing you call is a learned dispatcher, right? The raw intelligence still comes from the frontier models underneath: Opus, GPT, Fable, take your pick. Sakana owns the routing and the synthesis, and that's the whole product.
There's a name for this. It's the mixture-of-agents pattern, productized behind one endpoint. Useful, but nowhere near frontier.
That's why the leaderboard comparison doesn't survive a second look. The number proves the bundle is good. It says nothing about whether Sakana built the good part, and for a buying decision that's the whole game.
It's closer to an AI harness than a model, honestly. Claude Code is a harness too, and nobody puts Claude Code on a model leaderboard.
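If the difference between the two shapes feels abstract, here's a toy sketch. Everything in it is hypothetical: the three "models" are stub functions, the judge is a one-liner, and `keyword_policy` is a hard-coded stand-in for a trained orchestrator. It is not how OpenRouter or Sakana implement anything; it only shows fan-out-and-judge versus routing.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]

# Stand-in "models"; real ones would be remote API calls (hypothetical names).
MODELS: Dict[str, Model] = {
    "fast": lambda p: f"fast answer to: {p}",
    "deep": lambda p: f"deep answer to: {p}",
    "code": lambda p: f"code answer to: {p}",
}


def fan_out_and_judge(prompt: str, judge):
    """Fusion-style: every model runs on every request, then a judge merges."""
    answers = {name: model(prompt) for name, model in MODELS.items()}
    return judge(answers), list(answers)


def route(prompt: str, pick: Callable[[str], List[str]]):
    """Router-style: a policy decides which models to call for this prompt."""
    chosen = pick(prompt)
    answers = {name: MODELS[name](prompt) for name in chosen}
    return " | ".join(answers.values()), chosen


def keyword_policy(prompt: str) -> List[str]:
    # Placeholder for the learned dispatcher: a hard-coded rule.
    return ["code", "deep"] if "bug" in prompt.lower() else ["fast"]


_, called_all = fan_out_and_judge("hello", judge=lambda a: max(a.values(), key=len))
_, called_hard = route("fix this bug", keyword_policy)
_, called_easy = route("hello", keyword_policy)
print(called_all, called_hard, called_easy)
```

Either way, the answers come from the models in the dictionary. The only thing the wrapper owns is the function that picks and merges.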

Same call, four invoices

The API is the seductive part. You hit one endpoint, you get one answer.

# Looks like any other model call. That's the whole pitch.
resp = client.messages.create(
    model="fugu-ultra",  # not a model, a dispatcher
    messages=[{"role": "user", "content": prompt}],
)
# Under the hood this may have called Opus + GPT + Fable,
# run a judge pass, and billed you for every token of all of them.
# You can't see which. The selection logic is closed source.

That last comment is the real problem. You don't get to see the routing. You can't inspect which models ran, why, or how the final answer got assembled. It's a black box sitting on top of other black boxes.
Now the bill, in dollars. Somebody on Hacker News put it better than any analyst:
You already pay $200 each to Anthropic, OpenAI, Cursor, Google. It doesn't round up nicely, so you end up paying another $200 a month to Sakana just to coordinate it.
Call it another $200. The exact figure doesn't matter, the shape does. You're already paying every provider underneath, and now you're paying a margin on top to have something choose between them.
Do the math on a single hard request and it's worse than one tax.

`One hard request, Fugu Ultra fans out to ~3 models + a synthesis pass.

Solo call (what you do today):
~20K input + ~4K output, billed once, to one provider.

Fugu Ultra, same request:
~20K input x 3 models -> ~60K input across 3 providers
~4K output x 3 models -> ~12K output
synthesis pass reads it all -> ~32K input + ~4K output

~4x to 4.6x the tokens, in and out, for ONE answer,
then Sakana's margin on top.

At frontier prices, that's not a rounding error.
`
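If you want to plug in your own numbers, the same arithmetic fits in a few lines of Python. This is a sketch under the assumptions used above: every model receives the full prompt, and the synthesis pass re-reads the prompt plus every model's answer. The function name and the three-model fan-out are illustrative, not anything Sakana has published.

```python
def fan_out_tokens(input_tokens, output_tokens, n_models):
    """Token totals for a fan-out call plus one synthesis pass.

    Assumes each model gets the full prompt, and the synthesis pass
    reads the prompt plus every model's answer (an assumption, not a spec).
    """
    fan_in = input_tokens * n_models       # prompt sent to every model
    fan_out = output_tokens * n_models     # one answer per model
    synth_in = input_tokens + fan_out      # judge reads prompt + all answers
    synth_out = output_tokens              # judge writes one final answer
    return fan_in + synth_in, fan_out + synth_out


solo_in, solo_out = 20_000, 4_000
total_in, total_out = fan_out_tokens(solo_in, solo_out, n_models=3)

print(f"input:  {total_in:,} tokens ({total_in / solo_in:.1f}x solo)")
print(f"output: {total_out:,} tokens ({total_out / solo_out:.1f}x solo)")
```

Under those assumptions the multiplier lands at about 4x on output and a bit higher on input, before any margin.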

For a one-off hard problem where being right beats the bill, sure, maybe it's worth it. For your day-to-day coding loop, you're lighting money on fire to get an answer your existing frontier model would've handed you anyway.

The aftermath

To be fair, the team isn't a bunch of clowns. David Ha, co-founder and CEO, made managing director at Goldman running rates trading in Japan before he left for Google Brain, where he co-authored the World Models paper that a lot of us actually read. The beta ran with close to 500 early users building real things, and the demos aren't faked: small UIs, chess, 3D-cube solving, some ML work.
But the reaction split for a reason. The same crowd that respects the founder is also side-eyeing a lab that calls itself frontier while mostly selling B2B AI apps to Japanese businesses, with recruiting people describe as abrasive. He's clearly driven. But this thing just doesn't feel thought through.

There's no moat here

Plenty could still go wrong here, and I don't do happy endings.
First, defensibility. If the routing is a genuinely novel trick that reliably squeezes more out of the same models, then every frontier lab ships their own version inside a week. Anthropic and OpenAI already hold all the pieces. Why would they hand the coordination margin to a third party sitting on top of their own models? They wouldn't. They'd absorb it.
Then there's variance. A chunk of that "feels smarter" could be retry behavior and lucky sampling, not real lift. Run it across many benchmarks instead of one cherry-picked bench and the gap might shrink to noise. They showed one bench, which is the tell.
And the black box. When Fugu hands you a wrong answer, you can't tell which underlying model failed or why the router picked it. Good luck debugging that during an incident. You've outsourced the exact layer you need to see into, and that opacity bites you in production every single time.
So no, Japan didn't beat Mythos. A clever dispatcher rented Mythos by the token and stacked a few friends on top. If you're already on a frontier model, don't jump ship. If you're coming from something three or four months old, you'll feel a lift, but that lift is the frontier models underneath, and you can rent those directly without the extra invoice.
They're selling it as the last API key you'll ever need. What it actually is: four bills to coordinate them all.


Your users already built your product — look at their ugly workarounds

speed engineer — Fri, 26 Jun 2026 03:48:15 +0000

When you build software, users will happily tell you what they want. "Make it faster." "Add a dashboard." "Make it more like [competitor]." Most of it is noise. People are great at describing pain and terrible at prescribing solutions — and the solution they ask for is rarely the one that fixes the pain.

The real signal isn't in what users request. It's in what they've already cobbled together to survive without you.

Two products, one habit

I've built two SaaS products, and both came from staring at a workaround instead of a feature request.

FillTheTimesheet. Freelancers and agencies don't say "I need time-tracking software." They say tracking is annoying and they'll deal with it later. So I watched what "later" actually looked like: the last day of the month, a blank spreadsheet, and someone reconstructing three weeks of work from calendar invites, Slack scrollback, and memory. That monthly guessing ritual was the spec. It said: capture has to happen in the moment, with near-zero friction, or it won't happen at all. Nobody requested that. The workaround screamed it.

PromptShip. Teams using ChatGPT, Claude, or Gemini don't ask for a "prompt library." But watch them work and the same artifact shows up everywhere: a chaotic Google Doc, a pinned chat message, a personal notes file full of prompts that worked once. Someone writes a genuinely good prompt, shares it, and a month later everyone has lost it and rebuilt it from scratch. That messy doc is a feature request written in frustration. It said: these prompts are reusable assets the team keeps re-earning, so they need one shared home and one-click reuse — not better documentation discipline.

Why workarounds beat feedback

A workaround is a revealed preference. It already passed the only test that matters: someone was annoyed enough to do extra work to get around the gap. That's a person voting with effort, not opinion.

Feature requests are the opposite. They're cheap to say and easy to abandon. "I'd use that" predicts nothing. "I built a fragile spreadsheet to do that every Friday" predicts a real need.

So when I sit with a user now, I've stopped asking "what do you want?" I ask:

What do you do right before and after using the tool? (the manual glue)
Where's the spreadsheet, doc, or notes file you've built on the side?
What do you redo every week that you wish you didn't?

The answers map almost directly onto a roadmap.

The trap

The catch is that workarounds are a little embarrassing, so people hide them. Nobody volunteers "I reconstruct my hours from memory" or "our prompts live in a doc called final_FINAL_v3." You have to make it safe to admit the hacky thing. Once you do, they'll hand you the spec for free.

Takeaways

Treat feature requests as symptoms, not specs.
Hunt for the workaround: the spreadsheet, the doc, the pinned message, the manual ritual.
A workaround is effort someone already spent — the strongest signal you'll get.
Build the thing that makes the workaround unnecessary, then get out of the way.

The two products I'm proudest of weren't ideas. They were someone's ugly workaround, made a little less ugly.

FillTheTimesheet and PromptShip both started as someone's workaround. If you're running an ugly one right now, I'd genuinely love to hear it.

Your team's best AI prompts are dying in chat threads

speed engineer — Thu, 25 Jun 2026 03:45:37 +0000

The $0 asset your team keeps throwing away

Someone on your team just wrote a ChatGPT prompt that turns a messy bug report into a clean changelog entry. It works perfectly. They paste it in Slack. Three people react with a fire emoji.

Two weeks later, nobody can find it.

This happens on every team using AI right now. The prompts that actually work — the ones refined through ten frustrating iterations — live in DMs, sticky notes, and someone's chat history. They're treated as throwaway text instead of what they really are: reusable operational knowledge.

Why prompts are worth saving

A good prompt is basically a small program written in plain English. It encodes:

The exact context the model needs
The output format you actually want
The edge cases someone discovered the hard way

When you lose it, you don't just lose text. You lose the twenty minutes of iteration that produced it — multiplied by every teammate who later reinvents the same thing.

A lightweight system that works

You don't need anything fancy to start. Here's the minimum viable prompt library:

One shared location. Not five. One.
A title that describes the job, not the tool. "Turn meeting notes into action items" beats "ChatGPT prompt 3".
A category. Marketing, support, code, hiring — whatever maps to your team.
The prompt itself, copy-paste ready. No screenshots.
One line on when to use it.

That's it. The discipline is harder than the structure: every time a prompt earns a reaction in chat, it goes in the library before the thread scrolls away.

Where this breaks down

A shared doc or sheet gets you surprisingly far. It breaks when:

People stop copying prompts out because it's two clicks too many
There's no sense of which prompts are actually being used
Prompts drift, and nobody knows which version is current

At that point the bottleneck isn't structure — it's friction and visibility.

How PromptShip fits in

This is the itch we built PromptShip to scratch. It's a shared prompt library for teams: one-click copy straight into ChatGPT, Claude, or Gemini, categories, version history, and usage analytics so you can see which prompts your team actually relies on. Think Notion-for-prompts rather than a developer toolkit — it's aimed at marketing, sales, support, and HR folks as much as engineers. There's a free tier if you want to try the idea without committing.

But the tool is secondary. The mindset is the point: treat your best prompts as shared infrastructure, not disposable chat messages.

Key takeaways

The prompts that work are operational knowledge — capture them deliberately
Start with one shared location, job-based titles, and copy-paste-ready text
Watch for friction and version drift as your library grows
Whatever tool you use, the win is making good prompts findable and reusable

What's your team's system for this right now? Curious whether anyone's actually cracked it with plain docs.

Sunday Notes: Find Where Before You Theorize Why

speed engineer — Sun, 14 Jun 2026 03:47:56 +0000

This week I wrote about two things that have nothing to do with each other.

On Saturday it was a debugging story: a service was losing 30% of its UDP packets, I spent the better part of a day convinced a switch was dying, and the network turned out to be completely innocent — my own host was accepting the datagrams and then dropping them because a socket buffer kept filling during bursts. One kernel counter (netstat -su, "receive buffer errors") would have told me that in about ten seconds.

Earlier in the week it was product stuff: why teams keep losing their best work, why a freelancer's Friday disappears into "what was I even doing on Tuesday," why a marketing team rewrites the same prompt every quarter.

It wasn't until I sat down to write this recap that I noticed they're the same lesson.

The shared mistake

In the packet story, the expensive move was theorizing about why before establishing where. "The network is dropping packets" is a theory about cause. It sent me to the wrong team, the wrong dashboards, the wrong week. The cheap move — the one I skipped — was localizing first: are the packets dying on the wire, or after they reach my box? Two completely different buildings, identical symptom.

Product decisions have the exact same failure mode. "Users churn because we're missing feature X" is a theory about why. It's also, usually, the most expensive possible thing to act on first, because building X takes a quarter and the theory might be wrong. The cheap move is to localize: where in the week, the workflow, or the funnel does the value actually leak out?

When you force yourself to find where first, the answer is often boring and small — and boring and small is good news, because boring and small is cheap to fix.

Why I keep coming back to this

Both of the things I work on are, underneath the marketing, just instruments for making "where" visible.

FillTheTimesheet exists because "I undercharged this month" is a why-theory; the useful version is where the billable hours actually went, captured while they happened instead of reconstructed on Friday.

PromptShip is the same shape pointed at a different problem: "our team is bad at AI" is a why-theory; "the prompt that worked is sitting in one person's chat history and nobody else can find it" is a where. One is an identity crisis, the other is a Tuesday-afternoon fix.

Neither is glamorous. Both are counters you check before you theorize.

The Sunday takeaway

Find where before you theorize why. It works on packets, it works on churn, and it works on most arguments that have gone in circles for more than ten minutes. Locate the problem in space before you start explaining it — the explanation is usually cheaper, and more often correct, once you know where you're standing.

The full UDP debugging story, counters and buffer math included, is on Medium if that's your kind of weekend reading: I Lost 30% of My UDP Packets — The Debugging Story.

What's the last thing you assumed the cause of — before you actually measured where it was happening?

I Lost 30% of My UDP Packets — and the Network Was Innocent

speed engineer — Sat, 13 Jun 2026 03:54:53 +0000

A receiver pulling a UDP feed was missing roughly 30% of its messages. No errors, no exceptions, no stack traces — just gaps in the sequence numbers. The first suspect is always the network: a flaky switch, a saturated link, a tired NIC.

The network was innocent. The packets were being dropped on the receiving host, after they'd already arrived. Here's how to tell the difference, and why it matters.

Why UDP makes this sneaky

UDP has no retransmission and no backpressure. When a datagram is lost, nobody is notified — not the sender, not the receiver. The packet simply isn't there.

That means two completely different failures look identical from the application's point of view:

The network dropped the packet before it reached your machine.
Your own host accepted the packet and then threw it away after it arrived.

The application sees the same thing in both cases: a missing sequence number. But the fix is in a different building depending on which one it is.

Where the packets actually go

The receive path is: NIC → kernel socket receive buffer → your recv() call. The kernel parks incoming datagrams in a per-socket buffer until your code reads them. If your code doesn't drain that buffer fast enough, it fills, and the kernel drops the overflow. Crucially, the kernel counts those drops.

On Linux:

# Per-protocol summary — look for "receive buffer errors"
netstat -su

# Or straight from the kernel counters
cat /proc/net/snmp | grep -A1 Udp
#   InDatagrams  ... InErrors  RcvbufErrors ...

If RcvbufErrors is climbing, the network did its job and your host discarded the datagrams. That single counter collapses a week of "is it the switch?" into about ten seconds of certainty.
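If you'd rather watch that counter from a script than eyeball it, here's a minimal sketch of a parser for the Udp lines in /proc/net/snmp. The function names and the sample text are mine, and the sample values are made up for illustration; the header-line-then-value-line layout is what Linux exposes.

```python
from pathlib import Path


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # "UdpLite:" lines do not start with "Udp:", so they are skipped.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    # The first token on each line is the "Udp:" label; skip it.
    return dict(zip(header[1:], map(int, values[1:])))


def read_udp_counters(path="/proc/net/snmp"):
    """Read the live counters (Linux only)."""
    return parse_udp_counters(Path(path).read_text())


# Made-up sample in the same layout, so the parser can be tried anywhere.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000 2 300 900 300 0
"""

counters = parse_udp_counters(SAMPLE)
print("RcvbufErrors:", counters["RcvbufErrors"])
```

Call `read_udp_counters()` twice a few seconds apart and compare the two RcvbufErrors values; the delta is what matters, not the absolute number.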

The actual cause

In this case the socket receive buffer was sitting at the default (~208 KB). The sender burst faster than a single receive thread could call recv(). Average throughput looked fine on every dashboard — but the bursts filled the buffer in milliseconds, and everything past the brim was dropped. The metric that mattered wasn't mean throughput; it was peak burst versus drain rate.
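To see why averages hide this, it helps to work out how long a buffer survives a burst. The sketch below is back-of-the-envelope only: the burst and drain rates are hypothetical numbers I picked for illustration, and it ignores the kernel's per-packet bookkeeping overhead, which makes the real usable space smaller.

```python
def time_to_overflow_ms(buffer_bytes, burst_bytes_per_s, drain_bytes_per_s):
    """Milliseconds until the buffer fills, or None if the drain keeps up."""
    net_fill = burst_bytes_per_s - drain_bytes_per_s  # bytes/s piling up
    if net_fill <= 0:
        return None
    return buffer_bytes / net_fill * 1000


BUFFER = 208 * 1024          # default-sized buffer from the text
BURST = 100 * 1024 * 1024    # hypothetical: 100 MiB/s during a burst
DRAIN = 20 * 1024 * 1024     # hypothetical: consumer drains 20 MiB/s

ms = time_to_overflow_ms(BUFFER, BURST, DRAIN)
print(f"buffer full after ~{ms:.1f} ms of burst")
```

With those made-up rates the buffer is gone in a couple of milliseconds, which is far shorter than any dashboard's sampling interval.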

The fix, in order of leverage

1. Drain faster. The receive loop was parsing and doing a database write inline. Anything that isn't "copy bytes out of the socket" belongs off the hot path: recv() → hand the buffer to a queue → immediately loop back to recv().
2. Raise the buffer. Bump SO_RCVBUF, and raise net.core.rmem_max so the kernel actually honors the request. A bigger buffer doesn't fix a slow consumer; it absorbs bursts so a fast-enough consumer never falls behind. You usually need both this and #1.
3. Batch your syscalls. recvmmsg() pulls many datagrams per system call, which cuts per-packet overhead when volume is high.
4. Spread the load. If one core genuinely can't keep up, SO_REUSEPORT lets multiple threads share the same port with separate buffers.
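The first two fixes can be sketched together in Python. Treat this as an illustration under assumptions, not the code from the incident: the 4 MiB buffer request, the loopback address, and the function names are all mine, and the kernel silently caps the SO_RCVBUF request at net.core.rmem_max.

```python
import queue
import socket
import threading


def open_receiver(host="127.0.0.1", port=0, rcvbuf=4 * 1024 * 1024):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a bigger buffer; the kernel caps this at net.core.rmem_max.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    sock.settimeout(0.2)  # lets the loop notice the stop flag
    return sock


def drain(sock, out, stop):
    """Hot path: copy bytes out of the socket and nothing else."""
    while not stop.is_set():
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        out.put(data)  # parsing and DB writes belong to the queue's consumer


def run_demo(n=20):
    """Send n datagrams over loopback and return what the drain loop queued."""
    sock = open_receiver()
    out, stop = queue.Queue(), threading.Event()
    worker = threading.Thread(target=drain, args=(sock, out, stop), daemon=True)
    worker.start()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(n):
        sender.sendto(f"msg-{i}".encode(), sock.getsockname())
    received = []
    try:
        while len(received) < n:
            received.append(out.get(timeout=2))
    except queue.Empty:
        pass  # return whatever arrived
    stop.set()
    worker.join()
    sender.close()
    sock.close()
    return received


messages = run_demo()
print(len(messages), "datagrams handed to the queue")
```

The point of the split is that `drain` never does anything slower than a queue put, so how fast you parse or write to the database no longer decides whether packets get dropped.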

Key takeaways

"Packet loss" is a location, not a cause. Find out where before you theorize about why.
With UDP, silent drops are the default — the protocol won't tell you, so the kernel counters have to.
RcvbufErrors is the first thing to check. It almost always points at a receive buffer that's too small or a consumer that's too slow.
A bigger buffer absorbs bursts; a faster drain prevents them. You usually want both.

The full debugging story — the live-feed before/after, the buffer math, and the exact counters I watched while tuning it — is on Medium:

Networking for Developers: I Lost 30% of UDP Packets — The Debugging Story

I write more like this on Medium as The Speed Engineer: performance engineering, debugging stories, and the lower-level systems work that doesn't fit in a tweet.

We Built Two Products Around the Same Boring Insight: Valuable Work Leaks

speed engineer — Fri, 12 Jun 2026 03:53:43 +0000

The pattern we kept ignoring

When you build software for a living, you start noticing the same problem wearing different costumes. The costume changes; the body underneath is always the same: valuable work gets created, and then quietly leaks away before anyone captures it.

We have now built two products around that one observation. Here is the story — and the lesson I wish we had internalized three years earlier.

Leak #1: billable time

Our first product started as an internal tool. We were a small consulting shop, and at the end of every month we reconstructed our hours from memory, calendar invites, and guilt.

The math was brutal. If each person under-reports just 20 minutes a day — a quick call here, a "real fast" review there — that is roughly 7 hours a month per person walking out the door unbilled. For a five-person team billing $100/hr, you are lighting about $3,500 on fire. Every month.
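Here is that arithmetic as a small function, so you can swap in your own team's numbers. The 21 working days per month is my assumption; it is what makes 20 minutes a day come out to roughly 7 hours.

```python
def monthly_leak(minutes_per_day, workdays, people, rate_per_hour):
    """Return (unbilled hours per person, unbilled dollars for the team)."""
    hours_per_person = minutes_per_day * workdays / 60
    return hours_per_person, hours_per_person * people * rate_per_hour


# 21 workdays per month is an assumption, not a figure from the story.
hours, dollars = monthly_leak(minutes_per_day=20, workdays=21,
                              people=5, rate_per_hour=100)
print(f"{hours:.0f} h per person, ${dollars:,.0f} per month for the team")
```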

The fix was not a better spreadsheet. It was removing the memory step entirely: tracking time as it happens instead of reconstructing it later. That tool became FillTheTimesheet.

Leak #2: your best prompts

Fast-forward to the AI era. We watched our own team — marketing, sales, support, not engineers — get genuinely good at writing prompts for ChatGPT and Claude. Someone would craft a prompt that turned a 40-minute task into a 4-minute one...

...and then paste it into a Slack thread, where it died.

Two weeks later, three other people would reinvent a worse version of the same prompt. The institutional knowledge existed for exactly one session, then leaked away.

Same body, new costume. So we built PromptShip — a shared prompt library so a team's best prompts get captured once and reused by everyone, instead of getting lost in chat history.

The lesson

Here is the part I would tattoo on past-me: the most valuable problems are the ones so mundane that everyone has quietly accepted them.

Nobody files a feature request for "I keep forgetting my hours" or "our good prompts disappear." They absorb the loss as a cost of doing business. That acceptance is exactly where the opportunity hides.

If you are looking for something to build — or just something to fix in your own workflow — do not hunt for exotic problems. Look for the leaks everyone has stopped noticing: work that gets created but never captured, knowledge that lives in one person's head, value that is generated and then immediately discarded.

Plug one of those and you will rarely have to explain why it matters.

Key takeaways

Valuable work leaks constantly: unbilled time, lost knowledge, discarded outputs.
The best problems are mundane enough that people have stopped complaining about them.
Capture-at-the-moment beats reconstruct-it-later, every time.
You do not need a novel problem. You need an unaddressed one.

Networking for Developers: TCP vs UDP (When Each Protocol Kills Your App)

speed engineer — Thu, 11 Jun 2026 13:26:43 +0000

Your video call stutters. Your game lags. You picked the wrong protocol, and now you’re debugging packets at midnight.


Choosing between TCP and UDP isn’t academic — it’s the difference between your app working and your users complaining. Pick wrong and you’ll trace symptoms for days before finding the real cause.

Our P99 latency hit 5 seconds randomly. Three days of packet tracing led me to a single dropped packet.

Not corrupted. Not delayed. Just gone.

I’d been running a real-time sensor network. Temperature readings every 100ms. Life-or-death? No. But the client paid for sub-second response times. We were violating SLA hourly.

The system used TCP. Reliable delivery, ordered packets — textbook choice for anything important, right?

Wrong.

The Debugging Session That Changed Everything

So I checked the obvious first. Network saturation? No. Server CPU? Fine. Memory leaks? Clean. I spent two days looking at application code before someone suggested I actually look at the network.

I fired up tcpdump on the sensor gateway:

tcpdump -i eth0 -n 'tcp port 8080' -w capture.pcap

Watched it for an hour. Then opened Wireshark.

That’s when I saw it. One packet dropped. TCP’s retransmission timer kicked in. 200ms wait. Retry. Another drop. Exponential backoff. 400ms. 800ms. 1600ms. By the time the packet finally made it through, we’d blown past 5 seconds.

Five seconds of latency because of one 512-byte packet.
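The way doubling timers add up is easy to underestimate, so here is a small sketch of the arithmetic. It assumes a 200ms initial retransmission timeout that doubles after every consecutive loss of the same segment, as in the trace above. Real TCP stacks derive the timeout from measured round-trip time and clamp it, so treat the numbers as illustrative.

```python
def cumulative_backoff(initial_rto_ms=200, target_ms=5000):
    """Return (timeout, elapsed) pairs until elapsed time exceeds target_ms."""
    steps, rto, elapsed = [], initial_rto_ms, 0
    while elapsed <= target_ms:
        elapsed += rto            # wait out the current timer
        steps.append((rto, elapsed))
        rto *= 2                  # exponential backoff: double and retry
    return steps


steps = cumulative_backoff()
for rto, elapsed in steps:
    print(f"waited {rto:>4} ms, total {elapsed:>4} ms")
```

Under those assumptions the total passes 5 seconds on the fifth timer, and each extra loss roughly doubles the damage.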

I Assumed TCP Was Reliable — Then Packet Loss Taught Me Different

TCP is reliable in that it eventually delivers your data. But reliable doesn’t mean fast. It doesn’t even mean predictable.

Networks fail.

When a packet drops on TCP, the entire connection stalls. TCP guarantees ordering. So if packet #47 disappears, packets #48 through #500 just… wait. They’re already at the receiver. Sitting in a buffer. Unusable. This is head-of-line blocking.

UDP doesn’t care. Packet #47 vanishes? Packet #48 gets delivered anyway. No waiting. No retries. No guarantees.
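A toy model makes the difference visible. This is not how either protocol is implemented; it only mimics what the application gets to see when one sequence number goes missing. The function names and sequence numbers are made up for the example.

```python
def deliver_in_order(arrived, first_seq):
    """TCP-like: release only the contiguous run starting at first_seq."""
    have, out, seq = set(arrived), [], first_seq
    while seq in have:
        out.append(seq)
        seq += 1
    return out  # everything after the first gap stays stuck in the buffer


def deliver_as_arrived(arrived):
    """UDP-like: hand over whatever showed up, gaps and all."""
    return list(arrived)


arrived = [45, 46, 48, 49, 50]  # 47 was lost in transit

in_order = deliver_in_order(arrived, first_seq=45)
as_arrived = deliver_as_arrived(arrived)
print("ordered delivery sees: ", in_order)
print("datagram delivery sees:", as_arrived)
```

In the ordered case, 48 through 50 are sitting at the receiver the whole time; the application just isn't allowed to read them until 47 is retransmitted.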

I had temperature sensors. If reading #47 was lost, reading #48 was still useful. More useful than waiting 5 seconds for stale data.

This matters for revenue. Our client was aggregating sensor data for HVAC optimization. Five-second-old temperature readings meant HVAC systems reacting to old conditions. Wasted energy. Real dollars.

The Foundation: What These Protocols Actually Do

TCP establishes connections. Three-way handshake: SYN, SYN-ACK, ACK. Overhead before you send a single byte of application data. Every packet gets acknowledged. Missing ACK? Retransmit. Receiver buffers out-of-order packets and delivers them in sequence to your application.

Flow control prevents fast senders from overwhelming slow receivers. Congestion control backs off when the network is saturated. TCP is a state machine with 11 different states. It’s complex because it’s trying to make an unreliable network look reliable.

UDP is simpler. You call sendto(). Packet goes on the wire. That's it. No connection. No state. No acknowledgments. No retries. No guarantees about ordering or delivery.

Here’s what UDP looks like:

import socket  

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  
sock.sendto(b"temperature:23.5", ("10.0.1.50", 8080))

Four lines. Fire and forget. If the network drops it, you’ll never know.

TCP needs more ceremony:

import socket  

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  
sock.connect(("10.0.1.50", 8080))  
sock.sendall(b"temperature:23.5")  
sock.close()

Connection setup and teardown. More syscalls. More network round-trips. Higher latency even when nothing goes wrong.

TCP’s handshake and acknowledgment overhead seems reasonable — until you’re sending hundreds of small messages per second and every round-trip adds milliseconds. UDP skips all of it.

When Different Protocols Make Sense

I rebuilt the sensor system with UDP. Latency dropped to 50ms P99. Problem solved.

Except not every problem wants UDP.

Use TCP when data loss is unacceptable. HTTP requests: you can’t skip part of an HTML page. File transfers: corrupted files are useless. Database queries: missing rows break application logic. Email delivery: partial messages don’t work. API calls where you need confirmation.

Use UDP when timeliness beats completeness. Video streaming: one dropped frame is invisible, buffering to retry is noticeable. Gaming: 200ms-old player positions are worthless, waiting for retransmits is worse. DNS queries: if the response doesn’t arrive, just ask again. Metrics collection: one missing data point doesn’t invalidate the trend. VoIP: humans tolerate brief audio dropouts better than lag.

Speaking of DNS, that’s often your first failure point. DNS uses UDP for speed. Queries time out after a few seconds (the exact interval depends on the resolver) and retry. If DNS is slow, everything feels slow; even TCP connections stall at hostname resolution.

Actually, most people don’t realize DNS is UDP by default. It falls back to TCP only for responses over 512 bytes (before EDNS). This design choice prioritizes speed for the 99% case.

The Moment I Actually Saw Network Behavior

Back to my debugging session.

I set up continuous packet capture. Left it running overnight. Next morning I filtered for retransmissions:

tcp.analysis.retransmission

Thousands of entries. Not random though. They clustered around 2 AM. Same time every night.

So I checked the infrastructure logs. Backup jobs. Every night at 2 AM, backup traffic saturated the 1Gbps link. TCP saw congestion, slowed down, retransmitted dropped packets. My sensor traffic got caught in it.

With UDP, the backup traffic still saturated the link. Some sensor packets still dropped. But the surviving packets arrived immediately. No cascading retransmission delays.

I’m still not sure why the network team scheduled backups during production hours. Politics, probably.

The Gotcha: Timeouts and Buffer Sizes

Here’s my real mistake: I set my read timeout too short. Wasted a week tuning it.

sock.settimeout(0.5)  # Don't do this blindly

Half a second seemed reasonable. But under congestion, legitimate packets took 800ms to arrive. My timeout fired. Connection closed. Data lost.

I bumped it to 2 seconds. Helped sometimes. Made the head-of-line blocking worse other times. Longer timeouts meant the application waited longer when packets actually were lost.

With UDP, timeouts work differently. You’re not waiting for the protocol to retry. You’re just waiting for data. If nothing arrives, your application decides what to do. Send a new request? Use stale data? Your choice.
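Here is one way that choice can look in code: wait briefly for a fresh reading and fall back to the last known value if nothing shows up. It's a sketch under assumptions. The helper name, the loopback demo, and the "use stale data" policy are mine; your application might prefer to re-request or raise an alarm instead.

```python
import socket


def read_latest(sock, last_known, timeout_s=0.5):
    """Return (value, is_fresh). On timeout, fall back to the last known value."""
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(2048)
    except socket.timeout:
        return last_known, False  # nothing arrived; the app decides what to do
    return data, True


def demo():
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # loopback, kernel-assigned port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Nothing sent yet, so this times out and returns the stale value.
        stale = read_latest(receiver, b"temperature:23.5", timeout_s=0.1)
        sender.sendto(b"temperature:23.7", receiver.getsockname())
        fresh = read_latest(receiver, b"temperature:23.5", timeout_s=1.0)
    finally:
        sender.close()
        receiver.close()
    return stale, fresh


stale, fresh = demo()
print(stale, fresh)
```

The timeout here never closes anything or loses a connection. It just bounds how long you're willing to wait before acting on what you already have.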

Buffer sizes matter too. TCP receive buffers hide latency problems until they fill. Then the advertised window closes, the sender stalls, and you get tail latency spikes. UDP has no reordering queue and no flow control: when a UDP receive buffer fills, the kernel silently drops whatever arrives next.

Diagnosis Cascades Through Layers

Here’s how the debugging actually went. Timeouts are symptoms. I saw timeouts in application logs. Traced them to slow responses. Speaking of symptoms, high CPU often masks network issues — if your app is busy retrying, CPU looks busy, but you’re not doing useful work.

Slow responses came from retransmissions. Retransmissions came from packet loss. Packet loss came from link saturation. Link saturation came from backup jobs.

Each layer revealed the next. This is how network debugging works. You start at the application layer (HTTP 500s, timeouts) and work down through TCP (retransmissions, connection resets) to IP (routing, fragmentation) to the physical layer (link saturation, bit errors).

Why connection pooling matters: every TCP connection has setup cost. If you’re making hundreds of requests per second, connection setup becomes the bottleneck. Pools amortize that cost. But they also hide problems: a bad connection stays in the pool, failing or stalling requests until a health check removes it.
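To make the pooling tradeoff concrete, here is a toy pool that probes each idle connection on checkout. It is a sketch, not a production pool: `factory` and `is_healthy` are hypothetical callables you would supply, and it is not thread-safe.

```python
from collections import deque


class ConnectionPool:
    """Tiny pool: reuse connections, but health-check each one on checkout."""

    def __init__(self, factory, is_healthy, max_idle=4):
        self._factory = factory        # hypothetical: creates a new connection
        self._is_healthy = is_healthy  # hypothetical: cheap liveness probe
        self._idle = deque()
        self._max_idle = max_idle
        self.created = 0               # how many times setup cost was paid

    def acquire(self):
        # Discard idle connections that fail the probe instead of reusing them.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_healthy(conn):
                return conn
        self.created += 1
        return self._factory()

    def release(self, conn):
        # Keep the connection for reuse unless the idle list is already full.
        if len(self._idle) < self._max_idle:
            self._idle.append(conn)


if __name__ == "__main__":
    pool = ConnectionPool(factory=lambda: {"ok": True}, is_healthy=lambda c: c["ok"])
    conn = pool.acquire()
    pool.release(conn)
    pool.acquire()
    print("connections created:", pool.created)
```

Probing on checkout costs a little per request. Without it, a dead connection sits in the pool until something else notices.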

UDP doesn’t have connection pools. No connections to pool. Each packet is independent. Lower complexity, but you lose connection-level metrics and circuit breaking.

The Middle Ground: QUIC

I mentioned the sensor network earlier. We eventually migrated to QUIC.

QUIC runs on UDP but adds reliability features. Selective acknowledgments: only lost packets are retransmitted, not everything after them. Connection migration: your phone switches from WiFi to cellular and the connection survives. Reduced handshake latency: the transport handshake and TLS setup, which TCP runs as separate steps, are combined into one round trip.

HTTP/3 uses QUIC. Google built QUIC to fix TCP’s head-of-line blocking for web traffic. A slow-loading image doesn’t block JavaScript anymore.

QUIC isn’t perfect. It’s complex. Debugging is harder — encrypted from the start, so tcpdump shows less. NAT traversal can be tricky. CPU overhead is higher than plain UDP.

But for applications that need reliability and low latency, it’s worth considering.

Caption: QUIC isn’t replacing TCP everywhere, but it’s solving real problems for specific use cases. When you need both reliability and speed, the protocol layer matters more than you think.

What This Means For Your Next Project

Don’t cargo-cult protocol choices. “Everyone uses TCP” isn’t engineering reasoning.

Ask: what happens when packets drop? If you need every byte in order, TCP is right. If recent data beats complete data, consider UDP.

Test under realistic conditions. Packet loss isn’t theoretical. Saturate your network in staging. Drop packets with tc on Linux:

tc qdisc add dev eth0 root netem loss 1%

One percent packet loss. Watch how your application behaves. TCP might be fine. Or you might see latency spike to seconds.

Measure what matters. Throughput? Latency? P99? P999? Different protocols optimize for different metrics. Under packet loss, UDP tends to give better P99 latency. TCP guarantees delivery and ordering.
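If you have never computed a tail percentile by hand, this is roughly what it means. A sketch using the nearest-rank method; the latency samples are made up for illustration and are not measurements from my network.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms: 98 fast requests, 2 caught behind retransmissions.
    latencies_ms = [10] * 98 + [800, 2000]
    for pct in (50, 99, 100):
        print(f"P{pct}: {percentile(latencies_ms, pct)} ms")
```

With these made-up samples, P50 is 10 ms while P99 is 800 ms. The median looks healthy, and the tail shows the stall.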

My sensor network runs on QUIC now. P99 latency is 45ms. Packet loss doesn’t cascade anymore. We still lose packets — that’s networks — but the system degrades gracefully.

Tomorrow: I debugged a UDP packet loss nightmare. Turns out application-level acknowledgments are harder than they look. More on that soon.

Sources consulted: RFC 793 (TCP), RFC 768 (UDP), RFC 9000 (QUIC), Linux kernel TCP implementation docs, Cloudflare’s blog on QUIC deployment

Your team doesn't have a prompt problem. It has a blank-box problem.

speed engineer — Thu, 11 Jun 2026 03:45:56 +0000

Most advice about getting more out of AI is aimed at one person getting better at writing prompts. Take a course, learn the "formula," practice every day. That's fine if you're an enthusiast. It's a bad deal for a team of busy people who just want to finish a task.

Here's what actually happens on most teams.

The blank box

Someone opens ChatGPT to do a real task: turn a long customer email into three clear bullet points, draft a job posting, rewrite a paragraph for a newsletter. They get an empty box and a blinking cursor. They don't know what "good" looks like for this task, so they either type something vague ("make this better") and get something vague back, or they freeze and close the tab.

The conclusion they walk away with is "AI isn't that useful for my work." The real problem is that they started from zero, with no idea what a strong prompt for that task even looks like.

Meanwhile, someone three desks over wrote a great prompt for that exact task last month. It works every time. It just lives in their personal chat history, where no one else will ever see it.

For a team, prompting is a distribution problem

For one person, writing better prompts is a skill you build over time. For a team, the thing holding you back usually isn't skill — it's distribution. The good prompts already exist. They're trapped in one person's account.

So the goal isn't to turn everyone into a "prompt engineer." It's to make sure the person facing the blank box right now can start from a proven prompt instead of from nothing.

The habit that fixes it

You don't need a tool to start. You need four small habits:

Notice when a prompt works. The moment you get output you'd actually use, that prompt just became worth keeping. Most people throw it away by closing the tab.
Label it by the task, not the topic. "Summarize a support thread into themes" beats "AI stuff." People look for prompts by the job they're trying to do, so name it that way.
Start from the closest one and adapt 20%. The skill that actually scales on a team isn't writing prompts from scratch — it's recognizing "this is 80% of what I need" and changing the rest. Starting from a known-good prompt also teaches people what good looks like faster than any course.
Keep them where people already are. A prompt nobody can find is the same as no prompt. A pinned doc or a shared channel beats ten private chat histories.

Where the doc breaks down

For a week, a shared doc or a Slack canvas is enough. Then it scatters. Prompts get pasted into DMs, the doc goes stale, nobody updates it, and everyone quietly drifts back to the blank box.

That's the gap we built PromptShip to fill — a shared prompt library for non-technical teams (it works with ChatGPT, Claude, and Gemini) so the person facing the blank box starts from a teammate's proven prompt in a couple of clicks instead of from scratch. But the habit matters more than the tool: even a pinned list of your team's ten best prompts beats starting from zero every time.

Takeaways

For most non-technical teams, the bottleneck isn't prompt-writing skill — it's the blank box, and starting every task from nothing.
The good prompts already exist; they're stuck in one person's chat history.
Notice what works, label it by the task, start from the closest one and adapt, and keep them somewhere everyone can reach.
Past a handful of prompts and people, a shared library does this automatically — but even a pinned doc beats starting from scratch.

What's the one prompt on your team that everyone should have, but only one person actually does?

Your Oldest Client Is Probably Your Lowest-Paying One

speed engineer — Wed, 10 Jun 2026 03:52:02 +0000

Most freelancers can name their highest-paying client instantly. Ask them which client pays the least per hour and they go quiet — because the answer is usually the one they'd never suspect: the loyal, long-running retainer that's felt "safe" for years.

The client you never re-priced

Here's how it happens. You sign a retainer at $2,000/month for what you both estimate is about 20 hours of work. That's $100/hour. Good deal, everyone's happy.

Then time passes. The relationship gets comfortable. And comfort is exactly where scope creep lives:

"Can you also just look at this real quick?"
"While you're in there, could you update the other page too?"
"We added a new tool — can you handle that going forward?"

Each request is small. Each one is easy to say yes to, because you like them and the relationship is good. None of them comes with a fee change.

Eighteen months later, that 20-hour retainer is quietly a 38-hour retainer. Same $2,000. Your effective rate just fell from $100/hour to about $53, and nobody decided that on purpose. You gave that client a 47% discount, and you never saw the memo, because you sent it to yourself.

Why a flat fee hides the leak

Hourly work has a built-in alarm: more hours, bigger invoice, and you notice. A flat fee removes that alarm completely. The invoice is identical whether the month took 20 hours or 40. The number on the invoice stops telling you anything about the number that actually matters — your real hourly rate.

So the erosion is invisible by design. There's no line item for "scope that crept in since last year." The work feels normal because it grew one reasonable favor at a time. And the client isn't being shady — they genuinely don't know how long things take you. Only you can know that, and only if you're measuring.

The fix: track hours against the flat fee

You don't need to switch the client to hourly. You just need to know your effective rate, which means logging time even on work you're not billing hourly.

The drill:

For 30 days, track every hour you spend on each retainer, tagged to that client. Don't change anything else.
At month's end, divide the flat fee by the hours. That's your real rate for that client.
Compare it to your target rate. If the retainer has drifted 30–40% below your other work, that's not a loyal client — that's a subsidy.
Re-scope or re-price. "Here's what the retainer originally covered, here's what it covers now — let's right-size it" is a normal, professional conversation. The data makes it un-awkward.
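The drill above fits in a few lines of Python. This is a sketch: the function names, the second client, and the 30% default threshold are my assumptions, and the $2,000 / 38-hour figures reuse the example from earlier.

```python
def effective_rate(flat_fee, hours_logged):
    """Real hourly rate for a flat-fee engagement."""
    if hours_logged <= 0:
        raise ValueError("log at least some hours before computing a rate")
    return flat_fee / hours_logged


def audit_retainers(retainers, target_rate, drift_threshold=0.30):
    """Return (client, rate, drifted) rows, lowest effective rate first.

    retainers maps client name -> (monthly flat fee, hours logged that month).
    drifted is True when the rate sits drift_threshold or more below target_rate.
    """
    rows = []
    for client, (flat_fee, hours) in retainers.items():
        rate = effective_rate(flat_fee, hours)
        shortfall = (target_rate - rate) / target_rate
        rows.append((client, round(rate, 2), shortfall >= drift_threshold))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    month = {
        "Old retainer": (2000, 38),  # figures from the example above
        "Newer client": (3000, 28),  # hypothetical comparison client
    }
    for client, rate, drifted in audit_retainers(month, target_rate=100):
        flag = "re-scope or re-price" if drifted else "ok"
        print(f"{client}: ${rate}/hour ({flag})")
```

Against a $100 target, the 38-hour retainer comes out at $52.63/hour and gets flagged. The newer client does not.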

Most people who run this are stunned by which client comes out lowest. It's almost never the demanding one. It's the easy one you've had forever.

How FillTheTimesheet fits in

I built FillTheTimesheet partly for this: it lets you log time against a client even when the engagement is flat-fee, then shows effective hourly rate per client so the erosion can't hide. But you can do the whole audit with a spreadsheet and a timer. The tool just keeps the number in front of you every month instead of once, by accident, when you finally wonder why the "safe" client never feels worth it.

Key takeaways

A flat fee removes the feedback loop that hourly billing gives you — scope can grow without the invoice ever changing.
Your oldest, friendliest retainer is the most likely to have eroded, because comfort is where scope creep lives.
Track hours against flat-fee work for 30 days and compute effective rate per client.
If a retainer has drifted well below your target rate, re-scope or re-price — the data turns an awkward ask into an obvious one.

Run it for one month. The client who comes out at the bottom of that list is the conversation you've been avoiding.

Prompt drift: why the AI prompt that worked last month quietly stopped working

speed engineer — Tue, 09 Jun 2026 10:24:36 +0000

If you share AI prompts with your team, you've probably hit this without having a name for it:

A prompt that produced great output a month ago now produces mediocre output. Supposedly the same prompt, same model, worse results. Nobody changed the model. So what happened?

Usually, the prompt changed — a little at a time, by people trying to help.

Prompt drift

Here's the typical sequence. Someone writes a genuinely good prompt — say, one that turns messy meeting notes into a clean summary. It works. They paste it into a shared doc so the team can reuse it.

Then it starts drifting:

A teammate adds "keep it under 100 words" because their summaries ran long.
Someone else adds "use bullet points" for their own use case.
A third person rewords a line to fix a one-off problem.

Each edit made sense for the person making it. But they all edited the same shared copy, in place, with no record of what changed or why. Three weeks later the prompt is a patchwork of everyone's special cases, and the original — the one that actually worked — is gone. You can't roll back to it, because nobody saved it.

That's prompt drift: the slow degradation of a shared prompt that no single person broke.

Why a shared doc makes it worse

A Google Doc or Notion page feels like the obvious home for team prompts. It's better than nothing, but it has the exact property that causes drift: one editable copy, no versions, no rollback. The moment two people have different needs for the same prompt, one overwrites the other, and there's no known-good version to return to.

The fix: treat prompts like recipes, not messages

A chat message is disposable. A recipe is something you keep, refine, and can always cook again. Treat your reusable prompts as recipes:

Name it. "Meeting-notes → summary" beats "that summary prompt Dana shared." A shared name means everyone is talking about the same thing.
Version it. Every time you change a prompt, save it as a new version instead of overwriting. Keep a one-line note — what changed and why ("Jun 9 — added word limit for newsletter use").
Save the output that worked. Store one example of the result the prompt produced when everyone agreed it was good. That's your reference point: when quality drops, you compare against it instead of arguing from memory.
Fork instead of overwrite. When your use case differs, copy the prompt to a new version — don't edit the shared one. The newsletter team and the support team can each keep a variant without stepping on each other.
Roll back fearlessly. When a prompt gets worse, don't debug it — restore the last version that worked, then re-apply changes one at a time until you find the one that hurt.
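For anyone who thinks in code, the five habits above map onto a very small data structure. This is a sketch of the idea only, not how PromptShip or any other tool stores prompts. The class and method names are mine.

```python
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    text: str
    note: str              # one line: what changed and why
    good_output: str = ""  # reference output saved when the team agreed it was good


@dataclass
class Prompt:
    name: str
    versions: list = field(default_factory=list)

    def save(self, text, note, good_output=""):
        """Append a new version instead of overwriting the previous one."""
        self.versions.append(PromptVersion(text, note, good_output))
        return len(self.versions)  # 1-based version number

    def current(self):
        return self.versions[-1]

    def fork(self, new_name):
        """Copy the current version into a separate prompt for a different use case."""
        latest = self.current()
        forked = Prompt(new_name)
        forked.save(latest.text, f"forked from {self.name} v{len(self.versions)}", latest.good_output)
        return forked

    def roll_back(self, version_number):
        """Restore an earlier version by re-saving it as the newest one; history is kept."""
        old = self.versions[version_number - 1]
        return self.save(old.text, f"rolled back to v{version_number}", old.good_output)


if __name__ == "__main__":
    summary = Prompt("Meeting-notes → summary")
    summary.save("Summarize these meeting notes.", "initial version")
    summary.save("Summarize these meeting notes. Keep it under 100 words.",
                 "Jun 9: added word limit for newsletter use")
    summary.roll_back(1)
    for number, version in enumerate(summary.versions, start=1):
        print(number, version.note)
```

Rolling back appends instead of deleting, so the version that hurt stays in the history for the one-at-a-time comparison.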

None of this requires special tooling. You can do it in a doc with manual version headers, and for a handful of prompts that's fine.

Where it breaks down

The doc approach falls apart somewhere around 15–20 prompts and three or more people. Manual version headers get skipped, nobody saves the "good" output, and you're back to drift. That's the point where a purpose-built shared prompt library earns its keep: it keeps every prompt named, versioned, and roll-back-able by default, so the discipline happens automatically instead of relying on everyone remembering.

That's the gap we built PromptShip to fill — a shared prompt library for teams (works with ChatGPT, Claude, and Gemini) where every prompt has version history, so you can always get back to the version that worked. But the habit matters more than the tool: even if you never adopt a library, versioning your prompts will save you the next time one mysteriously stops working.

Takeaways

Shared prompts degrade over time through well-meaning in-place edits — prompt drift.
A single editable copy (the typical shared doc) is what enables it.
Name your prompts, version them, save the output that worked, fork instead of overwrite, and keep the ability to roll back.
Past ~15 prompts and a few people, a versioned prompt library does this automatically.

How does your team keep track of the prompts that actually work?

The 6-Minute Tasks That Quietly Cost Freelancers a Full Day of Pay Each Week

speed engineer — Mon, 08 Jun 2026 03:45:20 +0000

Ask a freelancer how many hours they billed last week and you'll get a confident number. Ask how many hours they actually worked and the number gets fuzzy. The gap between those two numbers is almost always made of small tasks — and it's bigger than you think.

The tasks you never log

You log the two-hour design session. You log the afternoon you spent on the client's API integration. What you don't log:

the 6-minute reply to "quick question about the invoice"
the 4-minute Slack clarification before you could start real work
the 9 minutes hunting for the brand assets they swore they'd sent
the 12-minute call that "wasn't really billable"

Each one feels too small to bother with. Individually, it is. The problem is that there are fifteen of them a day.

The one-week audit

Here's the experiment. For one week, log everything — including every task under ten minutes. Don't judge it, don't decide whether it's billable, just capture it. Put a tally next to anything that feels "too small to bill."

At the end of the week, total that column.

Most people I've talked into doing this land somewhere between four and six hours. That's a half to a full working day, every week, of real client work that left no trace on an invoice. Over a year it's the single biggest leak in a freelance P&L, and almost nobody measures it.
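The audit is just a flagged sum. A sketch, assuming you log each task as a (label, minutes, too-small-to-bill flag) tuple. That layout and the function name are mine, and the sample entries reuse the examples from the list above.

```python
def unbilled_minutes(entries):
    """Total minutes for entries flagged as 'too small to bill'."""
    return sum(minutes for _label, minutes, too_small in entries if too_small)


if __name__ == "__main__":
    one_day = [
        ("design session", 120, False),
        ("quick question about the invoice", 6, True),
        ("Slack clarification", 4, True),
        ("hunting for brand assets", 9, True),
        ("call that wasn't really billable", 12, True),
    ]
    total = unbilled_minutes(one_day)
    print(f"{total} unbilled minutes today, {total / 60:.2f} hours")
```

Run it over a full week of entries and the total is the number you bring to the next section.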

Why it leaks

It isn't laziness. It's three structural things:

The unit feels wrong. A six-minute task measured against a one-hour mental "minimum billable unit" feels un-loggable, so it vanishes.
The switching cost hides it. The task interrupted something else, so you bucket it under that something else — or under nothing.
You feel awkward billing for it. "I'm not going to charge them for a four-minute email" is a sentence that, repeated 300 times a year, is a meaningful pay cut.

What to do with the number

Once you can see the leak, you have options. Batch micro-tasks into a single daily "client comms" line and bill it honestly. Set a real minimum billable unit (most agencies use 15 minutes) and apply it consistently instead of silently eating anything smaller. Or build the small stuff into your rate from the start, so you're not deciding case-by-case at 5pm whether a task "counts."
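Two of those options are easy to compare in code. A sketch with a 15-minute unit; the function names are assumptions, and whether you round per task or per batch is a billing decision, not something the code settles for you.

```python
def billable_minutes(minutes, unit=15):
    """Round a task up to the next multiple of the minimum billable unit."""
    if minutes <= 0:
        return 0
    return (minutes + unit - 1) // unit * unit  # integer ceiling, no float rounding


def batched_billable_minutes(task_minutes, unit=15):
    """Batch micro-tasks into one 'client comms' line, then round the total up once."""
    return billable_minutes(sum(task_minutes), unit)


if __name__ == "__main__":
    micro_tasks = [6, 4, 9, 12]  # minutes, from the examples earlier in this post
    per_task = sum(billable_minutes(m) for m in micro_tasks)
    batched = batched_billable_minutes(micro_tasks)
    print("rounded per task:", per_task, "minutes")
    print("batched, rounded once:", batched, "minutes")
```

Rounding per task bills more than batching does. Pick one and apply it the same way every week.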

The point isn't to nickel-and-dime your clients. It's to stop nickel-and-diming yourself by accident.

How FillTheTimesheet fits in

I built FillTheTimesheet partly because of this exact leak — it lets you drop in micro-entries as fast as you can type them, then totals the "too small to bill" pile for you at the end of the week, so the number is staring you in the face instead of hiding. But the audit works with a notebook. The tool just makes the weekly total automatic.

Key takeaways

The gap between hours worked and hours billed is mostly sub-10-minute tasks.
Run a one-week audit: log everything, tally the "too small to bill" items, total it.
Expect 4–6 hours. That's up to a full day of unbilled work a week.
Fix it structurally: batch, set a minimum unit, or price it in.

Run the audit for one week. The number alone will change how you bill.

Why Your AI Prompts Work for You But Fail for Your Teammates (And the 4-Block Format That Fixes It)

speed engineer — Tue, 02 Jun 2026 04:30:42 +0000

You write a prompt for ChatGPT that drafts a perfect outreach email. You paste it into Slack. A teammate tries it. The output is mediocre.

You're not crazy. The prompt didn't get worse. It was never portable to begin with.

This is one of the most under-discussed reasons team AI workflows fail: prompts that depend on context only the author has in their head. Once you see the pattern, you can fix any prompt in about 10 minutes.

Why "Just Copy My Prompt" Doesn't Work

A working prompt is actually three things:

The text you typed
The context in the chat history above it
The mental model you have of what "good" looks like

When you share the prompt, only piece #1 travels. Pieces #2 and #3 stay with you. Your teammate gets the recipe without the pantry — and the output reflects that.

The fix isn't a better model. It's a prompt format that forces all three pieces into the prompt itself.

The 4-Block Format

Every reusable team prompt should have four labeled blocks:

[ROLE]
You are a B2B SaaS sales rep writing to a CFO at a mid-market company.

[CONTEXT]
- Our product: PromptShip — a shared prompt library for non-technical teams
- Their pain: AI prompts scattered across Slack, no team standard
- Their company: 50-500 employees, recently funded
- Tone: direct, no jargon, short paragraphs

[TASK]
Write a 4-sentence cold email that opens with a specific observation about their team, references a real pain, and asks for a 15-minute call.

[OUTPUT FORMAT]
- Subject line (max 8 words)
- Body (max 4 sentences)
- One follow-up subject line variant

Four blocks. Always in this order. No exceptions.

The magic isn't the structure itself — it's that each block forces you to write down something you'd normally leave implicit.
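If your team keeps prompts in code or a script, the format is easy to enforce. A minimal sketch: `render_prompt` and `BLOCK_ORDER` are hypothetical names, and refusing to render an incomplete prompt is my design choice, not a PromptShip feature.

```python
BLOCK_ORDER = ("ROLE", "CONTEXT", "TASK", "OUTPUT FORMAT")


def render_prompt(blocks):
    """Render a dict of block name -> text in the fixed 4-block order.

    Raises ValueError if a block is missing or empty, so implicit context
    cannot be silently left out.
    """
    missing = [name for name in BLOCK_ORDER if not blocks.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing blocks: {', '.join(missing)}")
    return "\n\n".join(f"[{name}]\n{blocks[name].strip()}" for name in BLOCK_ORDER)


if __name__ == "__main__":
    print(render_prompt({
        "ROLE": "You are a B2B SaaS sales rep writing to a CFO at a mid-market company.",
        "CONTEXT": "- Tone: direct, no jargon, short paragraphs",
        "TASK": "Write a 4-sentence cold email that asks for a 15-minute call.",
        "OUTPUT FORMAT": "- Subject line (max 8 words)\n- Body (max 4 sentences)",
    }))
```

The dict can be written in any order. The output always comes out Role, Context, Task, Output Format.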

Refactoring a Real Prompt

Here's a prompt one of our users shared. It worked great for her, but every teammate who tried it got bland output:

Before:

"Write a follow-up email for the demo I had yesterday with the marketing director."

What's wrong: "the demo I had" assumes the AI knows what the demo covered. "Marketing director" without industry context is too generic. No tone guidance, no length constraint.

After (in 4-Block format):

[ROLE]
You are a customer success rep at a B2B SaaS company.

[CONTEXT]
- Yesterday's demo: 30 min walkthrough of a prompt library tool
- Prospect: Marketing Director at a 200-person fintech company
- Their stated pain: team uses AI but no shared prompts, work gets duplicated
- They asked for: a way to share their copywriting prompts with the social team
- Our pricing: $15/mo for 10 seats

[TASK]
Write a follow-up email that thanks them, reinforces the one specific thing they said they cared about, and proposes a 15-min call with their social team lead.

[OUTPUT FORMAT]
- Subject line (max 8 words, no "Following up")
- Body (3 short paragraphs)
- Clear next-step CTA

Same task. The "after" version works for anyone on the team because all the implicit context is now explicit.

Why This Compounds Across a Team

Once your team writes prompts in this format, three things happen:

Onboarding gets faster. A new hire can run a senior teammate's prompt and get a senior-quality output, because the prompt carries its own context.

Prompts become editable assets. When the model changes or your product evolves, anyone can update the [CONTEXT] block without reverse-engineering the original author's intent.

You can actually share prompts. Not just paste them and hope. Share them — with the confidence that they'll work for the next person.

How We Use This at PromptShip

PromptShip is a shared prompt library for teams using ChatGPT, Claude, or Gemini. The 4-Block format is baked into how we recommend customers structure prompts — every saved prompt has fields for role, context, task, and output, so teammates can copy a prompt with one click and trust that it'll work.

The free plan (200 prompts, 1 user) is enough to test this format across a few of your team's top prompts before rolling it out.

Key Takeaways

Prompts that work for you but fail teammates aren't bad prompts — they're prompts missing implicit context
The 4-Block format (Role / Context / Task / Output Format) makes prompts portable
Refactoring takes ~10 min per prompt and pays back in every future use
Shared prompt libraries only work if the prompts inside them are written to be shared

What's the most-copied prompt on your team right now? Try refactoring it into 4 blocks and see if the output improves. I'd love to hear what changed.