Week one of the pilot, our voice agent booked appointments for a dental group. Forty operatories, three locations, one phone line per office that never stopped ringing. The agent took the call, checked the calendar, wrote the slot, read it back. Clean.
The 9pm page came on a Thursday. Front desk had found four double-booked slots from that afternoon. Same callers, same times, two rows each in the scheduling table. The agent swore (in the transcript) it booked once. The database swore it booked twice. Both were telling the truth.
Here is what actually happened. Our booking call went out to the practice management API. That API was slow that day, p95 around 4200ms, sometimes worse. We had a 3000ms timeout on the HTTP client. So the request would land, the booking would commit on their side, and our client would give up waiting before the 201 came back. The agent saw a timeout, treated it as a failure, and said the line every voice agent says when something goes wrong: "sorry, let me try that again." Then it fired the same booking a second time. The second one was fast enough to return. Two rows. One confused Mrs. Alvarez.
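The sequence above can be reproduced without any network at all. This is a minimal sketch, not our production code: `FakeSchedulingAPI`, `naive_book_with_retry`, the caller label, and the slot timestamp are all made up for illustration. The only thing it models is a server that commits every write but sometimes answers after the client has stopped listening.

```python
class FakeSchedulingAPI:
    """Hypothetical stand-in for the downstream API: always commits, may answer late."""

    def __init__(self, latencies_ms):
        self.latencies_ms = list(latencies_ms)  # one simulated latency per request
        self.rows = []                          # bookings committed on "their" side

    def book(self, caller, slot, client_timeout_ms):
        latency = self.latencies_ms.pop(0)
        self.rows.append((caller, slot))        # the write commits regardless
        if latency > client_timeout_ms:
            # The client gave up before the 201 arrived.
            raise TimeoutError(f"no response after {client_timeout_ms}ms")
        return 201


def naive_book_with_retry(api, caller, slot, timeout_ms=3000, attempts=2):
    """Retry on timeout with the same payload and no idempotency key."""
    for _ in range(attempts):
        try:
            return api.book(caller, slot, timeout_ms)
        except TimeoutError:
            continue  # "sorry, let me try that again"
    raise TimeoutError("all attempts timed out")


# Slow first response (over the 3000ms timeout), fast second one.
api = FakeSchedulingAPI(latencies_ms=[4200, 900])
status = naive_book_with_retry(api, "caller-1", "2024-05-02T14:00")
print(status, len(api.rows))  # 201 2
```

The client sees one clean 201. The server holds two rows. Neither side did anything wrong on its own terms.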
The retry was the bug. Not the slowness. Slowness is normal. The sin was retrying a write that had no idempotency key, so the downstream system had no way to know the second request was the same intent as the first.
The fix was small and boring, which is the best kind. Generate a stable key per booking intent (not per HTTP attempt) and pass it through. If the agent decides to book the 2pm slot for this caller, that decision gets one key, and every retry of that decision carries it.
```python
import hashlib


def booking_key(call_id: str, slot_iso: str, provider_id: str) -> str:
    # one key per intent. survives retries, timeouts, agent re-prompts.
    raw = f"{call_id}:{slot_iso}:{provider_id}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


# client, payload, call_id, slot_iso and provider_id come from the call handler.
resp = client.post(
    "/appointments",
    json=payload,
    headers={"Idempotency-Key": booking_key(call_id, slot_iso, provider_id)},
    timeout=8.0,  # also: stop timing out under their real p95
)
```
Two things mattered together. The key made the duplicate write a no-op on the server. Their API honored Idempotency-Key, as many modern APIs do; if yours does not, you build the dedup yourself with a unique constraint on those three fields (call_id, slot_iso, provider_id). And the timeout went from 3000ms to 8000ms, because a 3000ms ceiling on a 4200ms p95 is not a timeout, it is a duplicate-booking generator with extra steps.
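For the build-it-yourself case, here is a rough sketch of what a unique constraint on those three fields could look like. It uses an in-memory SQLite database purely so it runs anywhere; the table name, column names, sample values, and the `book_once` helper are assumptions for illustration, not the practice management system's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE appointments (
        call_id     TEXT NOT NULL,
        slot_iso    TEXT NOT NULL,
        provider_id TEXT NOT NULL,
        UNIQUE (call_id, slot_iso, provider_id)
    )
    """
)


def book_once(conn, call_id, slot_iso, provider_id):
    """Insert the booking. Return True if a row was written, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO appointments (call_id, slot_iso, provider_id) VALUES (?, ?, ?)",
                (call_id, slot_iso, provider_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a repeat of the same intent.
        return False


# Hypothetical values: the same intent arrives twice, as it would after a retry.
first = book_once(conn, "call-123", "2024-05-02T14:00", "prov-7")
second = book_once(conn, "call-123", "2024-05-02T14:00", "prov-7")
row_count = conn.execute("SELECT COUNT(*) FROM appointments").fetchone()[0]
print(first, second, row_count)  # True False 1
```

The retry still happens. It just stops mattering, because the database refuses to record the same intent twice.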
We shipped the key first, that same night. Double-bookings went to zero across the next 1,800 calls. The timeout bump went out the next morning after I pulled a week of latency histograms and saw the real tail.
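Picking a timeout from the tail instead of from a round number can be sketched in a few lines. This is an illustration under loud assumptions: the latency samples below are invented, the `suggest_timeout_s` helper is hypothetical, and the 1.5x headroom factor is an arbitrary choice, not the rule we used to land on 8 seconds.

```python
import statistics


def suggest_timeout_s(latencies_ms, percentile=99, headroom=1.5):
    """Return a timeout in seconds: the chosen percentile plus some headroom."""
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    tail_ms = cuts[percentile - 1]
    return round(tail_ms * headroom / 1000, 1)


# Invented samples, shaped like a slow upstream with a long tail.
samples_ms = [800] * 50 + [2500] * 30 + [4200] * 15 + [5200] * 5

p95_ms = statistics.quantiles(samples_ms, n=100, method="inclusive")[94]
timeout_s = suggest_timeout_s(samples_ms)
print(p95_ms, timeout_s)
```

Whatever numbers you feed it, the useful check is the same: the timeout should sit above the tail you actually observe, not below your p95.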
What I would tell week-one me: a voice agent retrying a write is not retrying a question. When the agent says "let me try that again," something on the other end may already be true. Decide what one action means before you let the agent do it twice. Put the key on the intent, not the attempt, and set timeouts from the real numbers, not the round ones.