Every Go service that does I/O — database calls, HTTP requests, queue polling, file reads — should be passing a context.Context through every layer. In practice, a large number of codebases treat context.Context as optional plumbing that gets added later, or as something only the HTTP handler layer needs to worry about.
That’s the mistake. By the time you realize you need cancellation deep in a worker pool, retrofitting it is painful. This post shows what context propagation looks like in a real worker pool, what the failure modes are when you skip it, and how to do it correctly from the start.
What context.Context actually does
A context.Context carries three things:
- A cancellation signal — a channel that’s closed when the work should stop
- A deadline or timeout — an absolute time after which the context is automatically cancelled
- Key-value pairs — request-scoped values like trace IDs (use sparingly)
When a context is cancelled, any blocking operation that’s watching that context should stop and return an error. Cancellation propagates down the call stack: cancelling a parent context cancels all of its children. It never propagates up, so cancelling a child leaves the parent untouched.
The critical point: the context has to be passed all the way down to the code that actually blocks. Holding it at the HTTP handler layer and never passing it to your database call means cancellation never reaches the work.
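The direction of propagation can be checked with a small, self-contained sketch. The helper names (isDone, cancelChildOnly, cancelParentOnly) are made up for this illustration; only the context package calls come from the standard library.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports whether ctx has been cancelled, without blocking.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// cancelChildOnly cancels a derived context and reports the state of
// the parent and the child afterwards.
func cancelChildOnly() (parentDone, childDone bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()
	child, cancelChild := context.WithCancel(parent)
	cancelChild()
	return isDone(parent), isDone(child)
}

// cancelParentOnly cancels the parent context and reports the state of
// the parent and the child afterwards.
func cancelParentOnly() (parentDone, childDone bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(parent)
	defer cancelChild()
	cancelParent()
	return isDone(parent), isDone(child)
}

func main() {
	p, c := cancelChildOnly()
	fmt.Println("child cancelled:  parent done:", p, "child done:", c)
	p, c = cancelParentOnly()
	fmt.Println("parent cancelled: parent done:", p, "child done:", c)
}
```

Cancelling the child leaves the parent running, while cancelling the parent stops both.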
What happens without context
Here is a worker pool that ignores context:
func runWorkers(jobs <-chan Job) {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				processJob(job) // no context
			}
		}()
	}
	wg.Wait()
}

func processJob(job Job) error {
	result, err := fetchFromDatabase(job.ID) // no context
	if err != nil {
		return err
	}
	return sendToDownstream(result) // no context
}
Now consider what happens when:
- The calling service shuts down
- A client disconnects
- A deadline passes
- An operator sends SIGTERM
The workers keep running. fetchFromDatabase keeps waiting. sendToDownstream keeps trying. The process can’t exit cleanly because goroutines are blocked on operations that have no way of knowing they should stop.
In the best case, shutdown stalls until the blocked goroutines unblock on their own, which can take tens of seconds. In the worst case, the process never exits by itself and has to be killed, potentially leaving in-flight work in an inconsistent state.
The correct structure: context flows in, errors flow out
The pattern is simple. Every function that does I/O or blocks takes a context as its first parameter:
func runWorkers(ctx context.Context, jobs <-chan Job) error {
	var wg sync.WaitGroup
	errs := make(chan error, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return // context cancelled, stop cleanly
				case job, ok := <-jobs:
					if !ok {
						return // channel closed, no more jobs
					}
					if err := processJob(ctx, job); err != nil {
						// Non-blocking send: once the buffer is full,
						// further errors are dropped.
						select {
						case errs <- err:
						default:
						}
					}
				}
			}
		}()
	}
	wg.Wait()
	close(errs)
	// Return the first buffered error, or nil if there was none
	// (receiving from a closed, empty channel yields nil).
	return <-errs
}

func processJob(ctx context.Context, job Job) error {
	result, err := fetchFromDatabase(ctx, job.ID)
	if err != nil {
		return err
	}
	return sendToDownstream(ctx, result)
}
Now when the context is cancelled (by timeout, operator signal, or parent cancellation), each idle worker’s select unblocks on ctx.Done() and returns cleanly. A worker that is in the middle of a job stops as soon as its I/O call returns: fetchFromDatabase and sendToDownstream pass the context to their underlying I/O calls (database drivers, HTTP clients), which abort when the context is cancelled.
Wiring context to OS signals for clean shutdown
The context that flows into your worker pool should ultimately be rooted at main, wired to OS signals:
func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	jobs := make(chan Job, 100)

	// Producer
	go func() {
		defer close(jobs)
		if err := loadJobs(ctx, jobs); err != nil {
			log.Printf("producer stopped: %v", err)
		}
	}()

	// Workers. A context.Canceled error means we were asked to shut down,
	// which is a clean exit rather than a failure.
	if err := runWorkers(ctx, jobs); err != nil && !errors.Is(err, context.Canceled) {
		log.Printf("workers stopped with error: %v", err)
		stop() // os.Exit skips deferred calls
		os.Exit(1)
	}
	log.Println("shutdown complete")
}
signal.NotifyContext (added in Go 1.16) returns a context that’s cancelled when SIGINT or SIGTERM arrives. That cancellation propagates through loadJobs and runWorkers automatically. When the signal arrives, the producer stops generating new jobs, in-flight I/O observes cancellation through the propagated context, workers exit, and the process shuts down cleanly.
Timeouts: per-job vs. pool-wide
There are two distinct timeout requirements in a worker pool, and they need separate contexts:
Pool-wide timeout: how long the entire batch is allowed to run. Use context.WithTimeout or context.WithDeadline on the root context.
Per-job timeout: how long a single job is allowed to take. Use a derived context inside the worker loop.
func runWorkers(ctx context.Context, jobs <-chan Job) error {
	// ctx already carries the pool-wide deadline from the caller
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case job, ok := <-jobs:
					if !ok {
						return
					}
					// Per-job timeout: 30 seconds max per job
					jobCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
					err := processJob(jobCtx, job)
					cancel() // always call cancel to release resources
					if err != nil {
						log.Printf("job %s failed: %v", job.ID, err)
					}
				}
			}
		}()
	}
	wg.Wait()
	return ctx.Err()
}
Two things to note. First, jobCtx is derived from ctx — if the pool-wide context is cancelled, the per-job context is also cancelled automatically. The hierarchy works in your favor. Second, cancel() is called immediately after processJob returns, not deferred. Deferring a context cancel inside a loop means you accumulate uncancelled contexts until the goroutine exits. Call cancel immediately after the work is done.
Checking context in CPU-bound work
The ctx.Done() pattern works for I/O that blocks, because the underlying library propagates the context. But what about CPU-bound work — a job that does heavy computation without any I/O?
func processJob(ctx context.Context, job Job) error {
	chunks := split(job.Data)
	for i, chunk := range chunks {
		// Check context periodically in long-running CPU work
		if i%100 == 0 {
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}
		}
		if err := processChunk(chunk); err != nil {
			return err
		}
	}
	return nil
}
The select with a default branch is non-blocking — it checks whether the context is done and moves on immediately if not. Checking every 100 iterations (or every N milliseconds using a ticker) is a reasonable balance between responsiveness and overhead.
The three rules
1. Context is always the first parameter.
func doWork(ctx context.Context, ...) error
Not a struct field, not a global, not injected via closure. First parameter, every time. The convention is strong enough to be written into the context package documentation and Go’s Code Review Comments.
2. Never store context in a struct.
// ❌
type Worker struct {
	ctx context.Context
}

// ✅
func (w *Worker) Process(ctx context.Context, job Job) error
Contexts are request-scoped. A struct outlives any single request. Storing context in a struct means you’re using the wrong context for future requests, or holding a cancelled context longer than you should.
3. Always call cancel.
ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel() // even if the timeout fires first
If the timeout fires before you call cancel, calling cancel is a no-op. But if the work finishes before the timeout, calling cancel releases the timer resources immediately rather than waiting for the timeout to expire. It costs nothing and prevents a resource leak.
Summary
Clean shutdown on SIGTERM — signal.NotifyContext at main, propagate down.
Stop workers on cancellation — select { case <-ctx.Done(): return } in the worker loop.
Per-job timeout — context.WithTimeout(ctx, duration) inside the loop, cancel immediately after.
CPU-bound work — periodic select { case <-ctx.Done(): return ctx.Err(); default: }.
Function signatures — ctx context.Context is always the first parameter.
Context propagation is not boilerplate. It is the mechanism that lets your service behave correctly under pressure — when deadlines pass, when clients disconnect, when operators need to restart the service without waiting for goroutines to unblock naturally. The cost of adding it from the start is a few extra parameters. The cost of retrofitting it later is much higher.
Previous in this series: Three Go Concurrency Mistakes I See in Almost Every Worker Pool
Next: Profiling a Go service in production with pprof.