Prasad Ekke

Posted on Jun 23 • Originally published at Medium

context.Context Is Not Optional: A Practical Guide to Cancellation in Go Services

#go #designpatterns #programming #backend

Every Go service that does I/O — database calls, HTTP requests, queue polling, file reads — should be passing a context.Context through every layer. In practice, a large number of codebases treat context.Context as optional plumbing that gets added later, or as something only the HTTP handler layer needs to worry about.

That’s the mistake. By the time you realize you need cancellation deep in a worker pool, retrofitting it is painful. This post shows what context propagation looks like in a real worker pool, what the failure modes are when you skip it, and how to do it correctly from the start.

What context.Context actually does

A context.Context carries three things:

A cancellation signal — a channel that’s closed when the work should stop
A deadline or timeout — an absolute time after which the context is automatically cancelled
Key-value pairs — request-scoped values like trace IDs (use sparingly)
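The three items above map onto three methods of the interface: Done, Deadline, and Value. Here is a minimal sketch that reads all three from one context. The traceKey type, the describe and demo helpers, and the "trace-123" value are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// traceKey is an unexported key type, so other packages cannot collide with it.
type traceKey struct{}

// describe summarizes what ctx carries: deadline, cancellation state, trace ID.
func describe(ctx context.Context) string {
	_, hasDeadline := ctx.Deadline()

	cancelled := false
	select {
	case <-ctx.Done(): // the Done channel is closed once ctx is cancelled
		cancelled = true
	default:
	}

	traceID, _ := ctx.Value(traceKey{}).(string)
	return fmt.Sprintf("deadline=%t cancelled=%t trace=%s", hasDeadline, cancelled, traceID)
}

// demo builds a context with a made-up trace ID and a one-minute timeout,
// then describes it before and after cancellation.
func demo() (before, after string) {
	ctx := context.WithValue(context.Background(), traceKey{}, "trace-123")
	ctx, cancel := context.WithTimeout(ctx, time.Minute)
	before = describe(ctx)
	cancel()
	after = describe(ctx)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before) // deadline=true cancelled=false trace=trace-123
	fmt.Println(after)  // deadline=true cancelled=true trace=trace-123
}
```

Using a private struct type as the key is the usual way to keep values from different packages apart.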

When a context is cancelled, any blocking operation that’s watching that context should stop and return an error. Cancellation propagates down the context tree: cancelling a parent cancels every context derived from it. It never propagates up, so cancelling a child leaves its parent untouched.
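The direction of propagation is easy to check with a parent and two children. This is a small sketch; isDone and demoPropagation are illustrative helpers, not part of any library.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports whether ctx has been cancelled, without blocking.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// demoPropagation cancels one child first, then the parent, and records
// which contexts are done after each step.
func demoPropagation() (childADone, parentDone, childBDone bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()

	childA, cancelA := context.WithCancel(parent)
	childB, cancelB := context.WithCancel(parent)
	defer cancelB()

	cancelA()                   // cancel a single child
	childADone = isDone(childA) // the child itself is done
	parentDone = isDone(parent) // the parent is unaffected

	cancelParent()              // cancel the parent
	childBDone = isDone(childB) // the remaining child is cancelled with it
	return childADone, parentDone, childBDone
}

func main() {
	a, p, b := demoPropagation()
	fmt.Println(a, p, b) // true false true
}
```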

The critical point: the context has to be passed all the way down to the code that actually blocks. Holding it at the HTTP handler layer and never passing it to your database call means cancellation never reaches the work.

What happens without context

Here is a worker pool that ignores context:

func runWorkers(jobs <-chan Job) {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range jobs {
                processJob(job) // no context
            }
        }()
    }
    wg.Wait()
}

func processJob(job Job) error {
    result, err := fetchFromDatabase(job.ID) // no context
    if err != nil {
        return err
    }
    return sendToDownstream(result) // no context
}

Now consider what happens when:

The calling service shuts down
A client disconnects
A deadline passes
An operator sends SIGTERM

The workers keep running. fetchFromDatabase keeps waiting. sendToDownstream keeps trying. The process can’t exit cleanly because goroutines are blocked on operations that have no way of knowing they should stop.

In the best case, shutdown is delayed until every in-flight call returns or times out on its own, which can easily take tens of seconds. In the worst case, the process never terminates and has to be killed, potentially leaving in-flight work in an inconsistent state.

The correct structure: context flows in, errors flow out

The pattern is simple. Every function that does I/O or blocks takes a context as its first parameter:

func runWorkers(ctx context.Context, jobs <-chan Job) error {
    var wg sync.WaitGroup
    errs := make(chan error, 10)

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case <-ctx.Done():
                    return // context cancelled, stop cleanly
                case job, ok := <-jobs:
                    if !ok {
                        return // channel closed, no more jobs
                    }
                    if err := processJob(ctx, job); err != nil {
                        select {
                        case errs <- err:
                        default:
                        }
                    }
                }
            }
        }()
    }

    wg.Wait()
    close(errs)

    // Return the first recorded error, if any. Receiving from a closed,
    // empty channel yields nil.
    return <-errs
}

func processJob(ctx context.Context, job Job) error {
    result, err := fetchFromDatabase(ctx, job.ID)
    if err != nil {
        return err
    }
    return sendToDownstream(ctx, result)
}

Now when the context is cancelled (by timeout, operator signal, or parent cancellation), each idle worker’s select unblocks on ctx.Done() and returns cleanly. A worker that is in the middle of a job is stopped by the I/O itself: fetchFromDatabase and sendToDownstream pass the context to their underlying calls (database drivers, HTTP clients), and those calls return early with the context’s error, provided the library in use accepts a context.
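What "propagate the context to the underlying I/O call" looks like depends on the library. For an HTTP downstream, a sketch using only the standard library could look like this. The callDownstream function and the deliberately slow test server are assumptions for illustration, not the article's real sendToDownstream.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// callDownstream issues a GET that is bound to ctx. If ctx is cancelled or
// its deadline passes, the request is aborted and an error is returned.
func callDownstream(ctx context.Context, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// demoDownstreamTimeout calls a slow local server with a 50ms deadline and
// reports whether the call failed with context.DeadlineExceeded.
func demoDownstreamTimeout() bool {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(2 * time.Second): // pretend to be a slow dependency
		case <-r.Context().Done(): // stop early if the client gives up
		}
	}))
	defer slow.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	err := callDownstream(ctx, slow.URL)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out via context:", demoDownstreamTimeout())
}
```

The database side follows the same idea: database/sql offers context-aware variants such as QueryContext and ExecContext.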

Wiring context to OS signals for clean shutdown

The context that flows into your worker pool should ultimately be rooted at main, wired to OS signals:

func main() {
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    jobs := make(chan Job, 100)

    // Producer
    go func() {
        defer close(jobs)
        if err := loadJobs(ctx, jobs); err != nil {
            log.Printf("producer stopped: %v", err)
        }
    }()

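    // Workers. Cancellation caused by a shutdown signal is the expected
    // exit path, so only other errors are treated as failures.
    if err := runWorkers(ctx, jobs); err != nil && !errors.Is(err, context.Canceled) {
        log.Printf("workers stopped with error: %v", err)
        stop() // os.Exit skips deferred calls
        os.Exit(1)
    }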

    log.Println("shutdown complete")
}

signal.NotifyContext (added in Go 1.16) returns a context that’s cancelled when SIGINT or SIGTERM arrives. That cancellation propagates through loadJobs and runWorkers automatically. When the signal arrives, the producer stops generating new jobs, in-flight I/O observes cancellation through the propagated context, workers exit, and the process shuts down cleanly.
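One way to see this without a full service is to have a process send SIGTERM to itself and watch the context. This is a sketch that assumes a Unix-like system, where os.Process.Signal can deliver SIGTERM. The demoSignalCancel helper is illustrative.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// demoSignalCancel sends SIGTERM to the current process and reports whether
// the context returned by signal.NotifyContext was cancelled as a result.
func demoSignalCancel() bool {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Simulate an operator's SIGTERM (assumption: Unix-like OS).
	proc, err := os.FindProcess(os.Getpid())
	if err != nil {
		return false
	}
	if err := proc.Signal(syscall.SIGTERM); err != nil {
		return false
	}

	select {
	case <-ctx.Done():
		return ctx.Err() == context.Canceled
	case <-time.After(2 * time.Second):
		return false // the signal never reached the context
	}
}

func main() {
	fmt.Println("cancelled by SIGTERM:", demoSignalCancel())
}
```

Because the signal is registered through NotifyContext, it cancels the context instead of killing the process, and that is what gives the rest of the program time to wind down.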

Timeouts: per-job vs. pool-wide

There are two distinct timeout requirements in a worker pool, and they need separate contexts:

Pool-wide timeout: how long the entire batch is allowed to run. Use context.WithTimeout or context.WithDeadline on the root context.

Per-job timeout: how long a single job is allowed to take. Use a derived context inside the worker loop.

func runWorkers(ctx context.Context, jobs <-chan Job) error {
    // ctx already carries the pool-wide deadline from the caller

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case <-ctx.Done():
                    return
                case job, ok := <-jobs:
                    if !ok {
                        return
                    }

                    // Per-job timeout: 30 seconds max per job
                    jobCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
                    err := processJob(jobCtx, job)
                    cancel() // always call cancel to release resources

                    if err != nil {
                        log.Printf("job %s failed: %v", job.ID, err)
                    }
                }
            }
        }()
    }

    wg.Wait()
    return ctx.Err()
}

Two things to note. First, jobCtx is derived from ctx — if the pool-wide context is cancelled, the per-job context is also cancelled automatically. The hierarchy works in your favor. Second, cancel() is called immediately after processJob returns, not deferred. Deferring a context cancel inside a loop means you accumulate uncancelled contexts until the goroutine exits. Call cancel immediately after the work is done.
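If you would rather keep defer, one option is to move the per-job work into its own function, so the deferred cancel runs once per job and not once per goroutine. A sketch of that variant follows; runJob, sleepJob, and demoPerJobTimeout are hypothetical names, and sleepJob merely stands in for real work.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runJob gives work its own timeout. defer is safe here because the
// function returns after every single job.
func runJob(ctx context.Context, timeout time.Duration, work func(context.Context) error) error {
	jobCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return work(jobCtx)
}

// sleepJob is a stand-in job that takes d to finish unless cancelled first.
func sleepJob(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoPerJobTimeout runs a fast job, a job that exceeds its timeout, and a
// job started after the pool-wide context was cancelled.
func demoPerJobTimeout() (fast, slow, afterPoolCancel string) {
	poolCtx, cancelPool := context.WithCancel(context.Background())
	defer cancelPool()

	fast = fmt.Sprint(runJob(poolCtx, time.Second, sleepJob(time.Millisecond)))
	slow = fmt.Sprint(runJob(poolCtx, 20*time.Millisecond, sleepJob(5*time.Second)))

	cancelPool() // the pool-wide context wins over any per-job timeout
	afterPoolCancel = fmt.Sprint(runJob(poolCtx, time.Second, sleepJob(5*time.Second)))
	return fast, slow, afterPoolCancel
}

func main() {
	fast, slow, after := demoPerJobTimeout()
	fmt.Println(fast)  // <nil>
	fmt.Println(slow)  // context deadline exceeded
	fmt.Println(after) // context canceled
}
```

A side benefit of this shape is that cancel still runs if the job panics.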

Checking context in CPU-bound work

The ctx.Done() pattern works for I/O that blocks, because the underlying library propagates the context. But what about CPU-bound work — a job that does heavy computation without any I/O?

func processJob(ctx context.Context, job Job) error {
    chunks := split(job.Data)

    for i, chunk := range chunks {
        // Check context periodically in long-running CPU work
        if i%100 == 0 {
            select {
            case <-ctx.Done():
                return ctx.Err()
            default:
            }
        }

        if err := processChunk(chunk); err != nil {
            return err
        }
    }

    return nil
}

The select with a default branch is non-blocking — it checks whether the context is done and moves on immediately if not. Checking every 100 iterations (or every N milliseconds using a ticker) is a reasonable balance between responsiveness and overhead.
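The same non-blocking check can be written with ctx.Err(), which returns nil until the context is done. Here is a compact sketch of that form. The sumSquares workload, the interval of 1024, and the demo helper are arbitrary choices for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// sumSquares is a stand-in for CPU-bound work. It polls the context every
// 1024 iterations and stops early once the context is done.
func sumSquares(ctx context.Context, data []int) (int, error) {
	total := 0
	for i, v := range data {
		if i%1024 == 0 {
			if err := ctx.Err(); err != nil {
				return total, err
			}
		}
		total += v * v
	}
	return total, nil
}

// demoCPUBound runs the work once with a live context and once with a
// context that is already cancelled.
func demoCPUBound() (liveTotal int, liveOK bool, cancelledTotal int, cancelledStopped bool) {
	data := []int{1, 2, 3}

	total, err := sumSquares(context.Background(), data)
	liveTotal, liveOK = total, err == nil

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	total, err = sumSquares(ctx, data)
	cancelledTotal, cancelledStopped = total, err != nil
	return
}

func main() {
	fmt.Println(demoCPUBound()) // 14 true 0 true
}
```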

The three rules

1. Context is always the first parameter.

func doWork(ctx context.Context, ...) error

Not a struct field, not a global, not injected via closure. First parameter, every time. The convention is strong enough to be spelled out in the context package documentation itself.

2. Never store context in a struct.

// ❌
type Worker struct {
    ctx context.Context
}

// ✅
func (w *Worker) Process(ctx context.Context, job Job) error

Contexts are request-scoped. A struct outlives any single request. Storing context in a struct means you’re using the wrong context for future requests, or holding a cancelled context longer than you should.
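To make the difference concrete, here is a sketch of a long-lived Worker serving two requests with different contexts. The Job and Worker definitions and the name field are simplified stand-ins for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// Job is a simplified stand-in for the article's job type.
type Job struct{ ID string }

// Worker holds long-lived dependencies only. name stands in for things
// like a database handle or an HTTP client.
type Worker struct{ name string }

// Process receives the context of the current request as a parameter.
func (w *Worker) Process(ctx context.Context, job Job) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("%s: job %s: %w", w.name, job.ID, err)
	}
	return nil // real work would go here
}

// demoWorkerReuse uses one Worker for a cancelled request and a live one.
func demoWorkerReuse() (first, second string) {
	w := &Worker{name: "worker-1"}

	ctx1, cancel1 := context.WithCancel(context.Background())
	cancel1() // the first request has already been abandoned
	first = fmt.Sprint(w.Process(ctx1, Job{ID: "a"}))

	// The second request is unaffected because nothing was stored on w.
	second = fmt.Sprint(w.Process(context.Background(), Job{ID: "b"}))
	return first, second
}

func main() {
	first, second := demoWorkerReuse()
	fmt.Println(first)  // worker-1: job a: context canceled
	fmt.Println(second) // <nil>
}
```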

3. Always call cancel.

ctx, cancel := context.WithTimeout(parent, 5*time.Second)
defer cancel() // even if the timeout fires first

If the timeout fires before you call cancel, calling cancel is a no-op. But if the work finishes before the timeout, calling cancel releases the timer resources immediately rather than waiting for the timeout to expire. It costs nothing and prevents a resource leak.
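Both orderings can be observed through ctx.Err(). This is a small sketch with a made-up demoCancel helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// demoCancel shows what ctx.Err() reports when the timeout fires first and
// when cancel is called first.
func demoCancel() (afterTimeout, afterEarlyCancel string) {
	// Case 1: the timeout fires first, so the later cancel changes nothing.
	ctx1, cancel1 := context.WithTimeout(context.Background(), 10*time.Millisecond)
	<-ctx1.Done()
	cancel1()
	afterTimeout = ctx1.Err().Error()

	// Case 2: the work finishes early, so cancel releases the timer now
	// and does not leave it pending for an hour.
	ctx2, cancel2 := context.WithTimeout(context.Background(), time.Hour)
	cancel2()
	afterEarlyCancel = ctx2.Err().Error()

	return afterTimeout, afterEarlyCancel
}

func main() {
	a, b := demoCancel()
	fmt.Println(a) // context deadline exceeded
	fmt.Println(b) // context canceled
}
```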

Summary

Clean shutdown on SIGTERM — signal.NotifyContext at main, propagate down.

Stop workers on cancellation — select { case <-ctx.Done(): return } in the worker loop.

Per-job timeout — context.WithTimeout(ctx, duration) inside the loop, cancel immediately after.

CPU-bound work — periodic select { case <-ctx.Done(): return ctx.Err(); default: }.

Function signatures — ctx context.Context is always the first parameter.

Context propagation is not boilerplate. It is the mechanism that lets your service behave correctly under pressure — when deadlines pass, when clients disconnect, when operators need to restart the service without waiting for goroutines to unblock naturally. The cost of adding it from the start is a few extra parameters. The cost of retrofitting it later is much higher.

Previous in this series: Three Go Concurrency Mistakes I See in Almost Every Worker Pool
Next: Profiling a Go service in production with pprof.
