Operations

Background Jobs & Scheduled Work

Intermediate

A lot of work should not happen inside a web request: sending email, generating reports, calling slow third parties, periodic re-screening. That work runs in background jobs and scheduled tasks, which bring their own rules. They can run twice, overlap, fail silently, and run on several instances at once. Make them idempotent, observable, and safe to retry.

Moving slow or bursty work off the request path keeps the app responsive (see Performance & Resource Use). But background work is easy to get subtly wrong. A scheduled job may fire on every instance at once. A retried job may do its work twice. A job that fails quietly leaves things half-done with nobody watching.

The same safety ideas as messaging apply: idempotency, bounded retries, and visibility. And because background jobs often touch regulated work (re-screening, report generation), failing safely and leaving an audit trail matters just as much here.
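As a rough illustration of idempotency combined with bounded retries, here is a minimal single-threaded sketch. The names CompletedWork, JobRunner, and RunOnce are hypothetical and not part of any framework. The in-memory set stands in for what would normally be a database table with a unique constraint on the idempotency key, and the sketch assumes the work itself is atomic (a failed attempt leaves no partial effect).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of work already handled. In production this would be
// durable and shared (for example a table with a unique key), not in memory.
public sealed class CompletedWork
{
    private readonly HashSet<string> _keys = new HashSet<string>();

    // Returns true only the first time a key is claimed.
    public bool TryClaim(string key) => _keys.Add(key);

    public void Release(string key) => _keys.Remove(key);
}

public static class JobRunner
{
    // Runs `work` at most once per key, retrying failures up to maxAttempts.
    // Returns true if the work ran or was already handled, false if every attempt failed.
    public static bool RunOnce(CompletedWork done, string key, Action work, int maxAttempts = 3)
    {
        if (!done.TryClaim(key)) return true; // duplicate delivery: skip the effect

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                work();
                return true;
            }
            catch (Exception ex)
            {
                // Visibility: every failed attempt is logged, never swallowed.
                Console.Error.WriteLine($"job {key} attempt {attempt}/{maxAttempts} failed: {ex.Message}");
            }
        }

        done.Release(key); // let a later scheduled run try again
        return false;      // caller should alert or park the job for review
    }
}

public static class IdempotencyDemo
{
    public static void Main()
    {
        var done = new CompletedWork();
        var sent = 0;
        JobRunner.RunOnce(done, "report-2024-06", () => sent++);
        JobRunner.RunOnce(done, "report-2024-06", () => sent++); // retried delivery
        Console.WriteLine(sent); // prints 1
    }
}
```

The retry limit is what keeps a permanently failing job from looping forever. The false return value is the point where an alert belongs.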

Make jobs safe to run

Make jobs observable and correct

Overlapping, silent

    // cron fires hourly on all 3 instances; no lock; no alerting
    foreach (var c in DueForRescreen()) Rescreen(c);

Three instances run the same re-screening at once (duplicate work and possible double effects). If it crashes, nobody is told, so customers silently go un-rescreened. That is an AML gap.

Single-runner, observed

    using var lease = await locks.AcquireAsync("rescreen", ttl); // one runner
    if (lease is null) return; // another instance already holds the lease
    foreach (var batch in DueForRescreen().Chunk(500))
    {
        Rescreen(batch);
        checkpoint(); // record progress after each batch
    }
    metrics.RecordRun("rescreen", count); // + alert if it didn't run

Only one instance runs it, work is batched and checkpointed, and a missed or failed run is visible.
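The two mechanisms in that example, a lease with a time-to-live and an alert when a run is missed, can be sketched as follows. This is an illustrative in-memory stand-in, not the API of the `locks` or `metrics` objects above: LeaseStore, TryAcquire, RunMonitor, and IsOverdue are hypothetical names, and a real lease would live in a store shared by all instances (a database row or similar). The current time is passed in explicitly so the behaviour is easy to check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a shared lock store.
public sealed class LeaseStore
{
    private readonly Dictionary<string, DateTimeOffset> _expiry =
        new Dictionary<string, DateTimeOffset>();

    // Grants the lease only if nobody holds an unexpired one. Because the
    // lease expires, a crashed holder cannot block the job forever.
    public bool TryAcquire(string name, TimeSpan ttl, DateTimeOffset now)
    {
        if (_expiry.TryGetValue(name, out var until) && until > now) return false;
        _expiry[name] = now + ttl;
        return true;
    }

    public void Release(string name) => _expiry.Remove(name);
}

public static class RunMonitor
{
    // A job is overdue if it has never succeeded, or if its last success is
    // older than the expected interval plus a grace period.
    public static bool IsOverdue(DateTimeOffset? lastSuccess, DateTimeOffset now,
                                 TimeSpan interval, TimeSpan grace)
        => lastSuccess is null || now - lastSuccess.Value > interval + grace;
}

public static class LeaseDemo
{
    public static void Main()
    {
        var store = new LeaseStore();
        var start = new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);
        var ttl = TimeSpan.FromMinutes(10);

        Console.WriteLine(store.TryAcquire("rescreen", ttl, start));                // True
        Console.WriteLine(store.TryAcquire("rescreen", ttl, start.AddMinutes(1)));  // False: still held
        Console.WriteLine(store.TryAcquire("rescreen", ttl, start.AddMinutes(11))); // True: lease expired

        // Hourly job, last succeeded three hours ago: overdue, so alert.
        Console.WriteLine(RunMonitor.IsOverdue(start, start.AddHours(3),
            TimeSpan.FromHours(1), TimeSpan.FromMinutes(15)));                      // True
    }
}
```

The overdue check should run somewhere other than the job itself, since a job that never starts cannot report its own absence.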

Self-review checklist

Why it matters: Background and scheduled work runs out of sight, so its failures are the ones found late: a re-screening job that silently stopped, a report that double-sent, a nightly task that ran on three instances. Idempotent, single-runner, observable jobs keep that unattended work as correct and trustworthy as the code users interact with directly.