Operations

Observability & Logging Hygiene

Foundational

You cannot operate a system you cannot see. Observability is the difference between knowing the system is healthy and only hoping it is. It also turns a five-hour diagnosis into a five-minute one. But logs and telemetry are a data store too. In a regulated business, they must never leak sensitive data.

Good observability means that when something goes wrong, you can answer what, where, and why from the telemetry alone. You should not need to attach a debugger or ask a customer to reproduce the problem. This comes from three pillars working together: structured logs (events with context), metrics (numbers that show health and trends), and traces (the path of a request across services).

There are two sides to balance. Log enough to diagnose problems and to meet audit requirements, but never log secrets, credentials, full PII, or special-category data. The Finperiti context makes this clear: a log line with a passport number or a token is both a GDPR breach and a security hole. By default, make telemetry rich in context and free of sensitive data.

Make the system observable

Do: Emit structured logs (key/value, not prose) with consistent fields: timestamp, level, correlation/trace id, tenant, and the event's context.
Do: Pass a correlation/trace id across services and async work, so you can follow one request from start to finish.
Do: Track the metrics that matter: latency, error rate, throughput, and saturation. Alert on symptoms users feel, not just raw resource numbers.
Do: Make every failure observable. Log it with enough context to act on, and record security- and compliance-relevant events deliberately.
Consider: Distributed tracing and clear log levels, so the signal is not lost in noise and the detail you need is still there.
Always: Include a correlation id and the tenant on log events for tenant-scoped operations, so issues can be traced and isolated.
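To make "alert on symptoms users feel" concrete, here is a minimal C# sketch that checks one window of request samples against two user-facing thresholds: error rate and 95th-percentile latency. The thresholds, the sample data, and the names EvaluateAlerts and Percentile are hypothetical; in practice the samples come from your metrics backend and the thresholds from your own service-level objectives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical thresholds: replace with your own service-level objectives.
const double MaxErrorRate = 0.01;     // at most 1% of requests may fail
const double MaxP95LatencyMs = 500.0; // 95% of requests should finish within 500 ms

// Illustrative samples for one evaluation window: latency and whether the request succeeded.
// A real system would read these from its metrics store rather than hard-code them.
var window = new List<(double LatencyMs, bool Succeeded)>
{
    (120, true), (180, true), (95, true), (640, true), (210, false),
    (130, true), (150, true), (170, true), (160, true), (140, true),
};

// Nearest-rank percentile (p in 0..100) over the sampled values.
static double Percentile(IEnumerable<double> values, double p)
{
    var sorted = values.OrderBy(v => v).ToArray();
    if (sorted.Length == 0) return 0;
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

// Checks the symptoms users feel (failures and slowness), not CPU or memory.
static List<string> EvaluateAlerts(
    IReadOnlyCollection<(double LatencyMs, bool Succeeded)> requests,
    double maxErrorRate,
    double maxP95LatencyMs)
{
    var alerts = new List<string>();
    if (requests.Count == 0) return alerts;

    double errorRate = requests.Count(r => !r.Succeeded) / (double)requests.Count;
    double p95 = Percentile(requests.Select(r => r.LatencyMs), 95);

    if (errorRate > maxErrorRate)
        alerts.Add($"error rate {errorRate:P1} is above {maxErrorRate:P1}");
    if (p95 > maxP95LatencyMs)
        alerts.Add($"p95 latency {p95} ms is above {maxP95LatencyMs} ms");
    return alerts;
}

foreach (var alert in EvaluateAlerts(window, MaxErrorRate, MaxP95LatencyMs))
    Console.WriteLine($"ALERT: {alert}");
```

A percentile is used instead of an average because an average can look healthy while a meaningful share of users waits far too long.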

Keep telemetry clean and safe

Do: Log references, not payloads: a customer id, not their name and document; an order id, not the card details.
Do: Redact or mask sensitive fields by default. Treat logs, metrics, and traces as a data store with its own access control and retention.
Consider: A shared logging helper that enforces redaction and structure, so the safe way is also the easy way.
Do not: Dump whole request/response bodies, exception objects, or entities into logs without checking what sensitive data comes with them.
Never: Log secrets, credentials, tokens, full PII, or special-category data.
Never: Return internal details (stack traces, SQL, secrets, file paths) to an external caller. Diagnostics go to logs, not to the response.
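The shared helper suggested above could start as small as this minimal C# sketch. It assumes a hypothetical allow-list of field names: anything not on the list is redacted, so a newly added field is safe by default. It also shows the last rule in practice: the caller receives a generic message and an opaque reference, while the diagnostic detail goes only to the log. Redact, HandleFailure, and the field names are illustrative rather than an existing library API, and Console.WriteLine stands in for a real log sink.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical allow-list: only fields known to be safe are written as-is.
// Everything else is redacted by default, so a newly added field cannot leak by accident.
var safeFields = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "CustomerId", "OrderId", "TenantId", "CorrelationId", "Status",
};

// Returns a copy of the event fields with every non-allow-listed value masked.
Dictionary<string, string> Redact(IReadOnlyDictionary<string, string> fields) =>
    fields.ToDictionary(
        kv => kv.Key,
        kv => safeFields.Contains(kv.Key) ? kv.Value : "[REDACTED]");

// Splits a failure into what the external caller may see (a generic message plus an
// opaque reference) and what only the log sees (exception type, stack trace, safe context).
(string PublicMessage, string ErrorRef) HandleFailure(Exception ex, IReadOnlyDictionary<string, string> context)
{
    string errorRef = Guid.NewGuid().ToString("N");
    var safeContext = Redact(context);
    // ex.Message is deliberately not logged: messages can echo user input or data values.
    Console.WriteLine(
        $"level=Error errorRef={errorRef} exception={ex.GetType().Name} " +
        string.Join(" ", safeContext.Select(kv => $"{kv.Key}={kv.Value}")) +
        Environment.NewLine + ex.StackTrace);
    return ("Something went wrong. Please quote this reference to support.", errorRef);
}

// Usage: the passport number is masked in the log, and nothing internal reaches the caller.
var requestFields = new Dictionary<string, string>
{
    ["CustomerId"] = "c-123",
    ["TenantId"] = "t-9",
    ["PassportNumber"] = "X1234567",
};
var (message, reference) = HandleFailure(new InvalidOperationException("db timeout"), requestFields);
Console.WriteLine($"{message} (ref {reference})");
```

An allow-list is chosen over a deny-list here because a deny-list only catches the sensitive fields someone remembered to list.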

Logging the whole payload

log.Info($"Onboarding request: {JsonSerializer.Serialize(request)}");

Serialising the whole request dumps the name, date of birth, document numbers, and maybe a token straight into the logs. That is a GDPR breach and a credential leak in one line.

Structured, referenced, redacted

log.Info("Onboarding started {CustomerId} {TenantId} {CorrelationId}",
  customerId, tenantId, correlationId);

The event is fully traceable and easy to query, scoped to its tenant, and contains no sensitive data. You can diagnose it without putting data at risk.
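The call above assumes a correlationId is already at hand. One way to get it there is sketched below in minimal C#: the id lives in an AsyncLocal<T>, so it flows through awaited and spawned async work, and a helper stamps the same core fields (timestamp, level, correlation id, tenant) on every event. The LogEvent helper, the field names, and the use of System.Text.Json with console output are assumptions for illustration; a production logging library would normally provide the scope and the formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Ambient correlation id. An AsyncLocal<T> value set by the caller flows into awaited
// continuations and into tasks started from that context, so async work keeps the id
// of the request that started it.
var correlation = new AsyncLocal<string?>();

// Hypothetical helper: every event gets the same core fields, then its own context.
// TryAdd means event context can never overwrite a core field such as "level".
string LogEvent(string level, string message, string tenantId, Dictionary<string, object?>? context = null)
{
    var evt = new Dictionary<string, object?>
    {
        ["timestamp"] = DateTimeOffset.UtcNow.ToString("O"),
        ["level"] = level,
        ["correlationId"] = correlation.Value,
        ["tenantId"] = tenantId,
        ["message"] = message,
    };
    if (context != null)
        foreach (var (key, value) in context) evt.TryAdd(key, value);

    string line = JsonSerializer.Serialize(evt);
    Console.WriteLine(line); // stand-in for a real log sink
    return line;
}

async Task ProcessOnboardingAsync(string customerId, string tenantId)
{
    var ids = new Dictionary<string, object?> { ["customerId"] = customerId }; // a reference, not PII
    LogEvent("Information", "Onboarding started", tenantId, ids);
    await Task.Delay(10); // stands in for a call to another service
    LogEvent("Information", "Onboarding completed", tenantId, ids);
}

// At the edge (for example an HTTP handler), reuse an incoming id or mint a new one.
correlation.Value = Guid.NewGuid().ToString();
await ProcessOnboardingAsync("c-123", "t-9");
```

Both events carry the same correlationId even though the second one is written after an await, which is what lets you follow one request from start to finish.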

Self-review checklist

Ask: If this failed in production at 3 a.m., could I diagnose it from the telemetry alone?
Ask: Is there any secret, token, full PII, or special-category data in what I am logging?
Ask: Can I follow this request across services with a correlation id, scoped to its tenant?
Ask: Do my alerts fire on what users feel, or only on raw resource numbers?

Why it matters: Observability turns an outage from a guessing game into a diagnosis. Good telemetry is the basis of both reliability and incident response. But careless logging is one of the most common ways sensitive data leaks. The goal is telemetry that is rich in context and completely free of secrets and personal data.