Test Data & Environments
Good testing needs realistic data and trustworthy environments. But the easy shortcut, copying production data into dev or test, is one of the most common and serious causes of data breaches in our industry. Use synthetic or properly masked data. Keep environments separate. Make them match production in shape, not in the real customer records they hold.
Two things make tests meaningful: data that looks like the real thing (right shapes, edge cases, volumes) and environments that behave like production. The mistake newer engineers make is reaching for the most realistic data of all, a copy of production. That moves real customers' personal, KYC, and financial data into less-protected places. That is a GDPR breach and a security risk, with no exceptions.
The right approach: generate synthetic data, or mask and anonymise it when you genuinely need production-shaped data. Keep non-production environments isolated from production data and networks. Keep environments consistent, so a passing test means something.
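As an illustration of the synthetic route, here is a minimal sketch of a seeded generator that produces fake customers and deliberately includes awkward values. The field names, the `SYN-` identifier prefix, and the edge-case list are assumptions made for this example, not a prescribed schema.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical edge-case names: odd characters and boundary lengths.
EDGE_CASE_NAMES = [
    "O'Brien-Smith",   # apostrophe and hyphen
    "Łukasz Żółć",     # non-ASCII letters
    "X",               # shortest plausible value
    "A" * 255,         # long boundary value
    "  padded  ",      # leading and trailing whitespace
]


def make_customer(rng: random.Random, index: int) -> dict:
    """Build one fake customer; no field is derived from a real person."""
    if index < len(EDGE_CASE_NAMES):
        name = EDGE_CASE_NAMES[index]
    else:
        name = "Test " + "".join(rng.choices(string.ascii_uppercase, k=8))
    dob = date(1940, 1, 1) + timedelta(days=rng.randrange(0, 25000))
    # Mix boundary balances (zero, negative, very large) with ordinary ones.
    balance = rng.choice([0, 1, -1, 10**12, rng.randrange(0, 10**6)])
    return {
        "customer_id": f"SYN-{index:08d}",  # prefix marks the row as synthetic
        "name": name,
        "date_of_birth": dob.isoformat(),
        "balance_minor_units": balance,
    }


def generate_customers(count: int, seed: int = 42) -> list[dict]:
    """Same seed gives the same data set, so a failing test can be replayed."""
    rng = random.Random(seed)
    return [make_customer(rng, i) for i in range(count)]
```

Calling `generate_customers(1_000_000)` gives a production-like volume without any production record. The fixed seed is a design choice: it keeps test runs reproducible across environments.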
Use safe, realistic data
- Do: Use synthetic or generated test data that covers the real shapes and edge cases (odd characters, boundaries, large volumes), so tests catch real bugs.
- Do: If you must base data on production, mask or anonymise it irreversibly first (no real names, documents, or card/account numbers), and only in an approved, controlled way.
- Do: Test with production-like volumes for anything sensitive to performance or scale. Small data hides the bug (see Performance & Resource Use).
- Always: Treat any non-production environment as if it must never hold real customer data. That is the default state to protect (see Data Classification).
- Never: Copy production personal, KYC, AML, or payment data into dev, test, staging, or a personal environment.
// restore last night's production backup into the dev database
// "so the bug reproduces with real data"
Every customer's personal, KYC, and financial data now sits in a low-protection dev environment that more people can access. That is a reportable breach, whatever the intent.
// seed dev with generated customers, or an approved masked extract:
// names -> fake, DOB -> shifted, doc numbers -> tokenised
// shapes and volumes preserved; no real identities
The data behaves like production for testing, but no real person's information is exposed.
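The masking steps in the second example (names to fake, DOB shifted, document numbers tokenised) could be sketched roughly as below. This is an assumption-laden illustration: the record fields, the `TOK-` prefix, and the use of a keyed HMAC are choices made for the example. It assumes the key never leaves the approved masking job, and whether the result counts as anonymised is for your approved process to decide.

```python
import hashlib
import hmac
from datetime import date, timedelta


def _digest(key: bytes, value: str) -> bytes:
    """Keyed one-way hash; without the key the input cannot be recomputed."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_customer(record: dict, key: bytes) -> dict:
    """Return a masked copy of a (hypothetical) customer record.

    The output is built from an allow-list of fields, so any field nobody
    thought about (email, phone, ...) is dropped rather than leaked.
    """
    d = _digest(key, record["doc_number"])

    # Token: truncated keyed hash. The same input gives the same token,
    # so joins across masked tables still line up.
    token = "TOK-" + d.hex()[:16]

    # Shift the date of birth by 1..365 days in either direction,
    # with the offset derived from the digest so it is stable per record.
    offset = int.from_bytes(d[16:18], "big") % 365 + 1
    if d[18] & 1:
        offset = -offset
    dob = date.fromisoformat(record["date_of_birth"]) + timedelta(days=offset)

    return {
        "name": "Customer " + d.hex()[16:24],  # fake, not derived from the real name
        "date_of_birth": dob.isoformat(),
        "doc_number": token,
    }
```

The masking job itself should run inside the production boundary, so that only the masked output ever crosses into a non-production environment.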
Keep environments trustworthy
- Do: Keep environments (dev, test, staging, prod) isolated, with separate data, credentials, and networks, so non-prod cannot reach prod data (see Network & Resource Isolation).
- Do: Make environments consistent and reproducible (same config shape, provisioned via IaC), so behaviour differs only by the parameters you intend (see Configuration, Infrastructure as Code).
- Do: Keep secrets separate per environment. Never reuse a production secret in test (see Secrets Management).
- Avoid: "Testing in production" with real customer records, or pointing a dev app at the production database to debug something.
- Never: Connect a non-production environment to production data or networks.
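One way to back up the isolation rules above in application code is a startup guard that refuses to run when a non-production environment is configured with a production database host. The sketch below is illustrative only: the hostname, the environment names, and the function name are hypothetical. A check like this is a last line of defence; the real isolation should come from separate networks and credentials.

```python
from urllib.parse import urlparse

# Hypothetical set of hostnames that belong to production.
PRODUCTION_DB_HOSTS = {"db.prod.internal.example"}


class EnvironmentIsolationError(RuntimeError):
    """Raised when a non-production environment points at production data."""


def assert_isolated(environment: str, database_url: str) -> None:
    """Fail fast if a non-prod environment is configured with a prod database."""
    host = urlparse(database_url).hostname  # already lower-cased by urlparse
    if host is None:
        raise EnvironmentIsolationError("database URL has no host; refusing to start")
    if environment != "prod" and host in PRODUCTION_DB_HOSTS:
        raise EnvironmentIsolationError(
            f"{environment!r} environment is configured with production host {host!r}"
        )
```

Calling `assert_isolated` once at startup, before any connection is opened, turns a misconfigured dev app into an immediate failure rather than a silent connection to production.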
Self-review checklist
- Ask: Is there any real customer data in a non-production environment?
- Ask: Does my test data cover the edge cases and volumes that actually break things?
- Ask: Are environments isolated, with their own secrets, unable to touch prod data?
- Ask: If I need realistic data, is it synthetic or properly masked, never a raw prod copy?
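The first question in the checklist can be partly automated. As a rough sketch, the scanner below looks for card-like numbers in text (a fixture file or a database export, say) using the Luhn checksum. It is a narrow heuristic and an assumption of this example, not a complete detector: it says nothing about names or documents, and it will also flag synthetic numbers that happen to pass the checksum.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def find_card_like_numbers(text: str) -> list[str]:
    """Return digit strings in `text` that look like payment card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

Run against non-production fixtures in CI, any hit is a prompt for a human to check where the data came from.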