Engineering Excellence

A/B Testing & Experiments

Intermediate

An A/B test shows different versions to different users at random and measures which one performs better. It is a careful way to answer the question "does this change actually help?" Done well, it replaces opinion with evidence. Done carelessly, it gives confident but wrong conclusions. Respect the statistics, respect the users, and never experiment with compliance or safety.

An A/B test is a controlled experiment. It splits users at random into a control group (the current experience) and one or more variants, then compares a chosen metric. It is how you test a hypothesis (see Hypothesis-Driven Development) with real behaviour instead of guesses. But it is easy to get wrong. Common mistakes are samples that are too small, checking results and stopping early, testing many things until one looks significant, or measuring the wrong outcome.

The essentials are: define the metric and sample size before you start, assign users at random, run the test long enough, and read the results honestly. In our regulated setting, experiments must respect privacy and consent. They must never run on controls that protect customers, such as KYC, screening, or fairness.

Design experiments rigorously

Interpret and run honestly

Checking early on a tiny sample // check the dashboard hourly; variant B is ahead after 80 users
// declare B the winner, ship it

Stopping early on a tiny sample the moment it looks good only "confirms" a result that is really just noise. The conclusion is probably wrong, and you ship a change that does not actually help.

Defined up front, big enough, honest // metric: onboarding completion. Pre-computed sample ~5k/variant.
// run 2 weeks via flag, stable assignment, guardrails on errors/latency.
// analyse once at the end; ship only if the lift is real.

The design is fixed in advance, the sample is big enough, and guardrails catch side effects. The decision rests on a sound result: evidence, not noise.

Self-review checklist

Why it matters: A/B testing is the best way to know whether a change truly helps. But poor practice gives confident, false conclusions that send the product in the wrong direction. Careful design, honest reading of results, and a firm rule against experimenting on safety or compliance controls let us make evidence-based decisions we can trust and defend.