A/B Test Significance Calculator

Conversion uplift & p-value

Enter the number of visitors and conversions for variants A (control) and B (treatment) to estimate whether B is a statistically significant winner. This tool uses the classic two-sample z-test for proportions and reports the z-score, p-value, uplift, and a 95% confidence interval.

Features: two-sample z-test, pooled variance, uplift & confidence interval, one- or two-sided test, basic SRM warning.

Experiment inputs

  • Variant A (control): number of visitors and conversions.
  • Variant B (treatment): number of visitors and conversions.
  • Significance level: default 0.05 (5% significance).
  • Test type: choose one-sided only if it was pre-specified in your test plan.
  • Assumed traffic split: used only for a quick sample ratio mismatch warning.

How this A/B test significance calculator works

We model each visitor as an independent Bernoulli trial: each user either converts (success) or does not convert (failure). For each variant we observe \[ \hat{p}_A = \frac{c_A}{n_A}, \quad \hat{p}_B = \frac{c_B}{n_B}, \] where \(n\) is the number of visitors and \(c\) the number of conversions.

Under the null hypothesis of no difference (\(H_0: p_A = p_B\)), we estimate a common conversion rate using the pooled estimator \[ \hat{p} = \frac{c_A + c_B}{n_A + n_B}, \] and use it to compute the standard error of the difference: \[ \text{SE}(\hat{p}_B - \hat{p}_A) = \sqrt{\hat{p} (1 - \hat{p}) \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}. \]

The test statistic is the z-score \[ z = \frac{\hat{p}_B - \hat{p}_A}{\text{SE}(\hat{p}_B - \hat{p}_A)}. \] Assuming the normal approximation is adequate, we compute p-values from the standard normal distribution for either a two-sided or one-sided alternative.
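This calculation is easy to reproduce. Below is a minimal Python sketch of the pooled two-sample z-test; the function name pooled_z_test and its argument names are ours, not part of the calculator.

    import math

    def pooled_z_test(c_a, n_a, c_b, n_b, two_sided=True):
        """Two-sample z-test for proportions with a pooled variance estimate."""
        p_a = c_a / n_a                      # observed conversion rate, variant A
        p_b = c_b / n_b                      # observed conversion rate, variant B
        p_pool = (c_a + c_b) / (n_a + n_b)   # pooled rate under H0: p_A = p_B
        # Assumes 0 < p_pool < 1, i.e. at least one conversion and one non-conversion.
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        # Standard normal survival function via the complementary error function.
        sf = lambda x: 0.5 * math.erfc(x / math.sqrt(2))
        p_value = 2 * sf(abs(z)) if two_sided else sf(z)  # one-sided H1: p_B > p_A
        return z, p_value

For example, 10,000 visitors per variant with 500 conversions for A and 560 for B gives z ≈ 1.89 and a two-sided p-value of about 0.058, just short of significance at α = 0.05.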

When is variant B “statistically significant”?

For a chosen significance level \(\alpha\) (commonly 0.05), we:

  • Compute the p-value based on the z-score and test type.
  • Declare the result statistically significant if p-value \(\le \alpha\).
  • Highlight whether B looks like a winner, loser, or whether the result is inconclusive.

Remember: significance is about evidence against the “no difference” model, not a guarantee of future performance.
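As a minimal sketch of that decision step, building on the z and p-value computed above (the three labels are ours):

    def verdict(z, p_value, alpha=0.05):
        """Map the test outcome onto a three-way verdict for variant B."""
        if p_value > alpha:
            return "inconclusive"
        # The result is significant; the sign of z says which variant is ahead.
        return "winner" if z > 0 else "loser"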

Uplift and 95% confidence interval

We report both the absolute uplift \[ \Delta = \hat{p}_B - \hat{p}_A, \] and the relative uplift \[ \text{uplift}_\% = 100 \times \frac{\hat{p}_B - \hat{p}_A}{\hat{p}_A}, \] whenever \(\hat{p}_A > 0\).

A simple 95% confidence interval for the absolute uplift is \[ \Delta \pm 1.96 \times \text{SE}(\Delta), \] using the same standard error formula as in the z-test. This interval is approximate but widely used in practice for large samples.
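A sketch of the uplift and interval calculation, reusing the pooled standard error as described above (again, the names are ours):

    import math

    def uplift_and_ci(c_a, n_a, c_b, n_b, z_crit=1.96):
        """Absolute and relative uplift, plus an approximate 95% CI for the difference."""
        p_a, p_b = c_a / n_a, c_b / n_b
        delta = p_b - p_a                              # absolute uplift
        rel = 100 * delta / p_a if p_a > 0 else None   # relative uplift in %
        p_pool = (c_a + c_b) / (n_a + n_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        return delta, rel, (delta - z_crit * se, delta + z_crit * se)

With the example above, this gives Δ = 0.006 (0.6 percentage points), a relative uplift of 12%, and a 95% CI of roughly (-0.0002, 0.0122), which straddles zero, consistent with the inconclusive p-value.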

Sample ratio mismatch (SRM) warning

When you choose the “roughly 50/50” option, the calculator checks whether the observed traffic split is too far from 50/50. A large discrepancy (for example, A receiving 62% of traffic instead of the intended 50%) can be a sign of:

  • implementation bugs in the assignment logic,
  • targeting conditions that differ between variants, or
  • broken tracking or logging.

A simple threshold-based warning cannot replace a full SRM test, but it can highlight experiments that deserve a closer look before trusting the results.
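As an illustration, a threshold check along these lines could look like the following; the 10-percentage-point tolerance is our assumption, not necessarily what the calculator uses. A full SRM test would instead compare the observed counts to the intended split with a chi-square goodness-of-fit or exact binomial test.

    def srm_warning(n_a, n_b, expected_share_a=0.5, tolerance=0.10):
        """Flag a possible sample ratio mismatch against the intended split."""
        observed_share_a = n_a / (n_a + n_b)
        return abs(observed_share_a - expected_share_a) > tolerance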

Best practices for A/B testing

  • Define hypotheses, metrics, and decision rules before launching the test.
  • Keep random assignment clean; avoid overlapping experiments on the same users if possible.
  • Use fixed sample sizes or proper sequential methods instead of ad-hoc peeking.
  • Look at both statistical significance and business impact (uplift × volume).
  • Report uncertainty: p-values together with confidence intervals, not one without the other.
