A/B Test Significance Calculator
Conversion uplift & p-value

Enter visitors and conversions for variants A (control) and B (treatment) to estimate whether B is a statistically significant winner. This tool uses the classic two-sample z-test for proportions and reports the z-score, p-value, uplift, and a 95% confidence interval.
How this A/B test significance calculator works
We model each visitor as an independent Bernoulli trial: each user either converts (success) or does not convert (failure). For each variant we observe \[ \hat{p}_A = \frac{c_A}{n_A}, \quad \hat{p}_B = \frac{c_B}{n_B}, \] where \(n\) is the number of visitors and \(c\) the number of conversions.
Under the null hypothesis of no difference (\(H_0: p_A = p_B\)), we estimate a common conversion rate using the pooled estimator \[ \hat{p} = \frac{c_A + c_B}{n_A + n_B}, \] and use it to compute the standard error of the difference: \[ \text{SE}(\hat{p}_B - \hat{p}_A) = \sqrt{\hat{p} (1 - \hat{p}) \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}. \]
The test statistic is the z-score \[ z = \frac{\hat{p}_B - \hat{p}_A}{\text{SE}(\hat{p}_B - \hat{p}_A)}. \] Assuming the normal approximation is adequate, we compute p-values from the standard normal distribution for either a two-sided or one-sided alternative.
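The pooled z-test described above can be sketched in a few lines of Python. The function name and signature here are illustrative; the standard-normal tail probability is computed with `math.erfc` instead of a stats library, using the identity \(P(Z \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})\).

```python
from math import sqrt, erfc

def ab_z_test(n_a, c_a, n_b, c_b, two_sided=True):
    """Two-sample z-test for proportions (normal approximation).

    n_a, n_b: visitors per variant; c_a, c_b: conversions per variant.
    Returns (z, p_value).
    """
    p_a, p_b = c_a / n_a, c_b / n_b
    # Pooled conversion rate under H0: p_A = p_B
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    if two_sided:
        p_value = erfc(abs(z) / sqrt(2))    # P(|Z| >= |z|)
    else:
        p_value = 0.5 * erfc(z / sqrt(2))   # P(Z >= z), H1: p_B > p_A
    return z, p_value
```

For example, 100/1000 conversions in A versus 130/1000 in B gives a z-score of about 2.1 and a two-sided p-value just under 0.04.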
When is variant B “statistically significant”?
For a chosen significance level \(\alpha\) (commonly 0.05), we:
- Compute the p-value based on the z-score and test type.
- Declare the result statistically significant if p-value \(\le \alpha\).
- Highlight whether B looks like a winner or a loser, or whether the result is inconclusive.
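The decision rule above can be sketched as a small helper; the labels and the default \(\alpha\) are illustrative choices, not part of the test itself:

```python
def classify_result(z, p_value, alpha=0.05):
    """Label a two-sided test outcome.

    z > 0 means variant B converted better than A in the sample;
    the result is only called a win/loss when p_value <= alpha.
    """
    if p_value > alpha:
        return "inconclusive"
    return "B wins" if z > 0 else "B loses"
```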
Remember: significance is about evidence against the “no difference” model, not a guarantee of future performance.
Uplift and 95% confidence interval
We report both the absolute uplift \[ \Delta = \hat{p}_B - \hat{p}_A, \] and the relative uplift \[ \text{uplift}_\% = 100 \times \frac{\hat{p}_B - \hat{p}_A}{\hat{p}_A}, \] whenever \(\hat{p}_A > 0\).
A simple 95% confidence interval for the absolute uplift is \[ \Delta \pm 1.96 \times \text{SE}(\Delta), \] using the same standard error formula as in the z-test. This interval is approximate but widely used in practice for large samples.
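A minimal sketch of that interval computation, reusing the pooled standard error exactly as in the z-test (function and parameter names are illustrative):

```python
from math import sqrt

def uplift_ci(n_a, c_a, n_b, c_b, z_crit=1.96):
    """Approximate 95% CI for the absolute uplift p_B - p_A."""
    p_a, p_b = c_a / n_a, c_b / n_b
    # Same pooled standard error as the z-test in this calculator
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    delta = p_b - p_a
    return delta - z_crit * se, delta + z_crit * se
```

With the 100/1000 vs 130/1000 example, the interval is roughly (0.002, 0.058): the whole interval sits above zero, consistent with a significant two-sided result at \(\alpha = 0.05\).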
Sample ratio mismatch (SRM) warning
When you choose the “roughly 50 / 50” option, the calculator checks whether the observed traffic split is too far from 50 / 50. A large discrepancy (for example, A receiving 62% of traffic instead of the intended 50%) can be a sign of:
- implementation bugs in the assignment logic,
- targeting conditions that differ between variants, or
- broken tracking or logging.
A simple threshold-based warning cannot replace a full SRM test, but it can highlight experiments that deserve a closer look before trusting the results.
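A threshold check of that kind might look as follows; the 5-percentage-point tolerance is an illustrative assumption, and as noted above this is a heuristic flag, not a substitute for a proper SRM test:

```python
def srm_warning(n_a, n_b, tolerance=0.05):
    """Flag traffic splits that drift too far from the intended 50/50.

    Returns True when variant A's share of total traffic is more than
    `tolerance` (in absolute terms) away from 0.5.
    """
    share_a = n_a / (n_a + n_b)
    return abs(share_a - 0.5) > tolerance
```

For instance, a 620/380 split (A at 62%) trips the warning, while a 505/495 split does not.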
Best practices for A/B testing
- Define hypotheses, metrics, and decision rules before launching the test.
- Keep random assignment clean; avoid overlapping experiments on the same users if possible.
- Use fixed sample sizes or proper sequential methods instead of ad-hoc peeking.
- Look at both statistical significance and business impact (uplift × volume).
- Report uncertainty: p-values together with confidence intervals, not one without the other.