Understanding Your Results
ProofPod reports several statistical measures on the test detail page. Here's what each one means in plain language.
Treatment effect
The estimated impact of your experiment, usually expressed as a percentage. If the treatment effect is +5%, your test locations outperformed control locations by 5% during the test period, after accounting for baseline differences.
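To make "after accounting for baseline differences" concrete, here is a minimal sketch of a difference-in-differences calculation using four simple averages. The function name and all numbers are hypothetical, and ProofPod's actual model may differ (for example, it may use a regression rather than raw means).

```python
def did_percent_effect(test_pre, test_post, control_pre, control_post):
    """Difference-in-differences lift as a percentage.

    Hypothetical helper for illustration only; not ProofPod's actual calculation.
    """
    # Change that control locations saw over the same period (the background trend).
    control_change = control_post - control_pre
    # What test locations would have averaged had they simply followed that trend.
    expected_test_post = test_pre + control_change
    # Lift beyond the trend, relative to that expectation.
    return 100.0 * (test_post - expected_test_post) / expected_test_post


# Made-up daily sales averages: test locations started higher than control locations.
effect = did_percent_effect(test_pre=10_000, test_post=10_920,
                            control_pre=9_000, control_post=9_400)
print(f"Treatment effect: {effect:+.1f}%")  # Treatment effect: +5.0%
```

Test locations rose by 920 here, but 400 of that is the same rise control locations saw, so only the remaining 520 counts toward the effect.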
Confidence intervals
A range around the treatment effect that reflects uncertainty. ProofPod reports both 90% and 95% confidence intervals.
How to read them: A 90% CI of [+2%, +8%] means "we're 90% confident the true effect falls between +2% and +8%." If the interval doesn't include zero, the effect is statistically significant at that confidence level.
As your test runs longer, confidence intervals generally narrow, because more data makes you more certain about the true effect. The cumulative effect chart on the test detail page shows this visually.
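If you want to see how the two confidence levels relate, the sketch below builds 90% and 95% intervals from an effect estimate and its standard error, then checks whether each includes zero. It assumes a normal approximation, and the estimate and standard error are made-up values; ProofPod's exact interval method isn't described here.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level):
    """Two-sided normal-approximation interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def excludes_zero(interval):
    """True when the whole interval sits on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical effect estimate and standard error, in percentage points.
estimate, std_error = 5.0, 2.8
ci90 = confidence_interval(estimate, std_error, 0.90)
ci95 = confidence_interval(estimate, std_error, 0.95)

print(f"90% CI: [{ci90[0]:+.1f}%, {ci90[1]:+.1f}%], excludes zero: {excludes_zero(ci90)}")
print(f"95% CI: [{ci95[0]:+.1f}%, {ci95[1]:+.1f}%], excludes zero: {excludes_zero(ci95)}")
```

With these numbers the 90% interval excludes zero but the wider 95% interval does not, which is why the same result can be significant at one confidence level and not the other.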
Bayesian probabilities
ProofPod uses Bayesian statistics to convert effect estimates into decision-relevant probabilities:
P(positive effect)
The probability your experiment helped at all (effect > 0%). A P(positive) of 60% means ProofPod thinks the experiment is more likely helping than hurting, but that isn't enough certainty to act on.
P(exceeds target)
The probability your experiment's effect exceeds your minimum detectable effect (MDE) target. This is the key number for a Scale recommendation. When it crosses 90%, ProofPod recommends rolling out.
P(harm)
The probability your experiment is actively hurting performance (effect worse than -MDE). When this crosses 90%, ProofPod recommends Kill.
Traditional "statistical significance" answers: "If there were no real effect, how likely is data this extreme?" That's not the question you need answered. You need: "Is this experiment actually working?" Bayesian probabilities answer that question directly.
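The three probabilities above can be sketched as areas under a posterior distribution for the effect. This example assumes the posterior is approximately normal, and the posterior mean, posterior standard deviation, and MDE are hypothetical inputs; ProofPod's actual posterior may take a different form.

```python
from statistics import NormalDist


def decision_probabilities(posterior_mean, posterior_sd, mde):
    """Probabilities read off a normal posterior (an assumption for illustration)."""
    posterior = NormalDist(posterior_mean, posterior_sd)
    return {
        # Share of the posterior above zero.
        "p_positive": 1 - posterior.cdf(0.0),
        # Share of the posterior above the MDE target.
        "p_exceeds_target": 1 - posterior.cdf(mde),
        # Share of the posterior below -MDE.
        "p_harm": posterior.cdf(-mde),
    }


# Hypothetical posterior: effect centered at +5% with SD 2, against a 3% MDE target.
probs = decision_probabilities(posterior_mean=5.0, posterior_sd=2.0, mde=3.0)
for name, value in probs.items():
    print(f"{name}: {value:.1%}")
```

In this example the experiment is almost certainly helping (P(positive) above 99%), but P(exceeds target) is only about 84%, so it has not yet reached the 90% mark.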
Baseline mean
The average value of your primary metric in the control group. This gives you context for the treatment effect. A 5% lift on a $10,000 baseline is $500; a 5% lift on a $100,000 baseline is $5,000.
Observations
The total number of data points (location-days) used in the analysis. More observations generally mean tighter confidence intervals and more reliable results.
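As a rough sketch of why more observations tighten the interval: under the simplifying assumption that location-days are independent and equally noisy (real data usually isn't quite that tidy), the interval's half-width shrinks with the square root of the observation count. The function name and noise level below are hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(noise_sd, n_observations, level=0.90):
    """Half-width of a normal-approximation interval for a mean of n observations."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * noise_sd / math.sqrt(n_observations)


# Hypothetical noise level of 20 percentage points per location-day.
for n in (100, 400, 1600):
    print(f"{n:>5} observations -> +/- {ci_half_width(20.0, n):.2f} points")
```

Under these assumptions, quadrupling the number of observations halves the width of the interval.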
Early stopping
ProofPod doesn't require you to wait for a predetermined end date. You can check results at any point during the test:
- Strong positive signal early → Scale recommendation, stop early and capture the win
- Strong negative signal early → Kill recommendation, stop the bleeding
- Unclear signal → Continue recommendation, keep collecting data
This flexibility is a direct benefit of the Bayesian approach. You're never "peeking" at results; every check is a valid statistical assessment.
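The three outcomes above can be sketched as a simple rule on the Bayesian probabilities. The 90% threshold comes from the descriptions of P(exceeds target) and P(harm); the function name and the choice to treat exactly 90% as crossing the threshold are assumptions for illustration.

```python
def recommend(p_exceeds_target, p_harm, threshold=0.90):
    """Map the two decision probabilities to a recommendation (illustrative only)."""
    if p_exceeds_target >= threshold:
        return "Scale"      # strong positive signal: roll out
    if p_harm >= threshold:
        return "Kill"       # strong negative signal: stop the test
    return "Continue"       # unclear signal: keep collecting data


print(recommend(p_exceeds_target=0.93, p_harm=0.00))  # Scale
print(recommend(p_exceeds_target=0.01, p_harm=0.95))  # Kill
print(recommend(p_exceeds_target=0.84, p_harm=0.00))  # Continue
```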
Credible interval vs. confidence interval
You'll see both terms in your results:
- Confidence interval: from the difference-in-differences (DiD) regression (frequentist)
- Credible interval: from the Bayesian posterior
In practice, they're often similar. The credible interval has a more intuitive interpretation: "there's a 90% probability the true effect is in this range."
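To see why the two intervals often land close together, the sketch below computes both from the same estimate. It assumes a normal approximation and a wide normal prior centered on zero; the prior and all numbers are hypothetical, since the prior ProofPod uses isn't described here.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.90):
    """Frequentist normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def credible_interval(estimate, std_error, prior_mean, prior_sd, level=0.90):
    """Equal-tailed interval from a normal prior combined with a normal likelihood."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / std_error ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    posterior = NormalDist(post_mean, post_var ** 0.5)
    tail = (1 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1 - tail)


# Hypothetical estimate of +5% with standard error 2, and a weak prior (mean 0, SD 20).
ci = confidence_interval(5.0, 2.0)
cri = credible_interval(5.0, 2.0, prior_mean=0.0, prior_sd=20.0)
print(f"90% confidence interval: [{ci[0]:+.2f}%, {ci[1]:+.2f}%]")
print(f"90% credible interval:   [{cri[0]:+.2f}%, {cri[1]:+.2f}%]")
```

With a weak prior like this one, the data dominates and the two intervals nearly coincide. A stronger prior would pull the credible interval toward the prior mean and make it narrower.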