Understanding Your Results

ProofPod reports several statistical measures on the test detail page. Here's what each one means in plain language.

Treatment effect

The estimated impact of your experiment, usually expressed as a percentage. If the treatment effect is +5%, your test locations outperformed control locations by 5% during the test period, after accounting for baseline differences.
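One simple way to picture "after accounting for baseline differences" is to compare each group's change from its own pre-test level. The sketch below is only an illustration: the function name and the relative-change formula are assumptions, not ProofPod's actual model.

```python
def did_percent_effect(test_pre, test_post, control_pre, control_post):
    """Percent lift of test over control, net of each group's own baseline.

    Hypothetical helper: each argument is the average metric value for that
    group and period. This is an illustrative calculation only.
    """
    test_change = test_post / test_pre - 1.0           # relative change in test locations
    control_change = control_post / control_pre - 1.0  # relative change in control locations
    return (test_change - control_change) * 100.0      # difference, in percentage points


# Test locations grew 10%, control locations grew 5%: roughly a +5% treatment effect.
effect = did_percent_effect(10_000, 11_000, 20_000, 21_000)
print(round(effect, 2))
```

Because both groups are measured against their own pre-test averages, a test group that simply started at a higher level does not inflate the estimate.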

Confidence intervals

A range around the treatment effect that reflects uncertainty. ProofPod reports both 90% and 95% confidence intervals.

How to read them: A 90% CI of [+2%, +8%] means the data are consistent with a true effect anywhere between +2% and +8% at the 90% confidence level. If the interval doesn't include zero, the effect is statistically significant at that confidence level.
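The reading described above can be sketched in a few lines. This assumes the interval comes from a normal approximation around the estimate, and the standard error of 1.8 is a made-up value; ProofPod's own interval calculation may differ.

```python
from statistics import NormalDist


def confidence_interval(effect, std_error, level=0.90):
    """Two-sided normal-approximation interval around an effect estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for 90%, 1.96 for 95%
    return effect - z * std_error, effect + z * std_error


def excludes_zero(interval):
    """True when the whole interval sits on one side of zero."""
    low, high = interval
    return low > 0 or high < 0


# Hypothetical estimate: +5% effect with a standard error of 1.8 points.
ci90 = confidence_interval(5.0, 1.8, level=0.90)
ci95 = confidence_interval(5.0, 1.8, level=0.95)
print(ci90, excludes_zero(ci90))
print(ci95, excludes_zero(ci95))
```

The 95% interval is always wider than the 90% interval for the same data, so an effect can clear zero at 90% and still include zero at 95%.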

As your test runs longer, confidence intervals generally narrow, because more data makes the estimate of the true effect more certain. The cumulative effect chart on the test detail page shows this visually.

Bayesian probabilities

ProofPod uses Bayesian statistics to convert effect estimates into decision-relevant probabilities:

P(positive effect)

The probability your experiment helped at all (effect > 0%). Even a 60% P(positive) means ProofPod thinks it's more likely helping than hurting, but there isn't enough certainty to act.

P(exceeds target)

The probability your experiment exceeded your minimum detectable effect (MDE) target. This is the key number for a Scale recommendation. When it crosses 90%, ProofPod recommends rolling out.

P(harm)

The probability your experiment is actively hurting performance (effect worse than -MDE). When this crosses 90%, ProofPod recommends Kill.
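To make the three probabilities concrete, the sketch below computes them from a normal posterior over the effect. The normal shape, the function name, and the example numbers are all assumptions for illustration; the posterior ProofPod actually uses is not described here.

```python
from statistics import NormalDist


def decision_probabilities(post_mean, post_sd, mde):
    """P(positive), P(exceeds target), and P(harm) under a normal posterior.

    Hypothetical helper: post_mean, post_sd, and mde are all in percent.
    """
    posterior = NormalDist(mu=post_mean, sigma=post_sd)
    return {
        "p_positive": 1.0 - posterior.cdf(0.0),  # effect > 0%
        "p_exceeds": 1.0 - posterior.cdf(mde),   # effect > +MDE
        "p_harm": posterior.cdf(-mde),           # effect < -MDE
    }


# Hypothetical posterior: mean +5%, standard deviation 2 points, MDE target 3%.
probs = decision_probabilities(post_mean=5.0, post_sd=2.0, mde=3.0)
for name, value in probs.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the effect is almost certainly positive, yet P(exceeds target) sits below the 90% mark, so the target has not been cleared with enough certainty for a Scale recommendation.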

Why Bayesian?

Traditional "statistical significance" answers: "Would I see data this extreme by chance?" That's not the question you need answered. You need: "Is this experiment actually working?" Bayesian probabilities answer that question directly.

Baseline mean

The average value of your primary metric in the control group. This gives you context for the treatment effect. A 5% lift on a $10,000 baseline is $500; a 5% lift on a $100,000 baseline is $5,000.
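The arithmetic above is small enough to check directly. The helper name is hypothetical.

```python
def absolute_lift(baseline_mean, effect_percent):
    """Convert a percent treatment effect into the metric's own units."""
    return baseline_mean * effect_percent / 100.0


print(absolute_lift(10_000, 5))   # 500.0
print(absolute_lift(100_000, 5))  # 5000.0
```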

Observations

The total number of data points (location-days) used in the analysis. More observations generally mean tighter confidence intervals and more reliable results.
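A rough sketch of why more observations tighten the interval: under the simplifying assumption of independent observations, the standard error shrinks with the square root of the count. Location-days from the same location are usually correlated, so real intervals tend to narrow more slowly than this. The function name and the standard deviation of 20 are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(metric_sd, n_observations, level=0.90):
    """Half-width of a normal-approximation interval for a mean.

    Simplified illustration that assumes independent observations.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * metric_sd / sqrt(n_observations)


# Quadrupling the observations halves the interval width.
for n in (100, 400, 1600):
    print(n, round(ci_half_width(20.0, n), 2))
```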

Early stopping

ProofPod doesn't require you to wait for a predetermined end date. You can check results at any point during the test:

Strong positive signal early → Scale recommendation, stop early and capture the win
Strong negative signal early → Kill recommendation, stop the bleeding
Unclear signal → Continue recommendation, keep collecting data

This flexibility is a direct benefit of the Bayesian approach. You're never "peeking" at results—every check is a valid statistical assessment.
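The three outcomes listed above can be sketched as a small decision rule, using the 90% thresholds described for P(exceeds target) and P(harm). The function name and the exact comparison are assumptions; ProofPod may apply additional conditions.

```python
def recommend(p_exceeds_target, p_harm, threshold=0.90):
    """Map the two decision probabilities to Scale, Kill, or Continue."""
    if p_exceeds_target >= threshold:
        return "Scale"     # strong positive signal: roll out
    if p_harm >= threshold:
        return "Kill"      # strong negative signal: stop the test
    return "Continue"      # unclear signal: keep collecting data


print(recommend(p_exceeds_target=0.93, p_harm=0.00))  # Scale
print(recommend(p_exceeds_target=0.01, p_harm=0.95))  # Kill
print(recommend(p_exceeds_target=0.60, p_harm=0.02))  # Continue
```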

Credible interval vs. confidence interval

You'll see both terms in your results:

Confidence interval: from the difference-in-differences (DiD) regression (frequentist)
Credible interval: from the Bayesian posterior

In practice, they're often similar. The credible interval has a more intuitive interpretation: "there's a 90% probability the true effect is in this range."
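One way to see why the two intervals are often similar: when the prior is weak compared with the data, the posterior is dominated by the regression estimate. The sketch below uses a textbook normal-normal update with a hypothetical prior and standard error. It is not ProofPod's actual model.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update: combine a prior with an estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var ** 0.5


def interval(center, spread, level=0.90):
    """Central interval of a normal distribution with the given center and spread."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return center - z * spread, center + z * spread


# Hypothetical inputs: estimate +5% (standard error 1.8), weak prior centered on 0%.
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=50.0,
                                      estimate=5.0, std_error=1.8)
print("confidence interval:", interval(5.0, 1.8))
print("credible interval:  ", interval(post_mean, post_sd))
```

With a tighter prior, or with very little data, the credible interval would be pulled further toward the prior mean and the two intervals would diverge.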
