Recommendations

After analysis, ProofPod gives you a clear recommendation: Scale, Kill, or Continue. Here's how each is determined.

Scale

Meaning: Strong evidence your experiment works. Roll it out.

Criteria:

Probability of success (P(exceeds target)) is greater than 90%
No critical guardrail failures

The confidence score equals the probability of success. A "Scale" recommendation means ProofPod is highly confident that your experiment exceeded your minimum detectable effect target.
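As a rough illustration of how a probability of success could be computed, the sketch below assumes the posterior for the effect is approximately normal. The function name, the normal approximation, and the example numbers are assumptions made for illustration, not ProofPod's actual implementation.

```python
from statistics import NormalDist


def prob_exceeds_target(posterior_mean: float, posterior_sd: float, mde: float) -> float:
    """P(effect > mde) under an assumed normal posterior for the effect."""
    posterior = NormalDist(mu=posterior_mean, sigma=posterior_sd)
    return 1.0 - posterior.cdf(mde)


# Hypothetical example: estimated lift of 5 points, posterior sd of 1.5 points,
# and an MDE target of 2 points.
p_success = prob_exceeds_target(posterior_mean=0.05, posterior_sd=0.015, mde=0.02)
meets_scale_threshold = p_success > 0.90
```

In this example the target sits two posterior standard deviations below the estimated effect, so the probability of success is roughly 98% and clears the 90% threshold.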

Kill

Meaning: Strong evidence your experiment is hurting. Stop it.

Criteria (either triggers Kill):

Probability of harm (P(effect < -MDE)) is greater than 90%, or
Critical data quality issues that make results unreliable

When triggered by harm, the confidence score reflects how certain ProofPod is that the experiment is causing damage. When triggered by data quality, it's a protective stop—the data can't be trusted regardless of what the numbers show.

Continue

Meaning: Not enough evidence yet. Keep running.

Criteria:

Neither the Scale criteria nor the Kill criteria are met

The confidence score reflects the current directional lean—how confident ProofPod is in the direction of the effect, even though it hasn't crossed a decision threshold yet.
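The three sets of criteria above can be sketched as one decision function. This is a minimal sketch of the rules as described on this page. The `Evidence` type, its field names, and the order in which the checks run are assumptions, not ProofPod's API.

```python
from dataclasses import dataclass

THRESHOLD = 0.90  # the 90% probability threshold described on this page


@dataclass(frozen=True)
class Evidence:
    """Hypothetical container for the inputs to a recommendation."""
    p_success: float  # P(exceeds target)
    p_harm: float     # P(effect < -MDE)
    critical_guardrail_failure: bool = False
    critical_data_quality_issue: bool = False


def recommend(evidence: Evidence) -> str:
    # Data quality is checked first (an assumed ordering): the page says the
    # data can't be trusted regardless of what the numbers show.
    if evidence.critical_data_quality_issue:
        return "Kill"
    if evidence.p_harm > THRESHOLD:
        return "Kill"
    if evidence.p_success > THRESHOLD and not evidence.critical_guardrail_failure:
        return "Scale"
    # Neither set of criteria is met.
    return "Continue"


examples = {
    "clear win": recommend(Evidence(p_success=0.95, p_harm=0.00)),
    "clear harm": recommend(Evidence(p_success=0.01, p_harm=0.93)),
    "win with guardrail failure": recommend(
        Evidence(p_success=0.95, p_harm=0.00, critical_guardrail_failure=True)
    ),
    "inconclusive": recommend(Evidence(p_success=0.60, p_harm=0.05)),
}
```

Under these assumptions, a high probability of success combined with a critical guardrail failure falls through to Continue, because the Scale criteria require both conditions to hold.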

Confidence checklist

Each recommendation includes a checklist of supporting evidence:

Primary metric check — is the metric moving in the right direction with sufficient probability?
Guardrail checks — are any secondary metrics showing concerning changes?
Power check — does the test have enough statistical power to detect the target effect?

Each item is marked as passing, warning, or failing.
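One way to picture the checklist is as a list of items, each carrying one of the three statuses. The types, field names, and example details below are hypothetical. They only illustrate the shape of the data, not how ProofPod stores or evaluates it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSING = "passing"
    WARNING = "warning"
    FAILING = "failing"


@dataclass(frozen=True)
class ChecklistItem:
    name: str
    status: Status
    detail: str = ""


def needs_attention(items: list[ChecklistItem]) -> list[ChecklistItem]:
    """Return every item that is not passing, in checklist order."""
    return [item for item in items if item.status is not Status.PASSING]


# Hypothetical checklist for an experiment that is still underpowered.
checklist = [
    ChecklistItem("Primary metric check", Status.PASSING, "moving in the right direction"),
    ChecklistItem("Guardrail checks", Status.PASSING, "no concerning changes"),
    ChecklistItem("Power check", Status.WARNING, "not enough power yet"),
]
flagged = [item.name for item in needs_attention(checklist)]
```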

Early stopping

ProofPod supports early stopping in both directions:

Winning early: If your experiment is clearly working before the planned end date, you'll see a Scale recommendation. You can complete the test early and capture the benefit sooner.
Losing early: If your experiment is clearly harmful, ProofPod recommends Kill immediately—no reason to keep running a losing experiment.

This is a key advantage of ProofPod's Bayesian approach. Traditional fixed-horizon frequentist tests require you to wait for a predetermined sample size before reading the results. Bayesian analysis lets you check at any time and stop when the evidence is clear.
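To make the idea of checking at any time concrete, the sketch below re-evaluates the evidence at each look and stops at the first look that crosses a 90% threshold in either direction. It assumes a conversion-rate metric, uniform Beta(1, 1) priors, an absolute-difference MDE, and Monte Carlo sampling. All of these are illustrative choices, and the function names and data are hypothetical.

```python
import random


def prob_diff_exceeds(conv_a, n_a, conv_b, n_b, mde, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_b - rate_a > mde) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b - rate_a > mde:
            hits += 1
    return hits / draws


def first_decisive_look(looks, mde, threshold=0.90):
    """Return (look_number, recommendation) for the first decisive look.

    Each look is a cumulative tuple:
    (control_conversions, control_n, treatment_conversions, treatment_n).
    """
    for look, (conv_c, n_c, conv_t, n_t) in enumerate(looks, start=1):
        p_success = prob_diff_exceeds(conv_c, n_c, conv_t, n_t, mde)
        # Swapping the arms gives P(control - treatment > mde) = P(effect < -mde).
        p_harm = prob_diff_exceeds(conv_t, n_t, conv_c, n_c, mde)
        if p_success > threshold:
            return look, "Scale"
        if p_harm > threshold:
            return look, "Kill"
    return None, "Continue"


# Hypothetical cumulative data at three looks, with an MDE of 1 percentage point.
looks = [(50, 500, 55, 500), (100, 1000, 120, 1000), (200, 2000, 300, 2000)]
result = first_decisive_look(looks, mde=0.01)
```

With this made-up data the first two looks stay below the threshold, and the third look is decisive, so the loop returns a Scale recommendation at look 3.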

info

The 90% probability thresholds are designed to balance decisiveness with accuracy. They're high enough to keep false positives rare but not so high that you wait indefinitely for a call.
