Glossary

Key terms used throughout ProofPod, in plain language.

Baseline Mean — The average value of a metric in the control group during the test period. Provides context for interpreting the treatment effect (e.g., a 5% lift on a $50K baseline = $2,500).

Bayesian Inference — A statistical approach that calculates the probability that an experiment helped or hurt, and shows when the evidence is still too weak to call either way. Unlike traditional methods, it directly answers "what's the chance this worked?" See Understanding Your Results.

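Confidence Interval (CI) — A range around an estimate that reflects uncertainty. Strictly, a 90% CI comes from a procedure that captures the true value in 90% of repeated experiments; it is not a 90% probability statement about any one range (see Credible Interval for that interpretation). Narrows as more data is collected.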

Control Group — Locations running business as usual during a test. They serve as the baseline for comparison against the treatment group.

Credible Interval — The Bayesian equivalent of a confidence interval. Has a more intuitive interpretation: "90% probability the true effect is in this range."
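
As a rough illustration of how a credible interval and a "chance this worked" figure can be read off a posterior, the sketch below assumes the posterior for the lift is approximately normal. The mean and standard deviation are made-up numbers, and nothing here is meant to describe ProofPod's actual model.

```python
from statistics import NormalDist

# Hypothetical posterior for the lift: mean +4%, standard deviation 2%.
# The normal shape and both numbers are assumptions for illustration only.
posterior = NormalDist(mu=0.04, sigma=0.02)

# Probability that the true lift is above zero ("what's the chance this worked?").
prob_positive = 1 - posterior.cdf(0.0)

# Central 90% credible interval: the 5th and 95th percentiles of the posterior.
low = posterior.inv_cdf(0.05)
high = posterior.inv_cdf(0.95)

print(f"P(lift > 0) = {prob_positive:.3f}")
print(f"90% credible interval: {low:.3%} to {high:.3%}")
```

Under these assumptions, the interval is a direct probability statement about the lift, which is the reading the entry above describes.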

Difference-in-Differences (DiD) — A statistical method that estimates a treatment effect by comparing the change in test locations to the change in control locations over the same period. Controls for shared external factors like seasonality. See How Location Testing Works.
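
The arithmetic behind DiD can be sketched in a few lines. The function name `did_estimate` and the sample numbers are hypothetical; this is the textbook two-group, two-period calculation and may differ from what ProofPod runs internally.

```python
from statistics import mean

def did_estimate(test_pre, test_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on period averages."""
    test_change = mean(test_post) - mean(test_pre)            # change in test locations
    control_change = mean(control_post) - mean(control_pre)   # change in control locations
    return test_change - control_change                       # what the shared trend does not explain

# Made-up daily revenue figures for illustration.
effect = did_estimate(
    test_pre=[100, 102, 98],
    test_post=[112, 110, 108],
    control_pre=[90, 92, 88],
    control_post=[95, 93, 94],
)
print(effect)
```

Here the test locations rose by 10 and the control locations by 4, so the estimated effect is 6: the part of the increase that the shared trend does not account for.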

Donor Pool — The set of control locations available for constructing a synthetic control. Locations with high correlation to test locations make the best donors.

Fixed Effects — Statistical adjustments that remove predictable variation (day-of-week patterns, seasonal trends, store-level differences) so the treatment effect estimate is cleaner.

Guardrail Metric — A secondary metric monitored during a test to catch unintended side effects. For example, watching churn while testing a price increase. See Guardrail Metrics.

Lift — The percentage change in a metric caused by the experiment. A +5% lift means the treatment group outperformed the control group by 5%.

Minimum Detectable Effect (MDE) — The smallest percentage change you want your test to be able to detect. Lower MDEs require more data and longer tests. Typically set between 3% and 10%.

Observation — A single data point in the analysis, usually one location on one day. More observations lead to tighter confidence intervals.

Primary Metric — The main metric your experiment is designed to move (e.g., revenue, visits, enrollments). Each test has exactly one primary metric.

R² (R-squared) — A measure of how well the synthetic control matches the test group's historical pattern. Ranges from 0 to 1; values above 0.7 generally indicate a good match. A low R² means the control doesn't track the test group well, which makes the result less reliable.
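
For readers who want to see the calculation, here is a minimal sketch using the common definition of R² (one minus the ratio of residual to total sum of squares). Whether ProofPod uses exactly this formula is an assumption, and the series are invented.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot, comparing a prediction to the actual series."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot

# Hypothetical pre-period values: test group vs. its synthetic control.
test_group = [10, 12, 14, 16]
synthetic = [11, 12, 13, 16]

fit = r_squared(test_group, synthetic)
print(round(fit, 2))
```

With these numbers the synthetic control misses by 1 on two of four days, giving an R² of 0.9, which would count as a good match under the 0.7 rule of thumb.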

Recommendation — ProofPod's verdict after analysis: Scale (roll it out), Kill (stop it), or Continue (need more data). See Recommendations.

Saved View — A named set of Insights page filters (event type, date range, locations, granularity) that can be loaded instantly or added to a dashboard. See Saved Views.

Synthetic Control — A weighted combination of control locations that, together, replicate the test group's pre-period behavior. More robust than matching to a single location. See Location Matching.
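
To make "weighted combination" concrete, the sketch below builds a synthetic series from two donor locations. The store names, values, and weights are hypothetical, and the weights are simply given here; in practice they would be fitted so the synthetic series tracks the test group's pre-period.

```python
def synthetic_series(donors, weights):
    """Combine donor location series into one series using per-location weights."""
    n_days = len(next(iter(donors.values())))
    return [
        sum(weights[name] * series[day] for name, series in donors.items())
        for day in range(n_days)
    ]

# Hypothetical donor pool and weights (weights sum to 1).
donors = {
    "store_a": [100, 110, 120],
    "store_b": [200, 190, 180],
}
weights = {"store_a": 0.75, "store_b": 0.25}

synthetic = synthetic_series(donors, weights)
print(synthetic)
```

Each day's synthetic value is 75% of store_a plus 25% of store_b, so no single location's quirks dominate the comparison.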

Treatment Effect — The estimated causal impact of an experiment on the primary metric. Calculated as the difference between treatment and control group changes.

Treatment Group — Locations where the experimental change is implemented (new pricing, new program, new hours, etc.). Compared against the control group to measure impact.