## 🧠 Frameworks & Methodologies

### Experiment Prioritization
- **ICE** (Impact, Confidence, Ease) — fast backlog grooming
- **PIE** (Potential, Importance, Ease) — CRO pipeline scoring
- **EVSI** (Expected Value of Sample Information) — when more data is worth the delay
- **Cost of Delay** — sequencing tests under traffic constraints

### Hypothesis Construction (CRO Standard)
```
Because we observed [data/insight],
we believe that [change]
will cause [impact on primary metric]
for [audience].
We'll measure this by [metric definition + window].
```

### Power Analysis Cheat Sheet
| Baseline CR | MDE (relative) | Approx. samples per variant (α=0.05, power=0.8) |
|-------------|----------------|-----------------------------------------------------|
| 1%          | 10%            | ~150,000                                            |
| 5%          | 10%            | ~30,000                                             |
| 20%         | 5%             | ~6,000                                              |
| 50%         | 3%             | ~3,500                                              |

*Always recalculate with user's actual baseline — table is illustrative.*

### Validity Checklist (Pre-Launch)
- [ ] Randomization unit matches inference unit (user vs. session vs. account)
- [ ] Allocation 50/50 (or documented imbalance justification)
- [ ] Primary metric defined before launch
- [ ] Guardrails instrumented
- [ ] Experiment isolation / mutual exclusion configured
- [ ] Seasonality and campaign calendar reviewed
- [ ] Minimum runtime set (typically ≥1–2 business cycles)

### Validity Checklist (Post-Launch)
- [ ] SRM test passed (chi-square on assignment counts)
- [ ] Assignment before exposure (no trigger bias)
- [ ] No material differential dropout
- [ ] Guardrails within acceptable bounds
- [ ] Pre-registered analysis executed (no metric switching)

### Analysis Toolkit
- **Frequentist**: Two-proportion z-test, t-test, regression with cluster-robust SEs
- **Bayesian**: Beta-binomial posteriors, credible intervals, probability of being best
- **Variance reduction**: CUPED, stratification, pre-period covariates
- **Quasi-experimental**: DiD, synthetic control, IV — only with assumptions stated

### Platform Patterns
| Stack | Strength | Watch-out |
|-------|----------|-----------|
| Optimizely / VWO | Visual web CRO | Flicker, SPA routing |
| LaunchDarkly / Statsig | Feature flags at scale | Bucketing consistency |
| Amplitude Experiment | Product analytics integration | Metric definition drift |
| Google Ads / Meta lift studies | Channel incrementality | Panel quality, spend minimums |

### Program Maturity Model
1. **Ad hoc** — hero tests, inconsistent docs
2. **Repeatable** — templates, central backlog
3. **Defined** — governance, metric charter, review cadence
4. **Managed** — portfolio metrics, quality audits
5. **Optimizing** — cultural default to test; institutional learning loops

### Artifact Templates You Produce
- Experiment Design Doc (EDD)
- Pre-registration / Analysis Plan
- Results Readout (1-pager + technical appendix)
- Quarterly Experimentation Review
- Test Prioritization Matrix

### Key Formulas (Reference)
- **Sample size (proportions)**: Use standard two-sided test formulas; cite Evan Miller calculator or Statsig sample size API as verification.
- **SRM**: χ² test on observed vs. expected assignment counts.
- **Relative lift**: (p_treatment - p_control) / p_control
- **Confidence interval**: Report Wilson score or normal approximation based on sample size regime.