# Experiment Protocol Template

- **Title:** [Concise, searchable name for the line of inquiry]
- **Experiment ID:** PROM-[YYYYMMDD]-[NNN]
- **Lead Experimenter:** [User] + Prometheus
- **Date Created:**
- **Last Updated:**
- **Status:** Design | Pilot | Confirmatory | Analysis | Archived
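
The Experiment ID format can be generated and checked mechanically. The sketch below is one possible helper, not a prescribed tool: it assumes `NNN` is a zero-padded sequence number from 001 to 999 and that `YYYYMMDD` must be a real calendar date. Neither is stated by the template, and the function names are hypothetical.

```python
import re
from datetime import date, datetime

# Assumed shape: PROM-<8 digits>-<3 digits>. The zero-padded sequence is an assumption.
ID_PATTERN = re.compile(r"PROM-(\d{8})-(\d{3})")


def make_experiment_id(day: date, sequence: int) -> str:
    """Build an ID such as PROM-20250107-003 from a date and a sequence number."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must be between 1 and 999")
    return f"PROM-{day:%Y%m%d}-{sequence:03d}"


def is_valid_experiment_id(experiment_id: str) -> bool:
    """Check the overall shape and that the date part is a real calendar date."""
    match = ID_PATTERN.fullmatch(experiment_id)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```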

## 1. Research Question

One crisp, answerable paragraph. What exactly are we trying to learn?

## 2. Motivation & Decision Context

Why does resolving this uncertainty matter right now? What decision, risk, or investment depends on the answer? What is the expected cost of remaining wrong or uncertain for another quarter?

## 3. Hypotheses

| ID | Hypothesis Statement | Proposed Mechanism | Falsification / Strong Support Criteria |
|----|----------------------|--------------------|---------------------------------------|
| H0 | | | |
| H1 | | | |
| H2 | | | |

## 4. Variables & Experimental Design

**Design Type:** Within-subjects contrastive / Factorial / Sequential / Other

**Independent Variables**
- Variable 1: name — levels / values — operationalization

**Dependent Variables & Metrics**
- Primary: name — exact scoring method or judge prompt reference
- Secondary / Guardrails: ...

**Controls, Blocking, and Stratification:**

**Randomization & Counterbalancing Plan:**
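
As an illustration of what a plan for this field might look like, the sketch below shuffles items with a fixed seed and rotates condition order through a cyclic Latin square. The function names and the choice of a cyclic square are assumptions made for the example, not a required method.

```python
import random


def latin_square_orders(conditions: list[str]) -> list[list[str]]:
    """Cyclic Latin square: every condition appears exactly once in each position."""
    k = len(conditions)
    return [[conditions[(row + col) % k] for col in range(k)] for row in range(k)]


def assign_orders(item_ids: list[str], conditions: list[str], seed: int) -> dict[str, list[str]]:
    """Shuffle items reproducibly, then cycle through the Latin square rows."""
    rng = random.Random(seed)  # local RNG, so the plan depends only on the seed
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    orders = latin_square_orders(conditions)
    return {item: orders[i % len(orders)] for i, item in enumerate(shuffled)}
```

A cyclic square balances the position of each condition but not which condition precedes which, so a design that is sensitive to carryover effects may need a different scheme.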

## 5. Sample Size & Power

- Target N (prompts × items × models):
- Justification and power analysis approach:
- Minimum detectable effect size we care about:
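
For a first-pass justification, the required sample size can be approximated from the minimum detectable effect. The sketch below uses the standard normal approximation for a two-sided comparison of two independent conditions with a standardized effect size (Cohen's d). That design is an assumption: a within-subjects design usually needs fewer items, and an exact t-based calculation gives a slightly larger N.

```python
import math
from statistics import NormalDist


def n_per_condition(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate items per condition: n = 2 * ((z_{1-alpha/2} + z_power) / d)^2."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


print(n_per_condition(0.5))  # 63 items per condition
print(n_per_condition(0.2))  # 393 items per condition
```

Because the required N grows with the inverse square of the effect size, halving the minimum detectable effect roughly quadruples the number of items.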

## 6. Procedure (Execution Instructions)

Write the procedure as step-by-step, copy-paste-ready instructions. Include exact model identifiers, temperature, top_p, the system prompt, the user prompt template with all variables, output parsing instructions, and the logging format.
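
One way to pin these settings down is to capture them in a single frozen configuration object and write one JSON line per model call. The sketch below is a minimal illustration under that assumption. The field names, the `make_log_record` helper, and the JSON Lines layout are hypothetical, and no model API is called.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    model_id: str  # exact model identifier, copied verbatim from the provider
    temperature: float
    top_p: float
    system_prompt: str
    user_prompt_template: str  # uses str.format placeholders such as {text}


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Fill the template; a missing variable raises KeyError instead of passing silently."""
    return template.format(**variables)


def make_log_record(experiment_id: str, config: RunConfig, item_id: str,
                    variables: dict[str, str], raw_output: str) -> str:
    """Serialize one model call as a single JSON line."""
    record = {
        "experiment_id": experiment_id,
        "item_id": item_id,
        "config": asdict(config),
        "variables": variables,
        "rendered_user_prompt": render_prompt(config.user_prompt_template, variables),
        "raw_output": raw_output,
    }
    return json.dumps(record, sort_keys=True)
```

Logging the rendered prompt alongside the template and variables makes each record auditable without re-running the substitution.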

## 7. Measurement Instruments

- Automated metrics and code
- Full LLM-as-Judge prompt(s) with calibration notes
- Human evaluation rubric (if used)
- Inter-annotator or inter-judge agreement protocol
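
For the agreement protocol, Cohen's kappa is a common choice when two annotators or judges assign categorical labels to the same items. The sketch below is a small standard-library implementation offered as an example. The template does not mandate kappa, and other statistics may suit ordinal scores or more than two raters better.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


judge_1 = ["pass", "pass", "fail", "fail"]
judge_2 = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(judge_1, judge_2))  # 0.5
```

In this example, raw agreement is 0.75 but chance agreement is 0.5, so kappa is 0.5.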

## 8. Analysis Plan

Statistical tests, visualization approach, qualitative error taxonomy, and how results will be synthesized across metrics.
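
As one example of a test that could be named here, a paired sign-flip permutation test compares two conditions scored on the same items without assuming normality. The sketch below assumes numeric per-item scores and a two-sided test on the mean difference. The function name and defaults are illustrative only.

```python
import random


def paired_permutation_test(scores_a: list[float], scores_b: list[float],
                            n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for the mean paired difference via random sign flips."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be paired and non-empty")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)  # fixed seed keeps the reported p-value reproducible
    extreme = 0
    for _ in range(n_resamples):
        flipped_mean = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped_mean) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

Under the null hypothesis, the sign of each paired difference is arbitrary, so flipping signs at random simulates the distribution of the mean difference.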

## 9. Pre-registered Decision Rules

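State the rule before data collection, replacing X and Y with concrete thresholds (X > Y). For example: "If the primary metric improves by more than X with p < 0.05 (or a Bayesian equivalent) and no guardrail regresses, we will recommend [specific action scope]. If the effect is between Y and X, we will run a confirmatory study. If the effect is below Y, we will abandon this direction for 6 months."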
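
Encoding the rule as a function before seeing the data makes it harder to reinterpret afterwards. The sketch below is one possible encoding. It assumes the boundaries Y and X are inclusive for the confirmatory band, and it returns `"needs_review"` for a large effect that misses significance or regresses a guardrail, a case the example rule above leaves unspecified. The function name and outcome labels are hypothetical.

```python
def decide(effect: float, p_value: float, guardrail_regressed: bool,
           x_threshold: float, y_threshold: float, alpha: float = 0.05) -> str:
    """Map pre-registered thresholds to one of four outcomes."""
    if y_threshold >= x_threshold:
        raise ValueError("y_threshold must be smaller than x_threshold")
    if effect > x_threshold:
        if p_value < alpha and not guardrail_regressed:
            return "recommend"
        # Assumption: the rule does not cover this case, so flag it for discussion.
        return "needs_review"
    if effect >= y_threshold:
        return "confirmatory"
    return "abandon"
```

Writing the rule out this way also exposes gaps, such as the unspecified case above, that are easier to settle before results arrive.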

## 10. Risk Register

| Category | Risk Description | Likelihood | Impact | Mitigation / Contingency |
|----------|------------------|------------|--------|--------------------------|
| Scientific | | | | |
| Operational | | | | |
| Ethical / Safety | | | | |
| Dual-Use | | | | |

## 11. Ethical & Safety Pre-Review

- Completed by:
- Date:
- Key findings and required safeguards:

## 12. Budget & Timeline

- Estimated total tokens / API cost:
- Human hours required:
- Calendar duration:
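
The token and cost estimate is simple arithmetic once call counts and per-call token sizes are fixed. The sketch below shows the calculation. All numbers in the usage example, including the per-million-token prices, are made-up placeholders and should be replaced with the provider's current rates.

```python
def estimate_budget(n_calls: int, input_tokens_per_call: int, output_tokens_per_call: int,
                    usd_per_million_input: float, usd_per_million_output: float) -> tuple[int, float]:
    """Return (total tokens, estimated cost in USD) for a batch of model calls."""
    input_tokens = n_calls * input_tokens_per_call
    output_tokens = n_calls * output_tokens_per_call
    cost = (input_tokens / 1_000_000 * usd_per_million_input
            + output_tokens / 1_000_000 * usd_per_million_output)
    return input_tokens + output_tokens, cost


# Hypothetical example: 3,000 calls, 800 input and 400 output tokens per call,
# priced at placeholder rates of $3 and $15 per million tokens.
total_tokens, cost_usd = estimate_budget(3000, 800, 400, 3.0, 15.0)
print(total_tokens, round(cost_usd, 2))  # 3600000 25.2
```

LLM-as-Judge calls and retries consume tokens too, so they may need their own line in the estimate.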

## 13. Appendices

- Complete prompt library with variable substitution table
- Data schema and storage plan
- Random seed / ordering plan
- Any external datasets or gold labels used

This document must be substantially complete and explicitly approved by the user before significant execution begins.