# Aegis AI Production Readiness Checklist

This living checklist is used for every production AI launch or major update. All P0 items must be green (or formally risk-accepted) before launch.

## 1. Data Foundation (P0)
- [ ] Complete data provenance and lineage documented and version-controlled
- [ ] Training / validation / test splits are statistically independent with no leakage
- [ ] Data quality gates and automated validation in the ingestion pipeline
- [ ] Distribution monitoring and drift detection on all critical input features and embeddings
- [ ] Sensitive attribute collection and documentation sufficient for fairness slicing
- [ ] Data poisoning and label corruption detection mechanisms in place
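
The drift-detection item above can be illustrated with a Population Stability Index (PSI) check on a single numeric feature. This is a minimal sketch, not the Aegis implementation: the function name `psi`, the equal-width binning, and the 0.1 / 0.2 thresholds (a common rule of thumb, not a value from this checklist) are all assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the reference range (an assumption; quantile
    bins are also common). Live values outside that range are clipped into
    the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: a reference window and a live window shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

score = psi(reference, shifted)
# Rule-of-thumb reading: < 0.1 stable, 0.1 to 0.2 watch, > 0.2 investigate.
status = "investigate" if score > 0.2 else "watch" if score > 0.1 else "stable"
```

For embeddings, the same idea is usually applied to a scalar projection (for example, distance to a reference centroid) rather than to raw vectors.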

## 2. Evaluation & Testing (P0)
- [ ] Golden evaluation set(s) representative of production traffic, versioned and refreshed on a defined cadence
- [ ] Multi-dimensional evaluation covering accuracy, robustness, calibration, out-of-distribution (OOD) behavior, and adversarial resistance
- [ ] Slice-based analysis for fairness and performance on critical subpopulations
- [ ] Automated regression detection in CI/CD for every model or prompt change
- [ ] Human preference or expert judgment evaluation correlated with automated metrics
- [ ] Red-team / adversarial evaluation harness executed and results documented
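
As one way to picture the automated regression gate listed above, the sketch below compares a candidate's metrics against a stored baseline and reports any metric that worsened by more than its tolerance. The names `MetricRule` and `find_regressions`, the metric names, and the tolerances are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetricRule:
    name: str
    higher_is_better: bool
    max_regression: float  # absolute amount the metric may worsen


def find_regressions(baseline: Dict[str, float],
                     candidate: Dict[str, float],
                     rules: List[MetricRule]) -> List[str]:
    """Return one message per metric that is missing or regressed too far."""
    failures = []
    for rule in rules:
        if rule.name not in candidate:
            failures.append(f"{rule.name}: missing from candidate results")
            continue
        delta = candidate[rule.name] - baseline[rule.name]
        # Express the change as "how much worse", whatever the direction.
        regression = -delta if rule.higher_is_better else delta
        if regression > rule.max_regression:
            failures.append(
                f"{rule.name}: worsened by {regression:.4f} "
                f"(tolerance {rule.max_regression:.4f})"
            )
    return failures


# Hypothetical metric values for illustration only.
rules = [
    MetricRule("accuracy", higher_is_better=True, max_regression=0.01),
    MetricRule("ece", higher_is_better=False, max_regression=0.005),
]
baseline = {"accuracy": 0.91, "ece": 0.030}
candidate = {"accuracy": 0.88, "ece": 0.031}

failures = find_regressions(baseline, candidate, rules)
# A CI job would typically exit non-zero when `failures` is non-empty.
```

A fixed absolute tolerance is the simplest choice; a gate on noisy metrics may instead need a statistical test over repeated evaluation runs.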

## 3. Model & System Guardrails (P0)
- [ ] Output validation and factual verification layer for all high-stakes generations
- [ ] Prompt injection and jailbreak resistance testing completed, with the residual attack success rate measured and within a documented acceptance threshold
- [ ] Tool-use / function-calling correctness verification for agentic systems
- [ ] Retrieval faithfulness and citation accuracy checks for RAG systems
- [ ] Automated PII and disallowed content filters with measured false-positive/false-negative rates
- [ ] Graceful degradation and fallback behavior defined and tested
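
The "measured false-positive/false-negative rates" requirement above implies scoring the filter against a labeled sample. The sketch below does that for a deliberately naive, email-only regex filter; the pattern, the function names, and the four labeled examples are assumptions for illustration and are far smaller than a real evaluation set.

```python
import re
from typing import Dict, Iterable, Tuple

# Naive email pattern, used only to have something to measure.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def flags_pii(text: str) -> bool:
    """Return True when the toy filter would block or redact the text."""
    return EMAIL_RE.search(text) is not None


def error_rates(labeled: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute FP and FN rates from (text, truly_contains_pii) pairs."""
    fp = fn = positives = negatives = 0
    for text, has_pii in labeled:
        flagged = flags_pii(text)
        if has_pii:
            positives += 1
            fn += int(not flagged)   # PII present but missed
        else:
            negatives += 1
            fp += int(flagged)       # clean text wrongly flagged
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }


# Hypothetical labeled sample.
labeled_sample = [
    ("Contact me at jane@example.com", True),         # caught
    ("My number is 555-0100", True),                  # missed: not an email
    ("Meet at the cafe @ noon", False),               # correctly passed
    ("Use the decorator foo@bar.baz syntax", False),  # wrongly flagged
]

rates = error_rates(labeled_sample)
```

Reporting both rates side by side makes the trade-off explicit: tightening the filter to reduce misses usually raises the share of clean text that gets blocked.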

## 4. Observability & Monitoring (P0)
- [ ] Real-time metrics for latency, cost, error rates, and AI-specific signals (hallucination rate, drift scores, uncertainty estimates)
- [ ] Statistical process control and change-point detection on key distributions
- [ ] Automated alerting with clear ownership and escalation paths
- [ ] Shadow or canary deployment capability with statistical comparison to baseline
- [ ] Full request/response logging with sampling strategy and retention policy compliant with privacy requirements
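
For the statistical process control item above, a Shewhart-style control chart is one of the simplest starting points: limits are derived from a baseline window, and live points outside them are flagged. This is a sketch under assumed names (`control_limits`, `out_of_control`) and made-up latency values; it does not cover change-point detection, which needs a separate method such as CUSUM.

```python
import statistics
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float],
                   sigmas: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits from a baseline window."""
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation
    return mean - sigmas * sd, mean + sigmas * sd


def out_of_control(baseline: Sequence[float], live: Sequence[float],
                   sigmas: float = 3.0) -> List[int]:
    """Indices of live points that fall outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [i for i, x in enumerate(live) if x < lower or x > upper]


# Hypothetical p50 latency samples in milliseconds.
baseline_ms = [100, 102, 98, 101, 99, 100, 103, 97]
live_ms = [101, 99, 107, 100, 93]

limits = control_limits(baseline_ms)
flagged = out_of_control(baseline_ms, live_ms)
```

With this baseline the mean is 100 and the sample standard deviation is 2, so the limits are 94 and 106, and the live points at indices 2 and 4 are flagged.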

## 5. Incident Response & Rollback (P0)
- [ ] Documented AI-specific incident response playbook (model vs. data vs. prompt vs. infra vs. human-process hypotheses)
- [ ] Automated circuit breakers or traffic shedding based on reliability signals
- [ ] One-click or automated rollback to previous known-good model/prompt/config version
- [ ] Postmortem template and blameless review process defined and practiced
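
The circuit-breaker and rollback items above can be combined in a small sketch: a breaker tracks the error rate over a sliding window of recent requests, and routing falls back to the previous known-good version while it is open. The class `ErrorRateBreaker`, the helper `choose_version`, the version labels, and the thresholds are all hypothetical, and the sketch omits the half-open recovery state a production breaker would normally have.

```python
from collections import deque


class ErrorRateBreaker:
    """Opens when the error rate over the last `window` requests is too high."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.2,
                 min_requests: int = 10) -> None:
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    @property
    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def is_open(self) -> bool:
        # Require a minimum sample so a single early failure cannot trip it.
        return (len(self.outcomes) >= self.min_requests
                and self.error_rate > self.max_error_rate)


def choose_version(breaker: ErrorRateBreaker, current: str,
                   known_good: str) -> str:
    """Route to the known-good version while the breaker is open."""
    return known_good if breaker.is_open() else current


breaker = ErrorRateBreaker(window=10, max_error_rate=0.2, min_requests=5)
for ok in [True] * 8 + [False] * 2:
    breaker.record(ok)
before = choose_version(breaker, current="model-v2", known_good="model-v1")

breaker.record(False)  # oldest success drops out of the window
after = choose_version(breaker, current="model-v2", known_good="model-v1")
```

Here "failure" could be any reliability signal the team trusts, such as a guardrail rejection or a validation error, not only an HTTP error.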

## 6. Governance & Documentation (P0)
- [ ] Up-to-date model card or system card with known limitations and intended use
- [ ] Explicit risk tier classification and sign-off by appropriate business and compliance owners
- [ ] Audit trail completeness for training data, evaluations, and production decisions
- [ ] Regulatory classification assessment (high-risk vs. limited/minimal risk) completed

## 7. Human & Process Layer (P1)
- [ ] Clear human oversight and override design with measured override latency and quality
- [ ] Feedback loop integrity protections (e.g., preventing poisoning of RLHF feedback data)
- [ ] Training for human reviewers and escalation operators
- [ ] Defined criteria for when to involve the Head of AI Reliability (Aegis) before changes

## 8. Continuous Improvement (P1)
- [ ] Defined cadence for refreshing golden sets and evolving the evaluation suite
- [ ] Scheduled chaos / red-team exercises
- [ ] Quarterly reliability review with leadership and error-budget accounting
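
Error-budget accounting for the quarterly review above reduces to simple arithmetic once an SLO target is chosen. The sketch below assumes a request-based SLO; the function name `error_budget_report`, the 99% target, and the request counts are illustrative values, not Aegis figures.

```python
from typing import Dict


def error_budget_report(slo_target: float, total_requests: int,
                        bad_requests: int) -> Dict[str, float]:
    """Summarize error-budget usage for one review period.

    slo_target is the required fraction of good requests, e.g. 0.99.
    """
    allowed_bad = (1.0 - slo_target) * total_requests
    consumed = bad_requests / allowed_bad if allowed_bad else float("inf")
    return {
        "allowed_bad": allowed_bad,
        "consumed_fraction": consumed,
        "remaining_bad": allowed_bad - bad_requests,
    }


# Hypothetical quarter: 1,000,000 requests, 2,500 of them bad, 99% SLO.
report = error_budget_report(slo_target=0.99, total_requests=1_000_000,
                             bad_requests=2_500)
```

In this example the budget allows about 10,000 bad requests, so roughly 25% of it has been consumed and about 7,500 remain for the rest of the period.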

**Sign-off Required:** Reliability Owner (Aegis review) + Product Owner + Engineering Owner + Compliance (when applicable).

This checklist is updated after every significant incident or new research finding.