## templates/postmortem-report.md

# AI Postmortem Report Template

**Incident ID**: [e.g., AI-INC-2026-0147]
**Title**: [Clear, specific, non-blaming title — e.g., 'Customer Support Agent Recommended Competitor Product After RAG Ranking Degradation Under Sustained Load']
**Severity**: P0 | P1 | P2
**Incident Window**: YYYY-MM-DD HH:MM — YYYY-MM-DD HH:MM (UTC)
**Report Date**: YYYY-MM-DD
**Postmortem Lead**: Kairos (Principal AI Postmortem Lead)
**Core Participants**: [roles and teams, not individual names unless essential]
**Status**: Draft | In Review | Final

---

## 1. Executive Summary

[Write last. 150–250 words. What happened, why it mattered to users and the business, the two or three most important systemic insights, and the highest-leverage actions the organization will take. This section must stand completely alone.]

## 2. Incident Impact

**Quantitative Impact**
- Users / customers directly affected:
- Duration of exposure:
- Measured outcomes (churn, support volume, revenue, regulatory exposure, brand sentiment shift, model performance deltas, safety or harm metrics):

**Qualitative Impact**
- User experience during the incident (verbatim examples if appropriate and sanitized)
- Internal team and organizational effects (morale, trust, velocity, cross-team friction)

## 3. Detailed Timeline

Strict chronological reconstruction. Every entry must contain:
- Timestamp (or best-estimate window)
- Event description
- Actor or system component
- Primary evidence source
- Why the event mattered at the time

Organize into clear phases: Pre-Incident Conditions | Onset & Propagation | Detection | Containment | Resolution | Recovery & Verification

## 4. Causal Analysis

### 4.1 Direct Mechanism of Harmful Behavior
[Precise technical description of how the system produced the undesired output or action.]

### 4.2 Contributing Factor Clusters

Apply at least two methods. Present each cluster with:
- Description of the factor
- 5 Whys analysis (AI-adapted, crossing technical / process / incentive layers)
- Evidence strength (High / Medium / Low)
- Supporting artifacts

Include a textual or Mermaid-described causal diagram or control structure view.

### 4.3 Why Detection, Containment, or Recovery Were Delayed or Incomplete
[Specific gaps in monitoring, alerting, runbooks, human judgment, or tooling.]

## 5. Lessons Learned

**What Worked Well**
- Detection strengths, response effectiveness, team behaviors, tooling, or processes that limited damage or accelerated learning.

**Systemic Opportunities**
- The deeper conditions this incident revealed about our AI development, deployment, or governance practices.

## 6. Recommendations

All recommendations must be SMART, owned, and explicitly linked to root causes or contributing factors.

**Immediate (0–30 days)**
- [ ] Recommendation — Owner (role) — Success Metric — Due Date — Linked Root Cause(s)

**Short-term (30–90 days)**

**Long-term / Strategic (90+ days)**

## 7. Residual Risk & Monitoring Plan

After accepted mitigations, what risk remains? What signals would indicate the same class of failure is re-emerging? Who owns ongoing monitoring and on what cadence?

## 8. Appendices

- Evidence index and links (sanitized)
- Model, prompt, and configuration versions at time of incident
- Related prior incidents or near-misses
- Glossary of terms and acronyms
- References to external research or frameworks used
- Participant comments and corrections (if any)

---

*This is a living document until marked Final. All participants are invited and expected to correct inaccuracies and strengthen analysis. The goal is truth, not consensus.*