# [P1] AI System Postmortem Report: [Concise Factual Title]

**Incident ID**: 
**Lead Facilitator**: Aether (Principal AI Postmortem Lead)
**Date Incident Occurred**: 
**Date Postmortem Opened**: 
**Date Report Published**: 
**Participants**: [Engineers, PMs, stakeholders who contributed meaningfully]

---

## Executive Summary

[One tight paragraph describing what broke, the duration, and the highest-leverage impact.]

**Business & Technical Impact** (key quantified effects):
- Customers affected: 
- Revenue / contractual risk: 
- Model quality regression: 
- Operational cost: 
- Regulatory / compliance exposure: 
- Trust / brand impact: 

**Root Cause (single sentence)**: 

**Status**: All P0/P1 actions have owners and verification methods. Residual risk explicitly accepted by [Name/Role]. 30/60/90-day reviews scheduled.

---

## Impact Assessment

| Dimension              | Quantified Impact                     | Qualitative Impact                              | Notes / Evidence |
|------------------------|---------------------------------------|-------------------------------------------------|------------------|
| Customer Experience    | 340 conversations, 3 enterprise logos | Written incorrect compliance advice delivered   | Screenshots + logs |
| Revenue                | $XXk direct, $YYk at-risk renewal     | Executive escalation and trust repair required  | Account team notes |
| Model Quality          | Hallucination rate +340% on topic     | Degradation on time-sensitive regulatory queries| Eval vs. prod comparison |
| Operational            | 12 person-hours investigation + triage| On-call load and customer support surge         | Incident timeline |
| Regulatory / Compliance| Potential misstatement of regulation  | Possible audit or customer regulatory filing    | Legal review pending |
| Reputational / Trust   | Measurable drop in NPS for cohort     | 'How did your AI get this so wrong?' sentiment  | Verbatim feedback |

---

## Timeline of Events

| Timestamp (UTC) | Actor / Component              | Event / Observation                                                                 | Evidence Source             | Observed Effect                  | Confidence |
|-----------------|--------------------------------|-------------------------------------------------------------------------------------|-----------------------------|----------------------------------|------------|
| 2025-05-12 09:14 | Retrieval Indexer             | New chunking strategy deployed for compliance-v2 corpus                            | Git commit abc123def        | Subtle change in chunk boundaries | High       |
| 2025-05-12 09:18 | Embedding Model               | Same index rebuild completed                                                       | Telemetry                   | Minor embedding drift on legal docs | Medium     |
| 2025-05-12 14:22 | Inference Service             | First failing compliance query served with new retrieval logic                     | Request logs + prompt trace | Incorrect but authoritative answer | High       |
| ...             | ...                           | ...                                                                                 | ...                         | ...                              | ...        |

(Include detection, escalation, containment, and rollback events with equal precision.)

---

## Root Cause Analysis

### What We Observed (Symptom Chain)

### Why the Failure Was Possible (Proximate Causes)

### Why Our Defenses Failed (Root Causes & Latent Conditions)

**Primary Root Cause**: The combination of [specific condition A] and [specific condition B] allowed the model to produce [specific harmful behavior] with no downstream verification or freshness check being triggered.

**Contributing Factors** (ranked by leverage):
1. Evaluation suite contained zero test cases exercising time-sensitive regulatory factuality after corpus updates.
2. Prompt assembly logic did not inject source effective dates or require the model to reason about recency on compliance topics.
3. The recent chunking change improved average retrieval metrics while silently harming tail regulatory queries.
4. No automated drift or freshness monitor existed for the compliance corpus.

**Analysis Methods Applied**: 5 Whys, Fault Tree Analysis, Change Analysis, Barrier Analysis (which guardrails were absent or ineffective).

---

## Lessons Learned

### What Went Well
- Rapid customer notification once the issue was surfaced
- Clear ownership once incident was declared
- ...

### Opportunities for Improvement
- Our offline eval distribution was blind to time-sensitive regulatory queries after corpus refreshes.
- The prompt layer had no freshness or source-date discipline on high-stakes topics.
- Observability did not surface retrieval quality degradation on the affected corpus.
- ...

---

## Action Items

| ID  | Action (SMART)                                                                 | Type          | Owner              | Due Date   | Verification Method                                      | Status |
|-----|--------------------------------------------------------------------------------|---------------|--------------------|------------|----------------------------------------------------------|--------|
| A01 | Add mandatory 'effective_date' metadata to all compliance chunks and require prompt to check recency before authoritative answers | Structural    | @ml-platform       | 2025-05-20 | New eval case passes + 7-day canary with zero failures   | Open   |
| A02 | Implement secondary LLM-as-judge + output parser for all regulatory/compliance topics | Technical     | @safety-team       | 2025-05-27 | 100% of compliance answers pass judge in production      | Open   |
| A03 | Update on-call runbook with 'regulatory hallucination' triage checklist and corpus freshness check | Process       | @ops-lead          | 2025-05-15 | Runbook updated, team trained, checklist used in drill   | Open   |
| A04 | Run structured premortem on next major compliance corpus refresh and any chunking strategy change | Cultural      | Aether + PM        | 2025-06-01 | Premortem document created, reviewed, and archived       | Open   |
| A05 | Add automated freshness + drift monitor for all high-stakes corpora with P1 alerting | Technical     | @observability     | 2025-06-10 | Monitor fires on synthetic drift test                    | Open   |

---

## Residual Risk & Monitoring Plan

After the above actions are verified, the following risks remain and are explicitly accepted:
- [Risk description] — Owner: [Name/Role] — Review date: 2025-08-12

**30-day effectiveness review**: [date]
**90-day cross-postmortem pattern review**: [date]

---

## Appendices

- A. Raw failing prompts, retrieved chunks, and model outputs (redacted)
- B. Full 5 Whys worksheet
- C. Fault Tree diagram (Mermaid)
- D. Related past incidents and near-misses
- E. 'How We Got Lucky' analysis
- F. Incentive and process map
- G. Eval gap analysis (which test cases would have caught this?)

---