## 📋 Default Engagement Template

Copy the template below and fill in each section so Atlas has the context it needs to respond.

---

**Context:** [e.g., Active Sev-2 incident / Architecture review / SLO definition / Postmortem draft / Capacity planning]

**Service/System:** [Name, tier (0–3), tech stack, deployment model]

**Current State:**
- Traffic: [RPS, daily users, peak patterns]
- SLOs: [Current targets and error budget status, or "none defined"]
- Recent changes: [Deploys, config changes, infra migrations in last 48h]
- Observability: [Metrics/logs/traces tools available; links or dashboard names]

**Problem Statement:**
[Describe symptoms, user impact, error messages, or the reliability goal you're trying to achieve. Be specific.]

**Constraints:**
- Timeline: [e.g., must stabilize within 30 min / planning for Q3]
- Budget/cost sensitivity: [low / medium / high]
- Team capacity: [solo on-call / full incident team / architecture guild]
- Regulatory/compliance: [none / PCI / HIPAA / etc.]

**What I need from you:**
- [ ] Incident command playbook & comms draft
- [ ] Root cause investigation plan
- [ ] SLO/SLI proposal
- [ ] Architecture reliability review
- [ ] Runbook creation
- [ ] Postmortem facilitation
- [ ] Chaos/game day design
- [ ] Other: [specify]

---

**Example (Incident):**

> **Context:** Active Sev-2: checkout API returning elevated 503s
>
> **Service/System:** checkout-api (Tier-0), Go, K8s on EKS, us-east-1
>
> **Current State:** normal traffic 2.5k RPS; SLO 99.95% monthly (error budget 60% consumed); v2.14.0 deployed 90 min ago; Datadog + Jaeger available
>
> **Problem Statement:** 503 rate jumped from 0.01% to 4.2% at 14:32 UTC; p99 latency is 2.8s (SLO: 300ms); users cannot complete purchases
>
> **Constraints:** Must stabilize before peak hour (18:00 UTC); rollback is acceptable; 3 engineers on the bridge
>
> **What I need from you:** Immediate mitigation steps, investigation threads, and a stakeholder update draft

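The figures in the example above imply a certain urgency, and it can help to state that explicitly in the Problem Statement. The following is a minimal sketch of the error-budget arithmetic, not part of the template itself. It assumes a 30-day SLO window, steady traffic, and an error rate that holds at 4.2%; the function names `burn_rate` and `hours_until_exhausted` are hypothetical. Under those assumptions the budget burns roughly 84 times faster than the sustainable rate, and the remaining 40% would last a little under three and a half hours.

```python
# Illustrative error-budget arithmetic for the example incident.
# Assumptions (not from the template): 30-day window, steady traffic,
# and an error rate that stays constant at the observed value.

def burn_rate(error_rate: float, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows."""
    allowed_error_rate = 1.0 - slo_target
    return error_rate / allowed_error_rate


def hours_until_exhausted(
    error_rate: float,
    slo_target: float,
    budget_consumed: float,
    window_days: int = 30,
) -> float:
    """Hours until the remaining error budget is spent at the current burn rate."""
    remaining_fraction = 1.0 - budget_consumed
    window_hours = window_days * 24
    return remaining_fraction * window_hours / burn_rate(error_rate, slo_target)


# Values taken from the example: 4.2% 503s, 99.95% SLO, 60% of budget consumed.
rate = burn_rate(0.042, 0.9995)
hours = hours_until_exhausted(0.042, 0.9995, 0.60)
print(f"burn rate: {rate:.0f}x, budget exhausted in about {hours:.1f} h")
```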
---

Atlas will respond in the appropriate mode (incident vs. strategic) based on your context.