# 🛠️ SKILL — AI Alerting Mastery

## Core Taxonomies

### AI Failure Mode Categories (Aegis Taxonomy v4)

1. **Semantic Drift & Hallucination**
   - Factuality errors against retrieved context (RAG)
   - Unsubstantiated claims in open generation
   - Inconsistent reasoning chains

2. **Behavioral Regression**
   - Refusal rate changes
   - Style/tone drift (suddenly too verbose or curt)
   - Capability degradation after fine-tune or prompt change

3. **Safety & Policy Violations**
   - Successful jailbreaks or prompt injections
   - Toxic / biased / harmful outputs
   - PII or secret leakage

4. **Performance & Cost Anomalies**
   - Token usage explosions (infinite loops, bad tool calling)
   - Latency tail inflation (complex queries, retrieval bloat)
   - Hit rate collapse on vector DB or cache

5. **Data & Environment Shifts**
   - Input distribution change (new user population, new language, new domain)
   - Upstream data source corruption or staleness
   - Dependency version incompatibilities

### The AI Golden Signals (Extended)

The four traditional signals (Latency, Traffic, Errors, Saturation), plus AI-specific ones:
- **Grounding Score** (mean and 5th percentile, so the worst-grounded tail stays visible)
- **Faithfulness / Hallucination Rate** (via LLM judge + citation verification)
- **Answer Relevance** (user satisfaction proxy)
- **Safety Violation Rate**
- **Cost per Successful Task**
- **Recovery / Self-Correction Rate**
- **Context Window Utilization Efficiency**
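
A few of the signals above can be aggregated from per-interaction records. The following is a minimal sketch, not a reference implementation: the `Interaction` fields, the 0 to 1 grounding scale, and the `summarize` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class Interaction:
    # Hypothetical record shape; field names are assumptions.
    grounding: float        # judge-assigned grounding score, assumed in [0, 1]
    safety_violation: bool  # True if any safety check flagged the output
    success: bool           # True if the task was judged successful
    cost_usd: float         # total model + retrieval cost for the interaction


def summarize(interactions):
    """Aggregate a window of interactions into a few golden signals."""
    scores = [i.grounding for i in interactions]
    # quantiles(n=20) returns 19 cut points; the first is the 5th percentile.
    p5 = quantiles(scores, n=20, method="inclusive")[0]
    successes = sum(i.success for i in interactions)
    total_cost = sum(i.cost_usd for i in interactions)
    return {
        "grounding_avg": mean(scores),
        "grounding_p5": p5,
        "safety_violation_rate": sum(i.safety_violation for i in interactions)
        / len(interactions),
        # Failed tasks still cost money, so divide total cost by successes only.
        "cost_per_successful_task": total_cost / successes
        if successes
        else float("inf"),
    }
```

Dividing total cost by successful tasks only (rather than by all tasks) makes the metric rise when quality drops, even if per-request cost is flat.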

## Detection & Alerting Techniques

### Tiered Detection Strategy

**Tier 0 — Instant Guardrails (synchronous, <50ms)**
- Regex + lightweight classifiers for PII, toxicity, injection patterns
- Length and format validators
- Prompt injection detectors (e.g., NeMo Guardrails, Llama Guard)
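
As a rough illustration of the regex and length checks in Tier 0, the sketch below uses only the standard library. The patterns, the `MAX_CHARS` limit, and the `tier0_check` name are illustrative assumptions; real deployments would need far broader patterns and would typically pair them with a trained classifier.

```python
import re

# Illustrative patterns only; they are deliberately narrow and will miss cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
INJECTION_RE = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE
)
MAX_CHARS = 4000  # assumed length limit


def tier0_check(text: str) -> list[str]:
    """Return the list of guardrail violations found in `text` (empty if clean)."""
    violations = []
    if EMAIL_RE.search(text) or SSN_RE.search(text):
        violations.append("pii")
    if INJECTION_RE.search(text):
        violations.append("prompt_injection")
    if len(text) > MAX_CHARS:
        violations.append("too_long")
    return violations
```

Precompiling the patterns at import time keeps the per-request cost to a handful of linear scans, which is what makes a synchronous budget plausible for checks of this kind.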

**Tier 1 — Fast Online Signals (synchronous or near-real-time)**
- Self-consistency checks (multiple samples)
- Embedding drift vs. baseline distribution (cosine, MMD)
- Simple statistical process control on key metrics
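
One crude way to approximate the embedding drift check is to compare the centroid of recent embeddings against a baseline centroid using cosine distance. This is a simplified sketch, not MMD: it only detects shifts in the mean direction, the `threshold` default is an arbitrary assumption, and the helper names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def drift_alert(baseline, current, threshold=0.1):
    """Return (distance, fired) for the current window vs. the baseline."""
    d = cosine_distance(centroid(baseline), centroid(current))
    return d, d > threshold
```

A centroid comparison is cheap enough for near-real-time use, but it cannot see changes in spread or the appearance of a new cluster that leaves the mean roughly in place; those cases call for a distributional test such as MMD.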

**Tier 2 — Deep Asynchronous Analysis (minutes to hours)**
- Full RAGAS / ARES / DeepEval / custom G-Eval suites on sampled traffic
- Clustering of "surprising" outputs
- Comparison against golden test sets + production shadow traffic
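
Tier 2 suites run on sampled traffic, and one common way to pick the sample is to hash a stable request identifier so that the decision is deterministic and reproducible across services. The sketch below assumes each request carries a string ID; the function name and the choice of SHA-256 are illustrative.

```python
import hashlib


def in_eval_sample(request_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a request goes to offline evaluation."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Because the decision depends only on the ID, every component that sees the same request agrees on whether it was sampled, which keeps retrieval logs, model outputs, and judge scores joinable.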

**Tier 3 — Human + Aggregate Review (daily/weekly)**
- Trend analysis, cohort slicing, incident correlation

### Recommended Tooling Ecosystem

- **Instrumentation**: OpenTelemetry + custom spans for `llm.request`, `retrieval`, `judge.score`
- **Evaluation**: LangSmith, Phoenix, Helicone, custom Prometheus exporters + PostgreSQL for long-term storage
- **Anomaly**: whylogs, NannyML, custom statistical services, or simple moving quantiles in ClickHouse
- **Alerting**: Prometheus + Alertmanager, or Datadog / New Relic with strong tagging strategy; PagerDuty for escalation
- **Incident Management**: Blameless or custom Notion/Airtable runbooks linked from alerts

## Key Formulas & Heuristics

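- **Error Budget Consumption Rate** for AI SLOs: Track how quickly the monthly budget of "unacceptable answers" is being burned, i.e., the observed unacceptable-answer rate divided by the rate the SLO allows. A sustained value above 1 means the budget runs out before the month ends.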
- **Alert Precision Target**: > 85% for P1/P2 alerts. Ruthlessly iterate until achieved.
- **Time-to-Detect (TTD) Budget**: < 5 minutes for high-impact failure modes.
- **False Positive Budget**: < 1 false positive on an actionable (paging) alert per on-call shift per service.
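
The first two heuristics reduce to simple ratios. The sketch below shows one way to compute them, assuming that "unacceptable answers" are counted by some upstream judge and that each alert firing has been labelled true or false positive; the function names are hypothetical.

```python
def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Observed bad-answer rate divided by the rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly on schedule;
    3.0 means a 30-day budget would be exhausted in about 10 days.
    """
    allowed = 1.0 - slo_target
    return (bad / total) / allowed


def alert_precision(true_positives: int, false_positives: int) -> float:
    """Fraction of alert firings that were true positives."""
    fired = true_positives + false_positives
    return true_positives / fired if fired else 0.0
```

For example, 30 unacceptable answers out of 1,000 against a 99% SLO gives a burn rate of roughly 3, and 17 true positives out of 20 firings gives a precision of 0.85, right at the stated target.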

## Continuous Improvement Loop

1. Alert fires → On-call investigates using provided context
2. Resolution includes structured post-mortem fields: "Was this a true positive?", "Could detection have been earlier?", "Should threshold change?"
3. Weekly automated review of the full alert firing history, with suggested deprecations or threshold tightenings
4. Quarterly "Alerting Game Days" where synthetic failures are injected to test coverage.
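
The weekly review step could be sketched as follows, assuming the post-mortem field "Was this a true positive?" is available as a boolean per firing. The 0.85 target mirrors the precision target stated earlier; `min_firings` and the suggestion strings are arbitrary assumptions.

```python
from collections import defaultdict


def weekly_review(firings, precision_target=0.85, min_firings=5):
    """Suggest an action per alert from (alert_name, was_true_positive) pairs."""
    counts = defaultdict(lambda: [0, 0])  # alert name -> [true positives, total]
    for name, was_tp in firings:
        counts[name][0] += int(was_tp)
        counts[name][1] += 1

    suggestions = {}
    for name, (tp, total) in counts.items():
        if total < min_firings:
            # Too few firings to estimate precision with any confidence.
            suggestions[name] = "insufficient data"
        elif tp / total < precision_target:
            suggestions[name] = "review threshold or deprecate"
        else:
            suggestions[name] = "keep"
    return suggestions
```

The output is meant as input to a human decision, not an automatic change: a low-precision alert may still guard a failure mode severe enough to justify the noise.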