# 🧠 SKILL.md — Frameworks, Methodologies & Reference Architectures

## AI Resilience Maturity Model (AIR-MM) v2.1

**Level 1 - Fragile**: Single points of failure everywhere. No monitoring beyond basic uptime. Incidents are a surprise.

**Level 2 - Observed**: Basic dashboards exist. Post-incident reviews happen. Mitigations are manual and reactive.

**Level 3 - Hardened**: Automated retries, timeouts, basic circuit breakers. Adversarial testing in CI. Defined owner for AI incidents.

**Level 4 - Antifragile**: Continuous chaos experiments. Automated red teaming. Dynamic routing with resilience scoring. Error budgets with policy enforcement. Model cards include resilience claims.

**Level 5 - Evolutionary**: The system automatically generates and applies new defenses based on observed failures. Game-theoretic robustness. Self-improving evaluation suites.
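
The Level 4 item "error budgets with policy enforcement" can be illustrated with a small calculation. This is a minimal sketch, assuming the usual SRE definition (allowed failures = (1 - SLO) × total requests). The function names and the 25% "restricted" threshold are hypothetical and not part of AIR-MM.

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures


def release_policy(remaining: float) -> str:
    """Map the remaining budget to a release policy (thresholds are assumptions)."""
    if remaining <= 0.0:
        return "freeze"      # budget exhausted: stop risky changes
    if remaining < 0.25:
        return "restricted"  # budget nearly spent: reliability work only
    return "normal"


if __name__ == "__main__":
    remaining = error_budget_remaining(slo=0.99, total=1000, failed=5)
    print(release_policy(remaining))
```

A gate in CI or in the deployment pipeline could call `release_policy` before promoting a new model or prompt version.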

## The 8 Pillars of AI Resilience

1. Data & Input Integrity — Provenance, poisoning resistance, schema + semantic validation
2. Model Robustness — Adversarial training, OOD detection, ensemble methods
3. Inference Reliability — Multi-provider routing, circuit breakers, quality arbitration
4. Agentic Safety — Plan validation, tool sandboxing, verification agents
5. Observability — Resilience-specific metrics, drift and attack detection
6. Graceful Degradation — Tiered fallbacks, safe modes, rapid rollback
7. Chaos Practice — Automated experiments, game days, red team automation
8. Governance — AI incident command, resilience gates, blameless learning loops
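
Pillars 3 and 6 name circuit breakers and tiered fallbacks. The sketch below shows one simple way the two could fit together. It is an illustration under stated assumptions, not a reference implementation: `CircuitBreaker`, `call_with_fallback`, and the default thresholds are all hypothetical names and values.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def call_with_fallback(breaker: CircuitBreaker,
                       primary: Callable[[], T],
                       fallback: Callable[[], T]) -> T:
    """Try the primary provider through the breaker; degrade to the fallback on any failure."""
    try:
        return breaker.call(primary)
    except Exception:
        return fallback()
```

In practice `primary` would wrap a model provider call and `fallback` a cheaper model, a cached answer, or a safe-mode response. The injectable `clock` exists only to make the breaker testable without sleeping.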

## AI-FMEA Template

Run a structured failure mode and effects analysis (FMEA) for every review, with one row per failure mode. Columns: Component, Failure Mode, Trigger, Local Effect, System Effect, Detection, Mitigation, Risk Score.
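
The template columns map naturally onto a small record type. The sketch below is one possible encoding. The template does not say how Risk Score is computed, so this example assumes the classic FMEA risk priority number (severity × occurrence × detectability, each rated 1 to 10). The `FmeaRow` name and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    component: str
    failure_mode: str
    trigger: str
    local_effect: str
    system_effect: str
    detection: str
    mitigation: str
    severity: int       # 1 (negligible) .. 10 (catastrophic)
    occurrence: int     # 1 (rare) .. 10 (frequent)
    detectability: int  # 1 (always caught) .. 10 (never caught)

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detectability"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def risk_score(self) -> int:
        # Assumed scoring rule: classic risk priority number (RPN).
        return self.severity * self.occurrence * self.detectability


# Hypothetical example rows, for illustration only.
rows = [
    FmeaRow("Retriever", "Stale index", "Failed nightly rebuild",
            "Outdated passages returned", "Confidently wrong answers",
            "Index freshness alert", "Fall back to last known-good index",
            severity=6, occurrence=3, detectability=4),
    FmeaRow("LLM gateway", "Provider timeout", "Upstream outage",
            "Request fails", "User-facing errors",
            "Latency and error-rate dashboards", "Route to secondary provider",
            severity=8, occurrence=4, detectability=2),
]

# Review the highest-risk rows first.
for row in sorted(rows, key=lambda r: r.risk_score, reverse=True):
    print(f"{row.risk_score:4d}  {row.component}: {row.failure_mode}")
```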

## LLM Red Teaming (Defensive Use Only)

Categories: Direct jailbreaks, indirect injection, goal hijacking, data exfiltration, DoS/cost attacks, multi-turn manipulation, policy steering.

Maintain example attacks and matching defenses for each category.
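
One way to keep that catalogue honest is a coverage check that flags categories with no recorded case. The sketch below is a minimal illustration. The `Category` enum mirrors the list above, while `RedTeamCase`, `uncovered`, and the sample entries are hypothetical. Cases are stored as descriptions rather than working payloads.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Category(Enum):
    DIRECT_JAILBREAK = "direct jailbreak"
    INDIRECT_INJECTION = "indirect injection"
    GOAL_HIJACKING = "goal hijacking"
    DATA_EXFILTRATION = "data exfiltration"
    DOS_COST = "DoS/cost attack"
    MULTI_TURN_MANIPULATION = "multi-turn manipulation"
    POLICY_STEERING = "policy steering"


@dataclass(frozen=True)
class RedTeamCase:
    category: Category
    attack_summary: str  # description of the technique, not a live payload
    defense: str         # the control expected to stop it


def uncovered(cases: Iterable[RedTeamCase]) -> List[Category]:
    """Return the categories that have no recorded attack/defense pair."""
    covered = {case.category for case in cases}
    return [category for category in Category if category not in covered]


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    RedTeamCase(Category.INDIRECT_INJECTION,
                "Instructions hidden in a retrieved document",
                "Treat retrieved text as data; strip or quarantine instructions"),
    RedTeamCase(Category.DOS_COST,
                "Prompts crafted to force maximum-length outputs",
                "Per-request token caps and per-tenant rate limits"),
]

print([category.value for category in uncovered(catalogue)])
```

Running the check in CI would fail the build when a category loses its last example.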

## Key Reference Architectures

- Production LLM Gateway with full resilience stack
- Verified RAG with grounding and consistency checks
- Self-healing multi-agent systems with execution budgets
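
The "execution budgets" in the last architecture can be sketched as a hard cap that an agent loop must charge against before every step. This is an illustrative sketch only. `ExecutionBudget`, `run_agent`, and the choice of step and token limits are assumptions, not details taken from the reference architecture.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a step would exceed the step or token limit."""


@dataclass
class ExecutionBudget:
    max_steps: int
    max_tokens: int
    steps_used: int = 0
    tokens_used: int = 0

    def charge(self, tokens: int) -> None:
        # Check both limits before mutating, so a rejected step costs nothing.
        if self.steps_used + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        self.steps_used += 1
        self.tokens_used += tokens


def run_agent(steps: Iterable[Tuple[str, int]], budget: ExecutionBudget) -> List[str]:
    """Run (name, token_cost) steps in order and stop cleanly once the budget is spent."""
    completed: List[str] = []
    for name, tokens in steps:
        try:
            budget.charge(tokens)
        except BudgetExceeded:
            break  # degrade gracefully rather than loop or overspend
        completed.append(name)
    return completed


if __name__ == "__main__":
    budget = ExecutionBudget(max_steps=3, max_tokens=1000)
    plan = [("plan", 400), ("search", 300), ("summarize", 500), ("reply", 50)]
    print(run_agent(plan, budget))
```

Charging before execution, rather than after, keeps a runaway agent from overshooting the cap by one expensive step.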

## Core References

- Concrete Problems in AI Safety (Amodei et al., 2016)
- Adversarial robustness literature
- Chaos Engineering at Netflix and Google
- SRE principles applied to ML systems
- Modern agent failure mode research (2024-2025)