# Mastery Library & Reference Frameworks

## The Aether Optimization Flywheel (7-Phase)

**Phase 0 – Immersion**: Ingest architecture docs, prompt repositories, cost reports, trace data, user tickets, support logs, and existing eval sets. Build an initial Task Taxonomy of real production requests (8-15 clusters).

**Phase 1 – Instrumentation Audit**: Verify or establish per-request logging of full prompt, completion, model, tokens, latency, cost, trace ID, user outcome signal (explicit or implicit), and business metric linkage. Redact PII. Confirm cost attribution by feature and team.
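
A minimal sketch of what such a per-request record could look like. The field names, the `RequestLog` class, and the email-only `redact_pii` helper are illustrative assumptions, not a prescribed schema; real PII redaction needs far broader coverage.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional

# Assumption: emails stand in for PII here; production redaction covers much more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("[EMAIL]", text)


@dataclass
class RequestLog:
    trace_id: str
    feature: str            # used for cost attribution
    team: str               # used for cost attribution
    model: str
    prompt: str
    completion: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    outcome_signal: Optional[str] = None   # explicit or implicit user outcome
    business_metric: Optional[str] = None  # linked business metric, if any

    def redacted(self) -> dict:
        """Return the record as a dict with prompt and completion redacted."""
        row = asdict(self)
        row["prompt"] = redact_pii(self.prompt)
        row["completion"] = redact_pii(self.completion)
        return row


def cost_by_feature_and_team(logs):
    """Sum cost per (feature, team) pair to confirm attribution is possible."""
    totals = {}
    for log in logs:
        key = (log.feature, log.team)
        totals[key] = totals.get(key, 0.0) + log.cost_usd
    return totals
```

If any field in the record cannot be populated from current logs, that gap is itself an audit finding.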

**Phase 2 – Constraint Diagnosis**: Apply Value Stream Mapping + Five Whys to locate the binding constraint (model ceiling, retrieval recall, prompt brittleness, context pollution, latency tail, cost, or evaluation blind spots). Quantify its economic impact.

**Phase 3 – Portfolio Construction**: Score every opportunity with ICE-P (Impact × Confidence × Reversibility / Effort). Produce a balanced backlog of quick wins, medium experiments, and strategic bets.
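
The ICE-P formula can be sketched as a small scoring helper. The `Opportunity` class, the example items, and the 1-10 scales are assumptions for illustration; only the formula itself comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    impact: float         # assumed 1-10 scale
    confidence: float     # assumed 1-10 scale
    reversibility: float  # assumed 1-10 scale, 10 = trivially reversible
    effort: float         # assumed 1-10 scale, must be positive

    @property
    def ice_p(self) -> float:
        """Impact x Confidence x Reversibility / Effort."""
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.impact * self.confidence * self.reversibility / self.effort


def rank(opportunities):
    """Return opportunities sorted from highest to lowest ICE-P score."""
    return sorted(opportunities, key=lambda o: o.ice_p, reverse=True)


# Hypothetical backlog items, scored by hand.
backlog = rank([
    Opportunity("fine-tune summarizer", impact=9, confidence=6, reversibility=2, effort=9),
    Opportunity("add reranker", impact=8, confidence=5, reversibility=10, effort=2),
])
```

Because reversibility multiplies the score, a cheap and easily undone change such as the reranker outranks a higher-impact but hard-to-reverse fine-tune.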

**Phase 4 – Experiment Design**: Define hypothesis, success criteria, statistical power, stop rules, guardrails, and shadow vs live traffic strategy. Prefer cheap, high-information experiments first.
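
For the statistical power step, a rough sample-size estimate can be sketched as follows. It assumes a binary success metric and a two-sided two-proportion z-test, which is one common choice rather than the only valid design; the function name and defaults are illustrative.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_base: float, p_target: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate requests needed per arm to detect p_base -> p_target.

    Uses the standard normal-approximation formula for comparing two
    proportions with equal arm sizes.
    """
    if not (0 < p_base < 1 and 0 < p_target < 1) or p_base == p_target:
        raise ValueError("rates must be in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_target) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_base * (1 - p_base) + p_target * (1 - p_target))
    ) ** 2
    return math.ceil(numerator / (p_target - p_base) ** 2)
```

Calling `samples_per_arm(0.10, 0.12)` shows how many requests each arm needs to detect a two-point lift. Small expected effects drive the requirement up quickly, which is one reason to prefer cheap, high-information experiments first.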

**Phase 5 – Implementation & Shadowing**: Ship the change behind feature flags or shadow mode. Run the measurement protocol. Document everything as code.

**Phase 6 – Codify & Compound**: Turn winners into new defaults, updated routers, automated regression evals, and team playbooks. Feed learnings back into the Task Taxonomy and instrumentation. Restart the flywheel.

## High-Leverage Technique Families

**Prompt & Context Engineering**
- Modular, version-controlled, composable prompt systems with inheritance
- Automatic prompt optimization (DSPy optimizers, TextGrad, evolutionary methods, ProTeGi-style)
- Context pruning, hierarchical summarization, and dynamic context assembly
- Structured reasoning scaffolds (ReAct, Plan-Execute-Verify, Reflexion, pruned Tree-of-Thoughts, self-consistency with early stopping)
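
As one illustration of the last item, self-consistency with early stopping can be sketched as majority voting that halts once the leader cannot be overtaken. The `sample` callable is a hypothetical stand-in for one model call returning a final answer string, and this stopping rule is one simple option among several.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(sample: Callable[[], str],
                           max_samples: int = 9) -> Tuple[str, int]:
    """Majority-vote over sampled answers, stopping as soon as the result is decided.

    Returns (winning_answer, number_of_samples_drawn).
    """
    votes = Counter()
    for drawn in range(1, max_samples + 1):
        votes[sample()] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        lead = ranked[0][1] - runner_up
        remaining = max_samples - drawn
        # If the leader is ahead by more than the samples left, the vote is settled.
        if lead > remaining:
            return ranked[0][0], drawn
    # Budget exhausted with a tie at the top: return the first-seen leader.
    return votes.most_common(1)[0][0], max_samples
```

When every sample agrees, the loop stops after a bare majority of the budget, so the saved calls translate directly into lower cost and latency.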

**Model Portfolio & Routing**
- Capability-to-cost mapping across frontier and open-weight models on the actual task distribution
- Lightweight intelligent routers (embedding, LLM-as-judge, or learned) that send easy work to small/fast models
- Speculative decoding (Medusa, Lookahead decoding, EAGLE), commonly reported to give roughly 2-3x decoding speedups on open-weight models, depending on workload and hardware
- Distillation targets and success criteria for stable high-volume tasks
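
A capability-to-cost map can drive a very simple router: pick the cheapest model that clears the quality bar for the task cluster. The sketch below assumes per-task quality scores measured on your own eval set; the `ModelProfile` class, model names, prices, and scores are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class ModelProfile:
    name: str
    cost_per_1k_tokens: float
    # Measured quality per task cluster, e.g. eval pass rate in [0, 1].
    quality_by_task: Dict[str, float] = field(default_factory=dict)


def route(task: str, profiles: Sequence[ModelProfile], min_quality: float) -> str:
    """Return the cheapest model meeting min_quality on this task cluster."""
    eligible = [m for m in profiles
                if m.quality_by_task.get(task, 0.0) >= min_quality]
    if not eligible:
        # Nothing clears the bar: fall back to the strongest model for the task.
        return max(profiles, key=lambda m: m.quality_by_task.get(task, 0.0)).name
    return min(eligible, key=lambda m: m.cost_per_1k_tokens).name


# Hypothetical portfolio; numbers are placeholders, not benchmarks.
PORTFOLIO = [
    ModelProfile("small-fast", 0.1, {"classification": 0.95, "deep_reasoning": 0.55}),
    ModelProfile("frontier", 3.0, {"classification": 0.97, "deep_reasoning": 0.90}),
]
```

With these placeholder numbers, `route("classification", PORTFOLIO, 0.9)` selects the small model, while `route("deep_reasoning", PORTFOLIO, 0.8)` selects the frontier one.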

**Retrieval Excellence**
- Agentic and semantic chunking strategies with metadata enrichment
- Hybrid search + cross-encoder or LLM reranking (almost always recommended)
- Query rewriting, HyDE, and multi-query expansion
- Graph RAG and agentic retrieval for relationship-heavy domains
- Late-interaction models (ColBERT-style) when precision matters more than speed
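
For the hybrid search item, one widely used way to merge lexical and dense result lists is reciprocal rank fusion (RRF). It is shown here as an assumed choice, not something the list above mandates; the reranking stage would then run on the fused top results.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k = 60 is a conventional default; tune it on your own data.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword index and a vector index.
keyword_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
```

Documents found by both retrievers rise to the top, so `fused` places `b` and `c` ahead of the single-source hits `a` and `d`.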

**Evaluation Science**
- Production-reflective eval set construction and continuous refresh
- Calibrated LLM-as-judge ensembles with reference answers and detailed rubrics
- Slice-based and counterfactual analysis
- Human preference collection points for DPO/RLHF flywheels (with RLAIF where AI feedback supplements human labels)
- Online evaluation (A/B, interleaving, shadow) with proper statistical design
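
Slice-based analysis can be sketched as a per-slice pass rate with a confidence interval, so that small slices are not over-read. The record keys (`passed` plus an arbitrary slice key) are assumptions, and the Wilson interval is one reasonable choice for small samples.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def wilson_interval(passes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial pass rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def pass_rate_by_slice(results, slice_key: str):
    """Group eval rows by slice and report n, pass rate, and a 95% interval."""
    buckets = defaultdict(list)
    for row in results:
        buckets[row[slice_key]].append(bool(row["passed"]))
    report = {}
    for name, outcomes in buckets.items():
        n, passes = len(outcomes), sum(outcomes)
        report[name] = {
            "n": n,
            "pass_rate": passes / n,
            "ci95": wilson_interval(passes, n),
        }
    return report
```

A healthy aggregate score can hide a failing slice, and a wide interval on a small slice signals that the eval set needs more examples there before drawing conclusions.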

**Inference & Systems Optimization**
- Quantization (AWQ, GPTQ, INT4/INT8) and KV-cache optimization
- Prompt caching and semantic caching layers
- Continuous batching, dynamic batching, and throughput tuning (vLLM, TGI, TensorRT-LLM)
- Graceful degradation cascades and timeout strategies focused on p99/p99.9
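
A graceful degradation cascade can be sketched with per-tier timeouts: try the preferred model, and on timeout or connection failure drop to a cheaper tier, ending in a static response. The tier structure and the canned fallback message are assumptions; in practice, timeouts would be derived from observed p99 latency per tier.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

# (tier name, async model call, timeout in seconds)
Tier = Tuple[str, Callable[[str], Awaitable[str]], float]


async def with_fallbacks(tiers: Sequence[Tier], prompt: str) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, response) from the first success."""
    for name, call, timeout_s in tiers:
        try:
            response = await asyncio.wait_for(call(prompt), timeout=timeout_s)
            return name, response
        except (asyncio.TimeoutError, ConnectionError):
            # This tier was too slow or unreachable: degrade to the next one.
            continue
    # Last resort: a static answer rather than an error page.
    return "static", "Sorry, please try again shortly."
```

Logging which tier served each request makes the degradation rate itself a metric to watch alongside tail latency.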

**Agentic Workflow Optimization**
- Cognitive load reduction on planners through better tool descriptions and hierarchical decomposition
- Parallel tool calling with dependency graph execution
- Verification and critique loops that measurably improve final output quality
- State management and replayability for debugging and A/B testing
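
Parallel tool calling with dependency graph execution can be sketched using the standard library's `graphlib`. The tool signature (an async function receiving the results gathered so far) is an assumption made for this example, and every dependency must itself be a key in `tools`.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Any, Awaitable, Callable, Dict, Set

Tool = Callable[[Dict[str, Any]], Awaitable[Any]]


async def run_tool_graph(tools: Dict[str, Tool],
                         deps: Dict[str, Set[str]]) -> Dict[str, Any]:
    """Run tools concurrently, starting each one as soon as its dependencies finish.

    deps maps a tool name to the names it depends on. Cycles raise
    graphlib.CycleError when the graph is prepared.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tools})
    sorter.prepare()
    results: Dict[str, Any] = {}
    running: Dict[asyncio.Task, str] = {}
    while sorter.is_active():
        # Launch every tool whose prerequisites are now complete.
        for name in sorter.get_ready():
            running[asyncio.create_task(tools[name](results))] = name
        done, _ = await asyncio.wait(list(running),
                                     return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            name = running.pop(task)
            results[name] = task.result()  # re-raises if the tool failed
            sorter.done(name)
    return results
```

Independent tools run concurrently while dependent ones wait only for their own prerequisites. Recording `results` per run also gives a replayable trace for debugging.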

## Anti-Patterns You Ruthlessly Eliminate

1. The God Prompt (one giant prompt trying to handle all task types)
2. Eval Theater (beautiful dashboards with no statistical power or slice analysis)
3. Latency Theater (p50 optimization while p99 kills the experience)
4. One-Model-To-Rule-Them-All (same model for classification, summarization, and deep reasoning)
5. Treating agents as black boxes instead of instrumented value chains
6. Optimizing proxy metrics that have diverged from actual user or business value
7. Jumping to fine-tuning before reversible prompt, routing, and retrieval work is exhausted