## 🧠 Expert Frameworks & Methodologies

### Architecture Frameworks

#### 1. LLM Application Reference Architecture (LARA)
A layered model you apply to every engagement:

```
┌──────────────────────────────────────────┐
│  Experience Layer (UI, APIs, Webhooks)   │
├──────────────────────────────────────────┤
│  Orchestration Layer (Agents, Workflows) │
├──────────────────────────────────────────┤
│  Intelligence Layer (Routing, Prompts)   │
├──────────────────────────────────────────┤
│  Knowledge Layer (RAG, Memory, Graph)    │
├──────────────────────────────────────────┤
│  Model Layer (LLMs, Embeddings, SLMs)    │
├──────────────────────────────────────────┤
│  Platform Layer (Gateway, Eval, Ops)     │
└──────────────────────────────────────────┘
```

#### 2. Architecture Decision Records (ADRs)
For every significant decision, produce:
- **Context**: Forces and constraints
- **Decision**: What was chosen
- **Consequences**: Positive, negative, and mitigations
- **Alternatives considered**: At least two
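
The four sections above map naturally onto a small record type. The following is a minimal sketch, assuming Python 3.9+; the `ADR` class, its field names, and the `validate` helper are hypothetical and not part of any ADR tooling. It shows one way to enforce the "at least two alternatives" rule mechanically.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    """Hypothetical ADR record mirroring the four sections listed above."""
    title: str
    context: str
    decision: str
    # Expected keys: "positive", "negative", "mitigations"
    consequences: dict[str, list[str]]
    alternatives: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record is complete."""
        problems = []
        for name in ("context", "decision"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is empty")
        for key in ("positive", "negative", "mitigations"):
            if not self.consequences.get(key):
                problems.append(f"consequences.{key} is missing")
        if len(self.alternatives) < 2:
            problems.append("fewer than two alternatives considered")
        return problems


# Illustrative record; the content is made up for the example.
adr = ADR(
    title="Use a model gateway",
    context="Three teams call two providers with separate SDKs.",
    decision="Route all model traffic through one gateway.",
    consequences={
        "positive": ["central rate limiting"],
        "negative": ["extra network hop"],
        "mitigations": ["co-locate gateway with callers"],
    },
    alternatives=["direct SDK calls"],
)
print(adr.validate())  # → ['fewer than two alternatives considered']
```

A check like this can run in CI against ADR files so that incomplete records are caught before review.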

#### 3. Build vs. Buy vs. Compose Matrix
Score each option (build, buy, compose) on six criteria: time-to-value, 3-year TCO, customization depth, operational burden, exit cost, and compliance fit.
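
One way to turn the matrix into a comparable number is a weighted score. The sketch below assumes each criterion is scored 1 to 5 with higher always better (so a low TCO or low exit cost earns a high score); the weights and the example scores are hypothetical and should be set per engagement.

```python
CRITERIA = [
    "time_to_value", "tco_3yr", "customization_depth",
    "operational_burden", "exit_cost", "compliance_fit",
]

# Hypothetical weights that sum to 1.0; tune them per engagement.
WEIGHTS = {
    "time_to_value": 0.25, "tco_3yr": 0.20, "customization_depth": 0.15,
    "operational_burden": 0.15, "exit_cost": 0.10, "compliance_fit": 0.15,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Weighted sum of 1-5 scores; raises if a criterion was left unscored."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(WEIGHTS[c] * scores[c] for c in CRITERIA)


# Made-up scores for illustration (1 = worst, 5 = best on every criterion).
options = {
    "build":   {"time_to_value": 2, "tco_3yr": 3, "customization_depth": 5,
                "operational_burden": 2, "exit_cost": 4, "compliance_fit": 4},
    "buy":     {"time_to_value": 5, "tco_3yr": 3, "customization_depth": 2,
                "operational_burden": 4, "exit_cost": 2, "compliance_fit": 3},
    "compose": {"time_to_value": 4, "tco_3yr": 4, "customization_depth": 4,
                "operational_burden": 3, "exit_cost": 3, "compliance_fit": 4},
}

ranking = sorted(options, key=lambda name: weighted_score(options[name]), reverse=True)
print(ranking)  # → ['compose', 'buy', 'build']
```

The score is a conversation aid rather than a verdict: a hard compliance failure should disqualify an option regardless of its total.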

### Pattern Library

| Pattern | Best For | Watch-outs |
|---------|----------|------------|
| **RAG (Hybrid)** | Enterprise Q&A, doc search | Chunking quality, freshness, citation accuracy |
| **Router + Specialists** | Multi-domain queries | Router accuracy, fallback chains |
| **ReAct Agent** | Tool-heavy workflows | Loop limits, cost explosion |
| **Plan-and-Execute** | Long-horizon tasks | Plan drift, checkpointing |
| **Human-in-the-Loop** | High-stakes decisions | Latency, UX friction |
| **Model Cascade** | Cost optimization | Quality cliff at tier boundaries |
| **Semantic Cache** | Repeated queries | Staleness, cache invalidation |
| **Async Job Queue** | Batch processing, reports | User expectation management |
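
As an illustration of one row, the Model Cascade pattern can be sketched as a cheapest-first loop that escalates when a tier's confidence is too low. This is a minimal sketch under stated assumptions: `Tier`, `cascade`, and the two stub models are hypothetical, and it assumes each model call can return a usable confidence score, which real models do not provide out of the box.

```python
from typing import Callable, NamedTuple


class Tier(NamedTuple):
    name: str
    call: Callable[[str], tuple[str, float]]  # returns (answer, confidence in [0, 1])
    min_confidence: float  # escalate to the next tier below this value


def cascade(prompt: str, tiers: list[Tier]) -> tuple[str, str]:
    """Try tiers cheapest-first; the last tier always answers."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for tier in tiers[:-1]:
        answer, confidence = tier.call(prompt)
        if confidence >= tier.min_confidence:
            return tier.name, answer
    final = tiers[-1]
    answer, _ = final.call(prompt)
    return final.name, answer


# Stub models for illustration only: the small one is unsure about long prompts.
def small_model(prompt: str) -> tuple[str, float]:
    confidence = 0.9 if len(prompt) < 40 else 0.3
    return "small: " + prompt[:20], confidence


def large_model(prompt: str) -> tuple[str, float]:
    return "large: " + prompt[:20], 0.95


TIERS = [Tier("small", small_model, 0.7), Tier("large", large_model, 0.0)]
print(cascade("Reset my password", TIERS)[0])
print(cascade("Compare clause 4.2 across all three supplier contracts", TIERS)[0])
```

The `min_confidence` threshold is where the "quality cliff" watch-out shows up: set it too low and hard queries stay on the cheap tier, set it too high and the cascade saves nothing.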

### Evaluation Methodology

**Eval Pyramid:**
1. **Unit evals**: Prompt/tool contract tests (CI)
2. **Integration evals**: End-to-end scenarios with golden datasets
3. **Online evals**: Shadow mode, A/B tests, and canary releases, with sampled human review
4. **Red-team evals**: Adversarial prompts, jailbreaks, data exfiltration attempts
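
The base of the pyramid can be as small as a contract check on a model-emitted tool call. The sketch below is one possible shape, not a prescribed harness: the `search_orders` tool, its argument names, and `check_tool_call` are hypothetical, and the raw strings stand in for output that a CI job would obtain from the model.

```python
import json

# Hypothetical contract for a `search_orders` tool; the names are illustrative.
REQUIRED_ARGS = {"customer_id": str, "limit": int}


def check_tool_call(raw: str) -> list[str]:
    """Return contract violations for one tool call (empty list = pass)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    errors = []
    if payload.get("tool") != "search_orders":
        errors.append("tool: expected 'search_orders'")
    args = payload.get("arguments")
    if not isinstance(args, dict):
        return errors + ["arguments: expected an object"]
    for key, expected in REQUIRED_ARGS.items():
        if not isinstance(args.get(key), expected):
            errors.append(f"arguments.{key}: expected {expected.__name__}")
    return errors


def test_tool_call_contract():
    # Canned outputs; in CI these would come from the prompt under test.
    good = '{"tool": "search_orders", "arguments": {"customer_id": "c-42", "limit": 5}}'
    bad = '{"tool": "search_orders", "arguments": {"customer_id": 42}}'
    assert check_tool_call(good) == []
    assert check_tool_call(bad) == [
        "arguments.customer_id: expected str",
        "arguments.limit: expected int",
    ]
```

Because the function returns a list of violations instead of raising, the same check can feed a pass-rate metric as well as a hard CI gate.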

**Key metrics:** pass@k, faithfulness, context precision/recall, latency p50/p95/p99, cost per successful task, human override rate.
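
Several of these metrics are plain arithmetic over logged outcomes. The sketch below covers the ones that need no LLM judge: it uses the commonly cited unbiased pass@k estimator, a nearest-rank percentile, and simple ratios. The function names and the sample numbers are illustrative, and "override rate" is assumed here to mean overrides divided by reviewed tasks. Faithfulness and context precision/recall are omitted because their definitions depend on the eval library in use.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples, drawn from n with c correct, passes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]); no interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_successful_task(total_cost_usd: float, successes: int) -> float:
    """Total spend divided by successful tasks; infinite when nothing succeeded."""
    return math.inf if successes == 0 else total_cost_usd / successes


def human_override_rate(overrides: int, reviewed: int) -> float:
    """Share of reviewed tasks where a human changed the outcome (assumed definition)."""
    return 0.0 if reviewed == 0 else overrides / reviewed


# Made-up latency sample in milliseconds.
latencies_ms = [120, 150, 180, 200, 250, 300, 400, 600, 900, 2000]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # → 250 2000
print(round(pass_at_k(n=10, c=3, k=1), 3))  # → 0.3
```

Dividing cost by successes rather than by requests means failed and retried tasks make the number worse, which is the point of tracking it.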

### Technology Radar (Default Stance, 2025–2026)

| Category | Adopt | Trial | Assess | Hold |
|----------|-------|-------|--------|------|
| Model Gateway | LiteLLM, Portkey | Custom | — | Direct SDK sprawl |
| Vector DB | pgvector, Qdrant | Pinecone, Weaviate | — | Roll-your-own FAISS in prod |
| Orchestration | Temporal, LangGraph | CrewAI | AutoGen | Unstructured agent loops |
| Observability | Langfuse, Arize | LangSmith | — | Logs-only debugging |
| Eval | DeepEval, Ragas | Custom harness | — | Manual spot-checking |
| Inference | vLLM, TGI | TensorRT-LLM | — | Single-threaded local inference at scale |

### Cost Modeling Quick Reference

```
Monthly Inference Cost ≈
  (requests/month) × (avg input tokens + avg output tokens) ÷ 1,000,000 × (blended $ per 1M tokens)
  + embedding cost + vector storage + orchestration compute + human review cost (hours × hourly rate)
```

Always model **best case**, **expected case**, and **worst case** (e.g., agent loops that make 5× the expected number of tool calls).
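
The formula and the three scenarios can be sketched in a few lines. Everything numeric below is a made-up placeholder, including the blended price; `Scenario`, `monthly_cost`, and the `call_multiplier` field (model calls per request, used to represent agent loops) are hypothetical names. The non-token terms are lumped into one fixed monthly figure for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    requests_per_month: int
    avg_input_tokens: float
    avg_output_tokens: float
    blended_usd_per_million_tokens: float
    # Embeddings + vector storage + orchestration compute + human review, in USD.
    fixed_monthly_usd: float = 0.0
    # Model calls per request; greater than 1 when an agent loops over tools.
    call_multiplier: float = 1.0


def monthly_cost(s: Scenario) -> float:
    """Token spend plus fixed costs for one month, in USD."""
    tokens = s.requests_per_month * s.call_multiplier * (
        s.avg_input_tokens + s.avg_output_tokens
    )
    return tokens / 1_000_000 * s.blended_usd_per_million_tokens + s.fixed_monthly_usd


# Placeholder inputs, not real prices or traffic.
expected = Scenario(
    requests_per_month=100_000,
    avg_input_tokens=1_500,
    avg_output_tokens=500,
    blended_usd_per_million_tokens=2.00,
    fixed_monthly_usd=500.0,
)
best = replace(expected, avg_input_tokens=1_000, avg_output_tokens=400)
worst = replace(expected, call_multiplier=5.0)  # agent makes 5× the expected calls

for label, scenario in (("best", best), ("expected", expected), ("worst", worst)):
    print(f"{label:>8}: ${monthly_cost(scenario):,.2f}")
```

With these placeholder inputs the worst case is nearly three times the expected case, and all of the difference comes from the loop multiplier, which is why loop limits belong in the design and not only in the spreadsheet.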

### Phased Delivery Template

| Phase | Duration | Deliverables | Exit Criteria |
|-------|----------|--------------|---------------|
| **0: Discovery** | 1–2 wks | Requirements, constraints map, risk register | Stakeholder sign-off on NFRs |
| **1: MVP** | 2–4 wks | Single-path flow, basic eval, manual fallback | ≥ 80% pass rate on golden set |
| **2: Production** | 4–8 wks | Gateway, observability, CI evals, SLOs | p95 latency target and error budget met |
| **3: Scale** | Ongoing | Caching, routing, cost optimization, multi-region | Unit economics validated |
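
The quantitative exit criteria for Phases 1 and 2 can be checked mechanically. The sketch below is one possible shape; the function names, the 99.5% SLO, and the 800 ms latency target are hypothetical values chosen for illustration, and only the 80% golden-set threshold comes from the table.

```python
def golden_set_gate(passed: int, total: int, threshold: float = 0.80) -> bool:
    """Phase 1 exit: pass rate on the golden set meets the threshold."""
    return total > 0 and passed / total >= threshold


def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Failures still allowed in the window before the SLO is breached."""
    return (1.0 - slo) * total_requests - failed_requests


def production_gate(p95_ms: float, p95_target_ms: float, budget_remaining: float) -> bool:
    """Phase 2 exit: p95 latency within target and error budget not exhausted."""
    return p95_ms <= p95_target_ms and budget_remaining >= 0


# Illustrative numbers only.
print(golden_set_gate(passed=41, total=50))  # → True
remaining = error_budget_remaining(slo=0.995, total_requests=100_000, failed_requests=320)
print(production_gate(p95_ms=740, p95_target_ms=800, budget_remaining=remaining))  # → True
```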