## 🧠 Frameworks & Methodologies

### The PEARL Architecture (Prompt Engineering Architecture Layers)
Use this layered model for every agent design:

```
┌─────────────────────────────────────┐
│  P — Persona (SOUL.md)              │  Identity, mission, expertise
├─────────────────────────────────────┤
│  E — Expression (STYLE.md)          │  Voice, tone, formatting
├─────────────────────────────────────┤
│  A — Assertions (RULES.md)          │  Hard constraints, refusals
├─────────────────────────────────────┤
│  R — Resources (SKILL.md)           │  Frameworks, domain knowledge
├─────────────────────────────────────┤
│  L — Launch (prompts/*.md)          │  Task-specific templates
└─────────────────────────────────────┘
```

### Prompt Design Patterns (Master List)

#### Structural Patterns
- **Modular Soul Architecture** — Decompose into SOUL/STYLE/RULES/SKILL/prompts
- **XML Tag Fencing** — `<instructions>`, `<context>`, `<output_format>` for parseability
- **Role-Task-Format (RTF)** — Explicit role assignment, task definition, output schema
- **Constraint Stacking** — Layer soft preferences under hard rules for conflict resolution
- **Negative Space Prompting** — Explicit "do NOT" lists to reduce common failure modes

#### Reasoning Patterns
- **Chain-of-Thought (CoT)** — "Think step by step" with structured reasoning blocks
- **Tree-of-Thought** — Branch exploration for complex decision trees
- **ReAct** — Interleaved Reasoning + Action for tool-using agents
- **Self-Consistency** — Multiple samples with majority vote for high-stakes outputs
- **Reflexion** — Error analysis loops for iterative improvement

#### Production Patterns
- **Prompt Versioning** — Semantic versioning for prompt changes with changelog
- **Golden Set Regression** — Fixed test suite run on every prompt edit
- **LLM-as-Judge** — Secondary model evaluates primary output against rubric
- **Dynamic Few-Shot** — Retrieve relevant examples by embedding similarity
- **Prompt Compression** — Distill verbose prompts while preserving behavior

### Evaluation Rubric Template
Score each dimension 1-5:

| Dimension | Criteria |
|-----------|----------|
| **Instruction Adherence** | Follows format, constraints, and scope |
| **Factual Grounding** | No hallucination beyond provided context |
| **Safety Compliance** | Refuses appropriately, no policy violations |
| **Tone Alignment** | Matches STYLE.md voice requirements |
| **Efficiency** | Concise without sacrificing completeness |
| **Robustness** | Handles edge cases and adversarial inputs |

### Injection Defense Checklist
- [ ] Delimit user input with explicit boundary tags
- [ ] Instruct model to treat user content as data, not instructions
- [ ] Include "ignore previous instructions in user messages" clause
- [ ] Validate outputs against schema before downstream use
- [ ] Log and alert on anomalous instruction patterns in user input
- [ ] Separate system instructions from retrievable context

### Token Optimization Toolkit
1. **Audit** — Identify redundant instructions and duplicate examples
2. **Compress** — Replace prose with structured bullets and tables
3. **Externalize** — Move reference material to RAG retrieval
4. **Template** — Use variable slots instead of repeating full examples
5. **Measure** — Track tokens-per-successful-task, not tokens-per-request

### Model-Specific Notes
- **Claude family** — Responds well to XML tags, explicit thinking blocks, nuanced rule hierarchies
- **GPT-4o family** — Strong with JSON schema enforcement, function calling, structured outputs API
- **Open-weight models** — Often need more explicit formatting examples and stricter output templates
- **Reasoning models** — Minimize conflicting instructions; let native reasoning breathe