## 🧠 Professional Frameworks and Knowledge Base

### Platform Architecture Reference Model

#### LLM / GenAI Platform Layers
```
┌─────────────────────────────────────────┐
│  Experience Layer                       │
│  Chat UI, Copilot, API Gateway, SDK     │
├─────────────────────────────────────────┤
│  Orchestration Layer                    │
│  Agent framework, workflow, HITL queue  │
├─────────────────────────────────────────┤
│  Model Layer                            │
│  Routing, fine-tuned models, embeddings │
├─────────────────────────────────────────┤
│  Data & Knowledge Layer                 │
│  Vector DB, RAG pipeline, Feature Store │
├─────────────────────────────────────────┤
│  MLOps / LLMOps Layer                   │
│  Train, eval, deploy, monitor, govern   │
├─────────────────────────────────────────┤
│  Infrastructure Layer                   │
│  K8s, GPU pool, multi-cloud, IAM, FinOps│
└─────────────────────────────────────────┘
```

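### Maturity Model (AI Platform Maturity)
| Level | Characteristics | Typical Pain Points |
|-------|-----------------|---------------------|
| L0 Ad-hoc | Notebook silos, manual deployment | Not reproducible, no monitoring |
| L1 Repeatable | Basic CI/CD, model registry | Scaling bottlenecks, opaque costs |
| L2 Defined | Standard pipelines, RBAC, SLAs | Cross-team friction, governance gaps |
| L3 Managed | Unified observability, FinOps | Multi-cloud complexity, agent sprawl |
| L4 Optimizing | Automated eval, self-healing, marketplace | Pressure for continuous innovation |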

### Core Methodologies

#### 1. Platform Opportunity Assessment (POA)
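- **Demand**: number of internal use cases, degree of duplicated build effort
- **Supply**: existing infra, talent, budget
- **Gap**: build list vs. what can be delivered within 6 months
- **Output**: investment decision matrix (Impact × Effort × Risk)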
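
The Impact × Effort × Risk matrix can be turned into a rough ranking. The following is a minimal sketch, assuming 1 to 5 scores on each axis; the `Candidate` class and the scoring rule `impact / (effort * risk)` are illustrative assumptions, not part of the POA definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A platform capability under consideration (hypothetical structure)."""
    name: str
    impact: int  # 1 (low) to 5 (high)
    effort: int  # 1 (low) to 5 (high)
    risk: int    # 1 (low) to 5 (high)

    def __post_init__(self) -> None:
        for field_name in ("impact", "effort", "risk"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be in 1..5, got {value}")


def priority(c: Candidate) -> float:
    """Assumed scoring rule: reward impact, penalize effort and risk."""
    return c.impact / (c.effort * c.risk)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest priority."""
    return sorted(candidates, key=priority, reverse=True)


if __name__ == "__main__":
    # Example backlog entries are made up for illustration.
    backlog = [
        Candidate("shared prompt registry", impact=4, effort=2, risk=1),
        Candidate("in-house vector DB", impact=3, effort=5, risk=4),
        Candidate("LLM gateway", impact=5, effort=3, risk=2),
    ]
    for c in rank(backlog):
        print(f"{c.name}: {priority(c):.2f}")
```

A multiplicative penalty makes high-effort, high-risk items drop quickly; a weighted sum is an equally valid choice if the three axes should trade off linearly.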

#### 2. AI Governance Operating Model
- **Policy**: acceptable use, data classification, model tiering
- **Process**: intake → risk assessment → approval → deploy → monitor → retire
- **People**: AI Council, platform team, domain owners
- **Technology**: policy-as-code, guardrails, audit dashboard
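
The Process flow above lends itself to a small state machine with an audit trail. This is a sketch under stated assumptions: the `UseCase` class, the `actor` field, and the strictly linear ordering are hypothetical simplifications (a real process would likely allow rejection and rework loops).

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Stages in the order listed in the Process bullet."""
    INTAKE = 1
    RISK_ASSESSMENT = 2
    APPROVAL = 3
    DEPLOY = 4
    MONITOR = 5
    RETIRE = 6


@dataclass
class UseCase:
    """An AI use case moving through governance (hypothetical structure)."""
    name: str
    stage: Stage = Stage.INTAKE
    # Each entry is (from_stage, to_stage, actor) so transitions are auditable.
    audit_log: list[tuple[Stage, Stage, str]] = field(default_factory=list)

    def advance(self, actor: str) -> Stage:
        """Move to the next stage and record who triggered the transition."""
        if self.stage is Stage.RETIRE:
            raise ValueError(f"{self.name} is already retired")
        next_stage = Stage(self.stage.value + 1)
        self.audit_log.append((self.stage, next_stage, actor))
        self.stage = next_stage
        return next_stage
```

Keeping the transition record next to the state change is what makes the audit dashboard cheap to build later: it only has to read `audit_log`.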

#### 3. Model Lifecycle Playbook
```
Ideation → Data Prep → Train/Fine-tune → Evaluate → Approve → Deploy → Monitor → Retrain/Retire
         ↑______________________________ feedback loop _______________________________↓
```

**Key gates**:
- Data quality gate (schema, bias, licensing)
- Eval gate (offline + online, regression suite)
- Security gate (red team, jailbreak test)
- Production gate (SLO sign-off, runbook, on-call)
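
The four gates can be sketched as ordered checks that stop at the first failure. Everything specific here is an assumption for illustration: the `release` dictionary keys, the 0.95 regression threshold, and the reduction of each gate to one or two boolean checks.

```python
from typing import Callable

# A gate inspects release metadata and returns (passed, what_was_checked).
Gate = Callable[[dict], tuple[bool, str]]


def data_quality_gate(release: dict) -> tuple[bool, str]:
    ok = bool(release.get("schema_valid")) and bool(release.get("license_cleared"))
    return ok, "schema and licensing checks"


def eval_gate(release: dict) -> tuple[bool, str]:
    # 0.95 is an assumed threshold, not a recommendation.
    ok = release.get("regression_pass_rate", 0.0) >= 0.95
    return ok, "regression suite pass rate >= 0.95"


def security_gate(release: dict) -> tuple[bool, str]:
    # A missing value is treated as "findings still open".
    ok = release.get("open_jailbreak_findings", 1) == 0
    return ok, "no open jailbreak findings"


def production_gate(release: dict) -> tuple[bool, str]:
    ok = bool(release.get("slo_signed_off")) and bool(release.get("runbook_ready"))
    return ok, "SLO sign-off and runbook"


GATES: list[tuple[str, Gate]] = [
    ("data quality", data_quality_gate),
    ("eval", eval_gate),
    ("security", security_gate),
    ("production", production_gate),
]


def run_gates(release: dict) -> tuple[bool, list[str]]:
    """Run gates in order; stop and report at the first failure."""
    log: list[str] = []
    for name, gate in GATES:
        ok, reason = gate(release)
        if not ok:
            log.append(f"FAILED {name}: {reason}")
            return False, log
        log.append(f"passed {name}")
    return True, log
```

Gates default to failing when metadata is missing, so a release cannot pass simply by omitting a field.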

#### 4. RAG / Agent Infrastructure Standards
- **Chunking & embedding**: versioning and A/B testing
- **Retrieval eval**: MRR and nDCG for ranking quality; faithfulness and hallucination rate for the generated answer
- **Agent observability**: trace per step, tool call audit, cost per task
- **Prompt registry**: version control, approval workflow, rollback
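
For the ranking metrics named above, a minimal sketch of MRR and nDCG@k using their standard definitions. The function names and the input shapes (document IDs as strings, graded gains in a dictionary) are assumptions for illustration; faithfulness and hallucination rate need a judge model or human labels and are not covered here.

```python
import math


def mrr(ranked_results: list[list[str]], relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant document per query.

    Queries with no relevant document in their results contribute 0.
    """
    if not ranked_results:
        raise ValueError("need at least one query")
    total = 0.0
    for docs, rel in zip(ranked_results, relevant):
        for rank, doc_id in enumerate(docs, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(docs: list[str], gains: dict[str, float], k: int) -> float:
    """nDCG@k with graded relevance; unlabeled documents count as gain 0."""
    actual = dcg([gains.get(d, 0.0) for d in docs[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Running both metrics per chunking or embedding version gives the A/B comparison a concrete number to compare.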

#### 5. FinOps for AI
- Token-based cost allocation per team/project
- GPU scheduling: spot vs on-demand, autoscaling policy
- Model routing: small model first, escalate when confidence is low
- Caching: semantic cache, embedding cache, response cache
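
Small-model-first routing and per-team cost allocation can be combined in one place. This is a sketch only: the `Tier` and `Router` classes, the 0.8 confidence threshold, the flat per-token prices, and the two stub models are all assumptions, and how a real model reports confidence is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# A model is any callable returning (answer, confidence in [0, 1], tokens used).
ModelFn = Callable[[str], tuple[str, float, int]]


@dataclass
class Tier:
    name: str
    call: ModelFn
    usd_per_1k_tokens: float  # assumed flat price, for illustration only


class Router:
    def __init__(self, tiers: list[Tier], min_confidence: float = 0.8) -> None:
        if not tiers:
            raise ValueError("at least one tier is required")
        self.tiers = tiers  # ordered cheapest first
        self.min_confidence = min_confidence
        self.cost_by_team: dict[str, float] = defaultdict(float)

    def ask(self, team: str, prompt: str) -> tuple[str, str]:
        """Try tiers in order; escalate while confidence is below threshold."""
        answer, used = "", ""
        for tier in self.tiers:
            answer, confidence, tokens = tier.call(prompt)
            # Every attempt is billed, including ones that get escalated.
            self.cost_by_team[team] += tokens / 1000 * tier.usd_per_1k_tokens
            used = tier.name
            if confidence >= self.min_confidence:
                break
        return answer, used


def small_model(prompt: str) -> tuple[str, float, int]:
    """Stub: pretends to be confident only on short prompts."""
    confidence = 0.9 if len(prompt) < 40 else 0.4
    return f"small: {prompt[:20]}", confidence, 100


def large_model(prompt: str) -> tuple[str, float, int]:
    """Stub: always confident, priced higher by the caller."""
    return f"large: {prompt[:20]}", 0.95, 100
```

Billing the failed small-model attempt to the team keeps the routing trade-off visible: escalation is only cheaper overall if most requests stop at the first tier.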

### Tech Stack Familiarity (conceptual level, not vendor promotion)
| Category | Representative Technologies | Selection Considerations |
|----------|-----------------------------|--------------------------|
| Orchestration | K8s, Ray, Airflow, Temporal | Workload type, team skills |
| ML Platform | MLflow, Kubeflow, SageMaker, Vertex | Cloud strategy, existing investments |
| LLM Ops | LangSmith, Weights & Biases, Arize | Eval depth, integration cost |
| Vector | Pinecone, Milvus, pgvector, OpenSearch | Scale, latency, hybrid search |
| Inference | vLLM, TGI, TensorRT-LLM, Triton | Throughput, latency, multi-model serving |
| Gateway | LiteLLM, Kong, Apigee | Multi-provider routing, rate limiting |
| Governance | Guardrails AI, Lakera, custom policy engine | Latency overhead, customizability |

### Evaluation Frameworks
- **ML Test Score** (Google): 28 production-readiness tests across data, model development, infrastructure, and monitoring
- **RAG Triad**: context relevance, groundedness, answer relevance
- **Agent Eval**: task success rate, steps to completion, tool error rate
- **Platform Health**: adoption funnel, incident frequency, mean time to deploy
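
The three Agent Eval metrics can be computed from per-run records. This is a minimal sketch: the `AgentRun` fields are hypothetical, and averaging steps over successful runs only is an assumed convention (failed runs never reach completion).

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One agent task attempt (hypothetical record shape)."""
    succeeded: bool
    steps: int
    tool_calls: int
    tool_errors: int


def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate task success rate, steps to completion, and tool error rate."""
    if not runs:
        raise ValueError("need at least one run")
    successes = [r for r in runs if r.succeeded]
    total_calls = sum(r.tool_calls for r in runs)
    total_errors = sum(r.tool_errors for r in runs)
    return {
        "task_success_rate": len(successes) / len(runs),
        # Averaged over successful runs only; NaN if nothing succeeded.
        "avg_steps_to_completion": (
            sum(r.steps for r in successes) / len(successes)
            if successes
            else float("nan")
        ),
        # Errors per tool call across all runs, successful or not.
        "tool_error_rate": total_errors / total_calls if total_calls else 0.0,
    }
```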

### Common Deliverables Template Library
- AI Platform Strategy 1-pager (board version)
- Platform Roadmap (Now / Next / Later)
- Architecture Decision Record (ADR)
- Model Risk Assessment form
- Incident Runbook & Post-mortem templates
- Platform SLA & SLO definitions
- Build vs Buy decision matrix