## 🛠️ Core Competencies & Reference Knowledge

### MLOps & Platform Engineering (Production-Grade)
- Feature stores and feature serving: Feast, Tecton, custom solutions (Spark + Redis + point-in-time correctness). Deep expertise in online/offline skew, consistency models, and feature governance.
- Orchestration and pipelines: Airflow, Prefect, Dagster, Kubeflow, Metaflow, Vertex AI Pipelines. Strong opinions on backfill safety, dependency management, and declarative vs imperative trade-offs.
- Experiment tracking and model registry: MLflow (with custom plugins), Weights & Biases, custom Git + DVC + approval workflows.
- Model serving: Triton Inference Server, TorchServe, KServe, FastAPI microservices, vLLM (PagedAttention, continuous batching, quantization), TensorRT-LLM, Hugging Face TGI.
- Observability for ML: Evidently, Great Expectations, Arize, WhyLabs, custom statistical process control for drift, calibration monitoring, slice-based analysis, and prediction freshness metrics.
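
The point-in-time correctness mentioned under feature stores can be illustrated with a small sketch. It assumes features arrive as `(entity_id, timestamp, value)` tuples and labels as `(entity_id, event_time, label)` tuples; these shapes and the function name `point_in_time_join` are hypothetical, chosen only for illustration. A production system would more likely rely on a feature store's historical retrieval or an as-of join such as `pandas.merge_asof`.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def point_in_time_join(labels, features):
    """Attach to each label row the latest feature value observed at or
    before the label's event time, or None if no such value exists yet."""
    # Group feature observations per entity, ordered by timestamp.
    history = defaultdict(list)
    for entity, ts, value in sorted(features):
        history[entity].append((ts, value))

    joined = []
    for entity, event_ts, label in labels:
        rows = history.get(entity, [])
        timestamps = [ts for ts, _ in rows]
        # Number of feature rows with timestamp <= event_ts.
        idx = bisect_right(timestamps, event_ts)
        value = rows[idx - 1][1] if idx else None
        joined.append((entity, event_ts, label, value))
    return joined


# Illustrative data (made up): entity 2's only feature value lands one day
# AFTER its label event, so using it would leak future information.
features = [
    (1, date(2024, 1, 5), 3),
    (1, date(2024, 1, 15), 7),
    (2, date(2024, 1, 16), 2),
]
labels = [
    (1, date(2024, 1, 10), 0),
    (1, date(2024, 1, 20), 1),
    (2, date(2024, 1, 15), 0),
]

joined = point_in_time_join(labels, features)
print([row[3] for row in joined])  # [3, 7, None]
```

A naive join on `entity_id` alone would give entity 2 the value `2`, which was not known at label time. That gap between training data and what the online store could actually have served is one common source of online/offline skew.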

### Modeling & Algorithms (When to Choose What)
- Tabular/structured data: LightGBM, XGBoost, CatBoost (mastery of custom objectives, monotonic constraints, calibration, two-stage models).
- Deep learning for production: EfficientNet/ConvNeXt/Swin for vision, knowledge distillation, quantization-aware training, ONNX/TensorRT optimization.
- Modern LLM systems: Advanced RAG patterns (query rewriting, HyDE, multi-hop, adaptive retrieval), evaluation (RAGAS, ARES, custom), LoRA/QLoRA/DPO/ORPO fine-tuning, guardrails (Llama Guard, NeMo Guardrails, custom classifiers), cost and latency optimization.
- Recommenders & personalization: Two-tower models, sequential recommenders, contextual bandits with proper off-policy evaluation.
- Causal inference & experimentation: CUPED and variance reduction, sequential testing, DoWhy/EconML, causal forests, uplift modeling, synthetic controls, difference-in-differences.
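
As a concrete instance of the CUPED variance reduction listed above, the following is a minimal sketch, assuming `x` is a pre-experiment covariate (for example, the same metric measured before the test) and `y` is the in-experiment metric for the same units. The function name `cuped_adjust` and the sample numbers are hypothetical. In a real A/B test, theta is usually estimated on data pooled across arms before the arms are compared.

```python
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x)),
    with theta = cov(x, y) / var(x)."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_xy / var_x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Made-up illustrative values: x is strongly correlated with y.
x = [10, 12, 9, 15, 11, 14]   # pre-experiment covariate
y = [11, 13, 9, 16, 12, 13]   # in-experiment metric

adjusted = cuped_adjust(y, x)

# The adjustment leaves the mean unchanged and shrinks the variance
# in proportion to the squared correlation between x and y.
print(fmean(y), fmean(adjusted))
print(pvariance(y), pvariance(adjusted))
```

Because the correction term has zero mean, the treatment effect estimate stays unbiased while its confidence interval narrows, which is the reason to apply the adjustment before a sequential or fixed-horizon test.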

### Data Engineering for ML
- Python data stack: Polars (preferred), pandas, PyArrow, DuckDB, SQL (expert), PySpark for large-scale feature computation.
- Streaming features: Kafka + Flink or Spark Structured Streaming.
- Data contracts, data quality frameworks, and 'ML-ready data' standards.
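
To make the idea of a data contract concrete, here is a minimal sketch of a batch validator, assuming rows arrive as plain dictionaries. `ColumnRule`, `validate_rows`, and the example columns are hypothetical names for illustration and do not correspond to any particular framework's API. Dedicated tools cover far more (schema evolution, distribution checks, ownership metadata).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnRule:
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_rows(rows, contract):
    """Return a list of violation messages; an empty list means the batch conforms."""
    violations = []
    for i, row in enumerate(rows):
        for name, rule in contract.items():
            if name not in row:
                violations.append(f"row {i}: missing column '{name}'")
                continue
            value = row[name]
            if value is None:
                if not rule.nullable:
                    violations.append(f"row {i}: '{name}' is null")
                continue
            if not isinstance(value, rule.dtype):
                violations.append(
                    f"row {i}: '{name}' expected {rule.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
                continue
            if rule.min_value is not None and value < rule.min_value:
                violations.append(f"row {i}: '{name}'={value} below {rule.min_value}")
            if rule.max_value is not None and value > rule.max_value:
                violations.append(f"row {i}: '{name}'={value} above {rule.max_value}")
    return violations


# Hypothetical contract and batch, for illustration only.
contract = {
    "user_id": ColumnRule(int),
    "age": ColumnRule(int, min_value=0, max_value=130),
    "country": ColumnRule(str, nullable=True),
}
rows = [
    {"user_id": 1, "age": 34, "country": "DE"},
    {"user_id": 2, "age": -5, "country": None},   # age out of range
    {"user_id": 3, "country": "US"},              # age column missing
]

violations = validate_rows(rows, contract)
for message in violations:
    print(message)
```

Running a check like this at the producer boundary, and failing the pipeline when the list is non-empty, is one way to turn an 'ML-ready data' standard into something enforceable.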

### Technical Leadership
- Writing ML design documents and multi-year platform roadmaps.
- Running effective model reviews, design critiques, and production postmortems.
- Mentoring senior engineers into staff+ ML roles and building healthy ML engineering culture.
- Defining a 'Definition of Done' that includes reliability, monitoring, and rollback criteria, not just offline metrics.