# SKILL.md

## Core Methodological Frameworks

### The Voss Data Science Lifecycle (Modern CRISP-DM + MLOps)

You internalize and teach an eight-phase lifecycle that balances scientific rigor with delivery velocity:

1. **Problem Formulation & Value Definition** — Translate business intent into a falsifiable analytical question, define the primary success metric and decision threshold, and map stakeholders and incentive misalignments.
2. **Data Acquisition & Governance Assessment** — Inventory data sources, evaluate collection processes, assess privacy/compliance posture, and define data-quality SLAs.
3. **Exploratory Data Analysis & Quality Assurance** — Univariate and multivariate profiling, missingness mechanisms (MCAR/MAR/MNAR), outlier diagnostics, distributional shape, collinearity, and leakage risk mapping. This phase typically consumes 40-60% of a project's calendar time.
4. **Feature Engineering & Representation** — Domain-informed transformations, target encoding with proper cross-validation, time-based aggregations, embedding learning when justified, and feature-store design.
5. **Modeling & Algorithm Selection** — Strong baseline first, then gradient-boosted trees (LightGBM/XGBoost/CatBoost with tuned objectives), regularized linear models or GAMs for interpretability, deep models only when data modality or performance gap justifies complexity.
6. **Rigorous Validation & Stress Testing** — Nested cross-validation or temporal walk-forward validation, subgroup performance, adversarial robustness, calibration checks, and sensitivity to label noise and feature drift.
7. **Interpretation, Fairness & Explanation** — Global and local interpretability (SHAP, permutation importance, partial dependence), fairness auditing (equalized odds, calibration, predictive parity), and counterfactual reasoning.
8. **Deployment, Monitoring & Iteration** — Shadow deployment, champion/challenger, drift detection (Kolmogorov-Smirnov test, Population Stability Index, adversarial validation), performance decay triggers, model cards, and automated retraining pipelines.
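
The drift-detection step in phase 8 can be illustrated with the Population Stability Index (PSI). The following is a minimal, dependency-free sketch, not a reference implementation: the function name, the choice of quantile bins taken from the reference sample, and the `eps` floor for empty bins are all assumptions made for illustration.

```python
import bisect
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a current sample.

    Bins are cut at quantiles of the reference sample (an assumption;
    fixed-width bins are another common choice).
    """
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("both samples must be non-empty")
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")

    ref = sorted(expected)
    # Interior cut points at reference quantiles; de-duplicated so that
    # heavily tied data does not create zero-width bins.
    edges = sorted(
        {ref[min(len(ref) - 1, (len(ref) * i) // n_bins)] for i in range(1, n_bins)}
    )

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor each share at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    p = bin_shares(expected)
    q = bin_shares(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    reference = [float(x) for x in range(1000)]
    shifted = [x + 500.0 for x in reference]
    print(population_stability_index(reference, reference))
    print(population_stability_index(reference, shifted))
```

A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as a material shift. Those cut-offs are a convention, not something this document prescribes, so alert thresholds should be tuned per feature.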

### Statistical & Causal Toolkit (Non-Exhaustive)

- **Inference**: t-tests, ANOVA, chi-squared, Wilcoxon/Mann-Whitney, bootstrap and permutation tests, multiple-comparison corrections (Bonferroni, Holm, Benjamini-Hochberg FDR control), Bayesian hierarchical models (PyMC/Stan).
- **Regression**: Linear, GLM, regularized (Ridge, Lasso, Elastic Net), quantile regression, survival (Cox, accelerated failure time), GAMs.
- **Causal Identification**: Randomized experiments (including sequential, factorial, cluster), Difference-in-Differences (with event-study and parallel-trends diagnostics), Regression Discontinuity (sharp and fuzzy), Instrumental Variables, Synthetic Control, Propensity Score methods (with balance diagnostics and sensitivity analysis), DoWhy for graph-based identification and refutation tests, and EconML for CATE estimation.
- **Time Series & Forecasting**: ARIMA, ETS, Prophet, LSTM/Transformer-based forecasters, conformal prediction for uncertainty, hierarchical forecasting.
- **Experimentation**: Power analysis, minimum detectable effect, sequential testing (always-valid p-values), multi-armed bandit considerations with regret bounds, and pre-analysis plan templates.
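
As an illustration of the power analysis and minimum detectable effect items above, here is a minimal sketch of a per-arm sample-size calculation for a two-sided z-test comparing two proportions. It uses the standard normal-approximation formula and only the Python standard library. The function name and its parameters are hypothetical, and a fixed-horizon test with equal allocation is assumed; sequential designs need different machinery.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    baseline_rate: float,
    mde_abs: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate per-arm n to detect an absolute lift of `mde_abs`.

    Assumes a two-sided, fixed-horizon z-test on two proportions with
    equal allocation between arms.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("both rates must lie strictly between 0 and 1")
    if mde_abs == 0:
        raise ValueError("mde_abs must be non-zero")
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0  # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: 10% baseline conversion, 2-point absolute lift.
    print(sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02))
```

Because the required sample size scales with the inverse square of the effect, halving the minimum detectable effect roughly quadruples the traffic needed, which is usually the deciding factor when negotiating the MDE with stakeholders.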

### Production ML Engineering Standards

- Tabular data default stack (2025-2026): Python 3.11+, pandas 2.2+, Polars, scikit-learn 1.5+, LightGBM 4.x, Optuna, SHAP, MLflow or Weights & Biases.
- Feature engineering: dbt for transformations, Great Expectations or Pandera for data contracts, Feast or Tecton for feature stores.
- Orchestration: Prefect or Dagster for pipelines; Airflow only in legacy environments.
- Model governance: Model cards, data cards, and automated drift + performance dashboards.

You generate code that follows current best practices, includes defensive assertions, and is ready for code review and production deployment.
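
To make "defensive assertions" concrete, the sketch below validates a batch of records before it reaches a model. It is a dependency-free illustration only: `ColumnRule`, `validate_batch`, `require_valid_batch`, and the example columns are hypothetical names, and in the stack listed above the same contract would more likely be expressed with Pandera or Great Expectations.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ColumnRule:
    """Hypothetical per-column contract: allowed types, range, null budget."""
    dtypes: Tuple[type, ...]
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    max_null_fraction: float = 0.0


def validate_batch(
    rows: Sequence[Mapping[str, Any]],
    rules: Mapping[str, ColumnRule],
) -> List[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    if not rows:
        return ["batch is empty"]
    problems: List[str] = []
    for name, rule in rules.items():
        n_missing = sum(name not in row for row in rows)
        if n_missing:
            problems.append(f"{name}: missing from {n_missing} row(s)")
            continue
        values = [row[name] for row in rows]
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction > rule.max_null_fraction:
            problems.append(f"{name}: null fraction {null_fraction:.2f} exceeds budget")
        present = [v for v in values if v is not None]
        typed = [v for v in present if isinstance(v, rule.dtypes)]
        if len(typed) != len(present):
            problems.append(f"{name}: {len(present) - len(typed)} value(s) of unexpected type")
        # Range checks run only on correctly typed values.
        if rule.min_value is not None and any(v < rule.min_value for v in typed):
            problems.append(f"{name}: value below {rule.min_value}")
        if rule.max_value is not None and any(v > rule.max_value for v in typed):
            problems.append(f"{name}: value above {rule.max_value}")
    return problems


def require_valid_batch(rows, rules) -> None:
    """Fail fast with every violation listed, instead of scoring bad data."""
    problems = validate_batch(rows, rules)
    if problems:
        raise ValueError("data contract violated: " + "; ".join(problems))


if __name__ == "__main__":
    rules = {"age": ColumnRule((int, float), min_value=0, max_value=120)}
    require_valid_batch([{"age": 30}, {"age": 45}], rules)
    print(validate_batch([{"age": -1}, {"age": "unknown"}], rules))
```

Raising `ValueError` rather than using bare `assert` statements is a deliberate choice in this sketch: Python strips `assert` when run with `-O`, so checks that must hold in production are safer as explicit exceptions.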