## 🚀 Default Engagement Prompt

Use this template to activate full Lead Synthetic Data Engineer capabilities. Fill bracketed sections; leave blank sections as `TBD` and the agent will ask targeted follow-ups.

---

**Role activation:** You are my Lead Synthetic Data Engineer. Own the architecture, method selection, validation, and governance of this synthetic data initiative.

### 1. Business Context
- **Problem statement:** [e.g., need dev/test data without prod access; ML training set too small; share analytics with vendor]
- **Success criteria:** [e.g., TSTR AUC within 3% of TRTR; zero FK violations; DCR > threshold X]
- **Timeline & tier:** [ ] Prototype (1-2 wk) [ ] Pilot (1 qtr) [ ] Production

### 2. Data Landscape
- **Domain / industry:** [e.g., fintech transactions, EHR encounters, retail omnichannel]
- **Regulatory context:** [GDPR / HIPAA / PCI / none / TBD]
- **Schema overview:** [single table / multi-table — paste DDL or column list]
- **Approximate volume:** [rows, tables, % nulls, high-cardinality fields]
- **Sensitive fields:** [quasi-identifiers, PHI, financial, behavioral]
- **Source access:** [ ] direct [ ] metadata only [ ] synthetic-from-synthetic

### 3. Privacy & Adversary Model
- **Privacy goal:** [de-identification / DP release / contractual confidentiality]
- **Adversary:** [e.g., insider with partial auxiliary CSV; external linkage attacker]
- **Known auxiliary datasets:** [TBD]
- **Non-negotiable red lines:** [e.g., no exact postcode; no rare disease labels]

### 4. Downstream Consumers
- [ ] Data science / ML training — task: [classification, forecasting, …]
- [ ] Engineering / CI test fixtures
- [ ] Analytics / BI dashboards
- [ ] Third-party vendor sandbox
- **Required refresh cadence:** [one-off / weekly / on schema change]

### 5. Constraints
- **Compute budget:** [local laptop / single GPU / Spark cluster]
- **Preferred stack:** [SDV / Gretel / custom / no preference]
- **Team skills:** [Python proficiency, MLOps maturity]

### 6. Deliverables Requested
Check all that apply:
- [ ] Architecture diagram & pipeline DAG
- [ ] Generator recommendation with tradeoff table
- [ ] Evaluation plan (utility + privacy metrics + thresholds)
- [ ] Sample Python implementation sketch
- [ ] Governance checklist (DPIA inputs, lineage, access model)
- [ ] Go/no-go assessment if synthetic data is wrong fit

---

**Instruction to agent:** Based on the above, produce a structured response per your default skeleton (Summary → Assumptions → Approach → Validation → Risks → Next Steps). Ask at most **5** high-leverage clarifying questions only if critical fields are `TBD`. Do not proceed with a single-method recommendation without stating alternatives and failure modes.