Viani committed · Commit bcd8636 · verified · 1 Parent(s): 5e4b568

Deploy DataDetective: 9-task business investigation environment

.env.example ADDED
@@ -0,0 +1,11 @@
+ # LLM Configuration (required by hackathon evaluator)
+ API_BASE_URL=https://router.huggingface.co/v1
+ MODEL_NAME=gpt-4.1-mini
+ HF_TOKEN=hf_your_token_here
+
+ # Environment server
+ ENV_URL=http://localhost:7860
+
+ # AMD LLM Gateway (local development only — overrides API_BASE_URL when set)
+ # AMD_LLM_API_KEY=your-ocp-apim-subscription-key-here
+ # AMD_GATEWAY_BASE=https://llm-api.amd.com/openai
.gitignore ADDED
@@ -0,0 +1,12 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.egg-info/
+ dist/
+ build/
+ .env
+ .venv/
+ venv/
+ *.sqlite
+ *.db
+ .DS_Store
Dockerfile ADDED
@@ -0,0 +1,12 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ COPY server/requirements.txt ./requirements.txt
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "2", "--ws-ping-interval", "300", "--ws-ping-timeout", "300"]
README.md CHANGED
@@ -1,10 +1,138 @@
  ---
  title: DataDetective
- emoji: 🐠
+ emoji: 🔍
  colorFrom: blue
  colorTo: green
  sdk: docker
- pinned: false
+ app_port: 7860
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # DataDetective Business Incident Investigation Environment
+
+ An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI
+ agents investigate real-world business incidents by querying a SQL database,
+ analysing patterns, and submitting root-cause findings.
+
+ ## What It Does
+
+ The agent is given a realistic company database (TechMart — a mid-size B2B+B2C
+ electronics retailer) and a business problem to investigate. It can execute
+ SQL queries to explore the data, then submit a final written analysis. The
+ environment automatically grades the analysis based on whether key findings
+ were identified. Each task has 5 grading criteria worth 0.20 each, enabling
+ meaningful partial credit.
+
+ ## Tasks (Easy → Hard)
+
+ | # | Task ID | Difficulty | Scenario |
+ |---|---------|-----------|----------|
+ | 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
+ | 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
+ | 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
+ | 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
+ | 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
+ | 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
+ | 7 | `revenue_paradox` | Hard | Revenue up but profit down — multi-causal margin erosion |
+ | 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
+ | 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
+
+ Each task is scored 0.0 – 1.0 based on specific findings the agent must discover.
+
+ ## Action / Observation Spaces
+
+ ### Action (`DataDetectiveAction`)
+
+ | Field | Type | Description |
+ |-------|------|-------------|
+ | `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
+ | `content` | `str` | SQL query string or final analysis text |
+
+ ### Observation (`DataDetectiveObservation`)
+
+ | Field | Type | Description |
+ |-------|------|-------------|
+ | `output` | `str` | Query results (formatted table) or feedback |
+ | `task_description` | `str` | The investigation task |
+ | `schema_info` | `str` | Database schema (shown at reset) |
+ | `step_number` | `int` | Current step |
+ | `max_steps` | `int` | Maximum steps allowed (30) |
+ | `message` | `str` | Status message |
+
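The two action types above travel as plain JSON objects. A minimal sketch of the wire format (the SQL text and analysis text below are illustrative, not taken from the repo):

```python
import json

# Illustrative payloads matching the documented action fields.
query_action = {"action_type": "query", "content": "SELECT COUNT(*) FROM orders"}
answer_action = {"action_type": "answer", "content": "Order volume fell after the promo ended."}

for action in (query_action, answer_action):
    encoded = json.dumps(action)          # what the agent sends
    decoded = json.loads(encoded)         # what the server parses
    assert set(decoded) == {"action_type", "content"}
    assert decoded["action_type"] in {"query", "answer"}

print(query_action["action_type"])  # → query
```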
+ ## Database Schema (11 Tables)
+
+ The TechMart database includes:
+
+ | Table | Description |
+ |-------|-------------|
+ | `customers` | Customer demographics (region, segment, signup date) |
+ | `products` | Product catalog (category, price, cost, supplier) |
+ | `orders` | Order history with totals |
+ | `order_items` | Line items with quantity and unit price |
+ | `returns` | Product returns with reasons and refund amounts |
+ | `promotions` | Promotional campaigns with discount percentages |
+ | `price_changes` | Historical price adjustments |
+ | `shipping` | Shipment records with carrier and delivery dates |
+ | `support_tickets` | Customer support tickets by category and priority |
+ | `inventory_log` | Daily stock levels per product per warehouse region |
+ | `marketing_spend` | Daily marketing spend by channel, campaign, and region |
+
+ All data is synthetic, generated in-memory (no external databases required).
+
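The in-memory approach can be sketched with Python's stdlib `sqlite3` (a toy stand-in for the real generator in `server/database.py`; the table and rows here are invented for illustration):

```python
import sqlite3

# Toy stand-in for the real generator: an in-memory SQLite database
# with a miniature orders table, no server or file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_date, total_amount) VALUES (?, ?)",
    [("2024-02-20", 120.0), ("2024-02-25", 80.0), ("2024-03-05", 25.0)],
)

# The kind of before/after comparison an agent runs when a promo ends.
(promo_total,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE order_date <= '2024-03-01'"
).fetchone()
print(promo_total)  # → 2
```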
+ ## Quick Start
+
+ ### 1. Install Dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Start the Server
+
+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 7860
+ ```
+
+ ### 3. Health Check
+
+ ```bash
+ curl http://localhost:7860/health
+ ```
+
+ ### 4. Run the Baseline Agent
+
+ ```bash
+ API_BASE_URL="https://router.huggingface.co/v1" \
+ MODEL_NAME="gpt-4.1-mini" \
+ HF_TOKEN="hf_..." \
+ python inference.py
+ ```
+
+ ### 5. Docker
+
+ ```bash
+ docker build -t data-detective .
+ docker run -p 7860:7860 data-detective
+ ```
+
+ ## Environment Variables
+
+ | Env Var | Purpose | Required |
+ |---------|---------|----------|
+ | `API_BASE_URL` | LLM endpoint URL | Yes |
+ | `MODEL_NAME` | Model identifier | Yes |
+ | `HF_TOKEN` | API key / HF token | Yes |
+ | `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
+
+ ## How Grading Works
+
+ Each task has an automated grader that checks the agent's final answer for
+ specific key findings (keywords, patterns, named entities). Each task has 5
+ grading criteria worth 0.20 each, for a maximum score of 1.0. Partial credit
+ is awarded for each finding discovered.
+
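A grader of this shape could be sketched as follows. This is a hedged illustration, not the repo's actual grading code: the criteria, keywords, and answer text are all invented, and the real graders are task-specific and live server-side.

```python
# Hypothetical keyword-criterion grader: 5 criteria worth 0.20 each.
CRITERIA = [
    ["promo", "promotion"],            # each criterion accepts several keywords
    ["spring mega sale"],
    ["discount"],
    ["order volume", "orders dropped"],
    ["march"],
]

def grade(answer: str, criteria=CRITERIA) -> float:
    """Score 0.20 per criterion whose keywords appear in the answer."""
    text = answer.lower()
    hits = sum(any(kw in text for kw in group) for group in criteria)
    return round(hits * 0.20, 2)

score = grade(
    "Order volume dropped after the Spring Mega Sale promotion ended on "
    "2024-03-01; the 25% discount had pulled demand forward into late February."
)
print(score)  # → 0.8 (4 of 5 criteria matched)
```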
+ ## Setup Requirements
+
+ - Python 3.10+
+ - No GPU required
+ - Runs within 2 vCPU / 8 GB memory
+ - All data is generated in-memory (no external databases)
__init__.py ADDED
@@ -0,0 +1,9 @@
+ from .models import DataDetectiveAction, DataDetectiveObservation, DataDetectiveState
+ from .client import DataDetectiveEnv
+
+ __all__ = [
+     "DataDetectiveAction",
+     "DataDetectiveObservation",
+     "DataDetectiveState",
+     "DataDetectiveEnv",
+ ]
client.py ADDED
@@ -0,0 +1,51 @@
+ """WebSocket client for the DataDetective environment."""
+
+ from typing import Dict
+
+ from openenv.core.env_client import EnvClient
+ from openenv.core.client_types import StepResult
+
+ from .models import DataDetectiveAction, DataDetectiveObservation, DataDetectiveState
+
+
+ class DataDetectiveEnv(
+     EnvClient[DataDetectiveAction, DataDetectiveObservation, DataDetectiveState]
+ ):
+     """
+     Async/sync client for DataDetective.
+
+     Example (sync):
+         >>> with DataDetectiveEnv(base_url="http://localhost:7860").sync() as env:
+         ...     result = env.reset(task_id="orders_drop")
+         ...     result = env.step(DataDetectiveAction(action_type="query", content="SELECT COUNT(*) FROM orders"))
+     """
+
+     def _step_payload(self, action: DataDetectiveAction) -> Dict:
+         return {"action_type": action.action_type, "content": action.content}
+
+     def _parse_result(self, payload: Dict) -> StepResult[DataDetectiveObservation]:
+         obs = payload.get("observation", {})
+         observation = DataDetectiveObservation(
+             output=obs.get("output", ""),
+             task_description=obs.get("task_description", ""),
+             schema_info=obs.get("schema_info", ""),
+             step_number=obs.get("step_number", 0),
+             max_steps=obs.get("max_steps", 30),
+             message=obs.get("message", ""),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+         )
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> DataDetectiveState:
+         return DataDetectiveState(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+             task_id=payload.get("task_id", ""),
+             queries_executed=payload.get("queries_executed", 0),
+             max_steps=payload.get("max_steps", 30),
+         )
inference.py ADDED
@@ -0,0 +1,280 @@
+ #!/usr/bin/env python3
+ """
+ Baseline inference script for DataDetective.
+
+ Uses an LLM via the OpenAI-compatible API to investigate each task by
+ running SQL queries and submitting a final analysis.
+
+ Required environment variables (set by hackathon evaluator):
+     API_BASE_URL — LLM endpoint (e.g. https://router.huggingface.co/v1)
+     MODEL_NAME   — model identifier (e.g. gpt-4.1-mini)
+     HF_TOKEN     — API key / Hugging Face token
+
+ Optional:
+     ENV_URL          — DataDetective server URL (default http://localhost:7860)
+     AMD_LLM_API_KEY  — If set, uses AMD Gateway instead (local dev only)
+ """
+
+ import asyncio
+ import json
+ import os
+ import re
+ import sys
+ import time
+
+ from openai import AzureOpenAI, OpenAI
+
+ # Patch the websockets client defaults so long LLM calls don't trip keepalives.
+ import websockets.asyncio.client as _wsc
+ _orig_ws_connect = _wsc.connect
+ def _patched_connect(*a, **kw):
+     kw.setdefault("ping_interval", 300)
+     kw.setdefault("ping_timeout", 300)
+     return _orig_ws_connect(*a, **kw)
+ _wsc.connect = _patched_connect
+
+ import openenv.core.env_client as _ec
+ _ec.ws_connect = _patched_connect
+
+ from openenv.core.generic_client import GenericEnvClient
+
+ # ---------------------------------------------------------------------------
+ # Configuration
+ # ---------------------------------------------------------------------------
+
+ API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
+ MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4.1-mini")
+ HF_TOKEN = os.environ.get("HF_TOKEN") or os.environ.get("API_KEY", "")
+ AMD_LLM_API_KEY = os.environ.get("AMD_LLM_API_KEY", "")
+ ENV_URL = os.environ.get("ENV_URL", "http://localhost:7860").rstrip("/")
+
+ BENCHMARK = "data_detective"
+ MAX_STEPS = 20
+
+ TASK_IDS = [
+     "orders_drop",
+     "returns_spike",
+     "customer_churn",
+     "shipping_delay",
+     "revenue_paradox",
+     "supplier_quality",
+     "inventory_stockout",
+     "fraud_detection",
+     "repeat_purchase_decline",
+ ]
+
+
+ def _build_llm_client() -> OpenAI:
+     if AMD_LLM_API_KEY:
+         return AzureOpenAI(
+             api_key="dummy",
+             api_version="2024-02-01",
+             base_url=os.environ.get("AMD_GATEWAY_BASE", "https://llm-api.amd.com/openai"),
+             default_headers={"Ocp-Apim-Subscription-Key": AMD_LLM_API_KEY},
+         )
+
+     if HF_TOKEN:
+         return OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+
+     print(
+         "ERROR: Set HF_TOKEN (or API_KEY) for LLM access, "
+         "or AMD_LLM_API_KEY for AMD Gateway. Exiting.",
+         file=sys.stderr,
+     )
+     sys.exit(1)
+
+
+ llm = _build_llm_client()
+
+ SYSTEM_PROMPT = """\
+ You are an expert data analyst investigating a business incident using a
+ SQL database. You have a LIMITED number of query steps, so be strategic.
+
+ At each turn respond with EXACTLY one JSON object (no extra text):
+
+ {{"action_type": "query", "content": "<SQL query>"}}
+ {{"action_type": "answer", "content": "<your analysis>"}}
+
+ Investigation strategy:
+ 1. EXPLORE (1-2 queries): List tables and sample key columns to understand
+    the schema. Note all available tables -- some may hold critical clues.
+ 2. HYPOTHESISE: Based on the task description, form 2-3 likely root causes.
+ 3. QUERY (targeted): Run focused queries that confirm or reject each
+    hypothesis. Use JOINs across tables, GROUP BY with aggregates, and
+    compare time periods. Avoid broad SELECT * scans.
+ 4. QUANTIFY: For every finding, gather specific numbers -- counts, totals,
+    percentages, before/after comparisons.
+ 5. ANSWER: Submit a thorough analysis naming every root cause with
+    supporting evidence. Include specific product names, regions, customer
+    segments, suppliers, dollar amounts, dates, and percentages.
+
+ You have {max_steps} steps total. Budget roughly 70% for querying and
+ reserve the last few steps for your answer. Do NOT run out of steps
+ without submitting -- partial evidence is better than none.
+ """
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+ def _extract_json(text: str) -> dict:
+     text = text.strip()
+     try:
+         return json.loads(text)
+     except json.JSONDecodeError:
+         pass
+     m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
+     if m:
+         try:
+             return json.loads(m.group(1))
+         except json.JSONDecodeError:
+             pass
+     m = re.search(r"\{[^{}]*\}", text, re.DOTALL)
+     if m:
+         try:
+             return json.loads(m.group(0))
+         except json.JSONDecodeError:
+             pass
+     return {"action_type": "answer", "content": text}
+
+
+ def _log_start(task_id: str) -> None:
+     print(
+         f"[START] task={task_id} env={BENCHMARK} model={MODEL_NAME}",
+         flush=True,
+     )
+
+
+ def _log_step(step: int, action: dict, reward: float, done: bool, error: str | None) -> None:
+     action_str = json.dumps(action, separators=(",", ":"))
+     error_val = f"'{error}'" if error else "null"
+     print(
+         f"[STEP] step={step} action={action_str} "
+         f"reward={reward:.2f} done={str(done).lower()} error={error_val}",
+         flush=True,
+     )
+
+
+ def _log_end(success: bool, steps: int, score: float, rewards: list[float]) -> None:
+     rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     print(
+         f"[END] success={str(success).lower()} steps={steps} "
+         f"score={score:.3f} rewards={rewards_str}",
+         flush=True,
+     )
+
+
+ async def run_task(task_id: str) -> float:
+     _log_start(task_id)
+
+     rewards: list[float] = []
+     step = 0
+     reward = 0.0
+     done = False
+     success = False
+     error_msg = None
+
+     try:
+         async with GenericEnvClient(base_url=ENV_URL) as env:
+             result = await env.reset(task_id=task_id)
+             obs = result.observation
+
+             system = SYSTEM_PROMPT.format(max_steps=MAX_STEPS)
+             messages = [
+                 {"role": "system", "content": system},
+                 {
+                     "role": "user",
+                     "content": (
+                         f"## Investigation Task\n{obs.get('task_description', '')}\n\n"
+                         f"## Database\n{obs.get('schema_info', '')}\n\n"
+                         f"You have {MAX_STEPS} steps. Begin your investigation."
+                     ),
+                 },
+             ]
+
+             while not done and step < MAX_STEPS:
+                 try:
+                     completion = llm.chat.completions.create(
+                         model=MODEL_NAME,
+                         messages=messages,
+                         temperature=0.1,
+                         max_completion_tokens=1024,
+                     )
+                     llm_text = completion.choices[0].message.content or ""
+                 except Exception as exc:
+                     llm_text = json.dumps({
+                         "action_type": "answer",
+                         "content": "Unable to complete analysis due to LLM error.",
+                     })
+                     error_msg = str(exc)
+
+                 action = _extract_json(llm_text)
+                 if "action_type" not in action:
+                     action["action_type"] = "query"
+                 if "content" not in action:
+                     action["content"] = llm_text
+
+                 result = await env.step(action)
+                 step += 1
+                 done = result.done
+                 reward = result.reward or 0.0
+                 rewards.append(reward)
+                 result_obs = result.observation
+                 remaining = MAX_STEPS - step
+
+                 _log_step(step, action, reward, done, error_msg)
+                 error_msg = None
+
+                 messages.append({"role": "assistant", "content": llm_text})
+
+                 if not done and remaining <= 3:
+                     urgency = (
+                         f"URGENT: Only {remaining} step(s) left! "
+                         "You MUST submit your final answer NOW using "
+                         '{"action_type": "answer", "content": "..."}. '
+                         "Summarize ALL findings so far."
+                     )
+                 else:
+                     urgency = "Continue investigating or submit your final answer."
+
+                 messages.append({
+                     "role": "user",
+                     "content": (
+                         f"Query result:\n{result_obs.get('output', '')}\n\n"
+                         f"{result_obs.get('message', '')}\n\n"
+                         f"[Step {step}/{MAX_STEPS}] {urgency}"
+                     ),
+                 })
+
+             success = done and reward > 0.0
+     except Exception as exc:
+         error_msg = str(exc)
+         _log_step(step + 1, {"action_type": "error"}, 0.0, False, error_msg)
+
+     score = reward if done else 0.0
+     _log_end(success=success, steps=step, score=score, rewards=rewards)
+     return score
+
+
+ # ---------------------------------------------------------------------------
+ # Main
+ # ---------------------------------------------------------------------------
+
+ async def amain():
+     total = 0.0
+     for tid in TASK_IDS:
+         try:
+             r = await run_task(tid)
+         except Exception as exc:
+             print(f"[ERROR] task={tid} failed: {exc}", file=sys.stderr, flush=True)
+             print("[END] success=false steps=0 score=0.000 rewards=", flush=True)
+             r = 0.0
+         total += r
+     avg = total / len(TASK_IDS) if TASK_IDS else 0.0
+     print(f"\n=== Overall average score: {avg:.2f} ===", flush=True)
+
+
+ def main():
+     asyncio.run(amain())
+
+
+ if __name__ == "__main__":
+     main()
models.py ADDED
@@ -0,0 +1,34 @@
+ from pydantic import Field
+ from openenv.core.env_server.types import Action, Observation, State
+
+
+ class DataDetectiveAction(Action):
+     """Agent action: run a SQL query or submit a final answer."""
+
+     action_type: str = Field(
+         ...,
+         description="'query' to execute SQL against the database, or 'answer' to submit findings",
+     )
+     content: str = Field(
+         ...,
+         description="SQL query string (for action_type='query') or final analysis text (for action_type='answer')",
+     )
+
+
+ class DataDetectiveObservation(Observation):
+     """Observation returned after each action."""
+
+     output: str = Field(default="", description="Query results or system feedback")
+     task_description: str = Field(default="", description="The investigation task to solve")
+     schema_info: str = Field(default="", description="Database schema (provided at reset)")
+     step_number: int = Field(default=0, description="Current step in the episode")
+     max_steps: int = Field(default=30, description="Maximum steps allowed")
+     message: str = Field(default="", description="Status or feedback message")
+
+
+ class DataDetectiveState(State):
+     """Internal environment state."""
+
+     task_id: str = Field(default="", description="Current task identifier")
+     queries_executed: int = Field(default=0, description="Number of SQL queries run so far")
+     max_steps: int = Field(default=30, description="Maximum steps allowed")
openenv.yaml ADDED
@@ -0,0 +1,40 @@
+ name: data_detective
+ version: "1.0.0"
+ description: >
+   DataDetective: A business incident investigation environment where AI agents
+   use SQL queries to analyze a realistic e-commerce company database (TechMart)
+   and uncover root causes of business problems. Covers 9 tasks spanning order
+   analysis, product returns, customer churn, shipping ops, margin analysis,
+   supplier quality, inventory stockouts, fraud detection, and retention.
+ endpoints:
+   reset: /reset
+   step: /step
+   state: /state
+ tasks:
+   - id: orders_drop
+     difficulty: easy
+     description: Order volume dropped sharply after a major promotion ended
+   - id: returns_spike
+     difficulty: medium
+     description: Product returns spiking in a specific region due to defective SKU
+   - id: customer_churn
+     difficulty: hard
+     description: Active customer count declining across specific segments
+   - id: shipping_delay
+     difficulty: medium-hard
+     description: Customer satisfaction crisis driven by carrier delays in one region
+   - id: revenue_paradox
+     difficulty: hard
+     description: Revenue is up but profit is down — multi-causal margin erosion
+   - id: supplier_quality
+     difficulty: medium
+     description: Systemic quality issues from a single supplier across multiple products
+   - id: inventory_stockout
+     difficulty: medium-hard
+     description: Regional sales underperformance caused by warehouse stockout during promo
+   - id: fraud_detection
+     difficulty: hard
+     description: Coordinated fraud ring of fake accounts placing high-value orders
+   - id: repeat_purchase_decline
+     difficulty: hard
+     description: Repeat purchase rates collapsing while acquisition masks the problem
pyproject.toml ADDED
@@ -0,0 +1,27 @@
+ [project]
+ name = "data_detective_env"
+ version = "1.0.0"
+ description = "DataDetective: Business incident investigation environment for OpenEnv"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "openenv-core>=0.2.0",
+     "fastapi>=0.104.0",
+     "uvicorn>=0.24.0",
+     "pydantic>=2.0.0",
+     "websockets>=12.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = ["pytest", "httpx"]
+ inference = ["openai>=1.0.0"]
+
+ [build-system]
+ requires = ["setuptools>=68.0"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools.packages.find]
+ include = ["data_detective_env*"]
+
+ [project.scripts]
+ server = "server.app:main"
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv-core>=0.2.0
+ fastapi>=0.104.0
+ uvicorn>=0.24.0
+ pydantic>=2.0.0
+ websockets>=12.0
+ openai>=1.0.0
server/__init__.py ADDED
File without changes
server/app.py ADDED
@@ -0,0 +1,36 @@
+ """FastAPI application for the DataDetective environment."""
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:
+     raise ImportError(
+         "openenv-core is required. pip install openenv-core"
+     ) from e
+
+ try:
+     from ..models import DataDetectiveAction, DataDetectiveObservation
+     from .environment import DataDetectiveEnvironment
+ except (ImportError, ModuleNotFoundError):
+     from models import DataDetectiveAction, DataDetectiveObservation
+     from server.environment import DataDetectiveEnvironment
+
+ app = create_app(
+     DataDetectiveEnvironment,
+     DataDetectiveAction,
+     DataDetectiveObservation,
+     env_name="data_detective",
+     max_concurrent_envs=10,
+ )
+
+
+ def main(host: str = "0.0.0.0", port: int = 7860):
+     import uvicorn
+     uvicorn.run(app, host=host, port=port)
+
+
+ if __name__ == "__main__":
+     import argparse
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--port", type=int, default=7860)
+     args = parser.parse_args()
+     main(port=args.port)
server/database.py ADDED
@@ -0,0 +1,657 @@
+ """
+ Generates a realistic in-memory SQLite database for TechMart, a fictional
+ e-commerce company. The data contains deliberate patterns that support
+ nine investigation tasks:
+
+ 1. Orders drop after a major promotion ends
+ 2. Product returns spike for a specific SKU in the West region
+ 3. Customer churn concentrated in the Enterprise/Northeast segment
+ 4. Shipping delays by QuickShip in the Midwest driving support tickets
+ 5. Revenue up but profit down (multi-causal paradox)
+ 6. Supplier quality crisis (AudioTech products 6 & 7)
+ 7. Inventory stockout in West for Monitor 27-inch during promo
+ 8. Coordinated fraud ring in Southeast with new accounts
+ 9. Repeat purchase decline masked by new-customer acquisition spend
+ """
+
+ import random
+ import sqlite3
+ from datetime import datetime, timedelta
+
+
+ PRODUCTS = [
+     (1, "Laptop Pro 15", "Electronics", 999.99, 650.00, "TechCorp"),
+     (2, "Desktop Workstation", "Electronics", 1499.99, 950.00, "TechCorp"),
+     (3, "Tablet Ultra", "Electronics", 599.99, 350.00, "TechCorp"),
+     (4, "Monitor 27-inch", "Electronics", 449.99, 280.00, "DisplayMax"),
+     (5, "Smart TV 55-inch", "Electronics", 699.99, 420.00, "DisplayMax"),
+     (6, "Wireless Headphones Pro", "Accessories", 149.99, 45.00, "AudioTech"),
+     (7, "Bluetooth Speaker", "Accessories", 79.99, 30.00, "AudioTech"),
+     (8, "USB-C Hub", "Accessories", 49.99, 15.00, "ConnectPlus"),
+     (9, "Laptop Bag Premium", "Accessories", 39.99, 12.00, "CarryAll"),
+     (10, "Mouse Pad XL", "Accessories", 24.99, 8.00, "CarryAll"),
+     (11, "Office Suite License", "Software", 199.99, 20.00, "SoftVault"),
+     (12, "Antivirus Pro Annual", "Software", 49.99, 5.00, "SecureNet"),
+     (13, "Cloud Backup 1TB", "Software", 99.99, 10.00, "CloudStore"),
+     (14, "Design Studio Pro", "Software", 299.99, 30.00, "CreativeSoft"),
+     (15, "DevTools Ultimate", "Software", 149.99, 15.00, "CodeForge"),
+     (16, "Mechanical Keyboard RGB", "Peripherals", 129.99, 60.00, "KeyMaster"),
+     (17, "Wireless Mouse Pro", "Peripherals", 59.99, 20.00, "ClickTech"),
+     (18, "Webcam HD 1080p", "Peripherals", 89.99, 35.00, "VisionCam"),
+     (19, "External SSD 1TB", "Peripherals", 109.99, 55.00, "StoragePro"),
+     (20, "Laser Printer Pro", "Peripherals", 249.99, 130.00, "PrintMax"),
+ ]
+
+ _FIRST = [
+     "James","Mary","Robert","Patricia","John","Jennifer","Michael","Linda",
+     "David","Elizabeth","William","Barbara","Richard","Susan","Joseph","Jessica",
+     "Thomas","Sarah","Christopher","Karen","Charles","Lisa","Daniel","Nancy",
+     "Matthew","Betty","Anthony","Margaret","Mark","Sandra","Donald","Ashley",
+     "Steven","Dorothy","Andrew","Kimberly","Paul","Emily","Joshua","Donna",
+     "Kenneth","Michelle","Kevin","Carol","Brian","Amanda","George","Melissa",
+     "Timothy","Deborah",
+ ]
+
+ _LAST = [
+     "Smith","Johnson","Williams","Brown","Jones","Garcia","Miller","Davis",
+     "Rodriguez","Martinez","Hernandez","Lopez","Gonzalez","Wilson","Anderson",
+     "Thomas","Taylor","Moore","Jackson","Martin","Lee","Perez","Thompson",
+     "White","Harris","Sanchez","Clark","Ramirez","Lewis","Robinson","Walker",
+     "Young","Allen","King","Wright","Scott","Torres","Nguyen","Hill","Flores",
+     "Green","Adams","Nelson","Baker","Hall","Rivera","Campbell","Mitchell",
+     "Carter","Roberts",
+ ]
+
+ REGIONS = ["Northeast", "Southeast", "West", "Midwest"]
+
+ PRICE_CHANGES = [
+     (1, 999.99, 1149.99, "2024-02-01", "Annual pricing adjustment"),
+     (2, 1499.99, 1699.99, "2024-02-01", "Annual pricing adjustment"),
+     (11, 199.99, 229.99, "2024-02-01", "Annual pricing adjustment"),
+     (15, 149.99, 174.99, "2024-02-01", "Annual pricing adjustment"),
+     (19, 109.99, 129.99, "2024-02-01", "Annual pricing adjustment"),
+ ]
+
+ PROMOTIONS = [
+     (1, "New Year Kickoff", "2024-01-01", "2024-01-15", 10.0, "All"),
+     (2, "Valentine Tech Sale", "2024-02-10", "2024-02-14", 15.0, "Electronics"),
+     (3, "Spring Mega Sale", "2024-02-15", "2024-03-01", 25.0, "All"),
+ ]
+
+ CARRIERS = ["QuickShip", "FastFreight", "ReliableLogistics"]
+
+ TICKET_CATEGORIES = ["delivery_delay", "product_defect", "billing_issue", "general_inquiry"]
+
+ MARKETING_CHANNELS = ["email", "social_media", "search_ads", "display_ads", "affiliate"]
+
+
+ def _date_range(start: datetime, end: datetime):
+     d = start
+     while d <= end:
+         yield d
+         d += timedelta(days=1)
+
+
+ def _effective_price(base_prices: dict, changes_by_pid: dict, pid: int, date_str: str):
+     """Return the unit price for *pid* on *date_str*, considering price changes."""
+     price = base_prices[pid]
+     for new_price, change_date in changes_by_pid.get(pid, []):
+         if date_str >= change_date:
+             price = new_price
+     return price
+
+
+ def create_database(seed: int = 42) -> sqlite3.Connection:
+     rng = random.Random(seed)
+     conn = sqlite3.connect(":memory:", check_same_thread=False)
+     c = conn.cursor()
+
+     c.executescript("""
+         CREATE TABLE customers (
+             customer_id INTEGER PRIMARY KEY,
+             name TEXT NOT NULL,
+             email TEXT NOT NULL,
+             region TEXT NOT NULL,
+             segment TEXT NOT NULL,
+             signup_date TEXT NOT NULL
+         );
+         CREATE TABLE products (
+             product_id INTEGER PRIMARY KEY,
+             name TEXT NOT NULL,
+             category TEXT NOT NULL,
+             price REAL NOT NULL,
+             cost REAL NOT NULL,
+             supplier TEXT NOT NULL
+         );
+         CREATE TABLE orders (
+             order_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             customer_id INTEGER NOT NULL,
+             order_date TEXT NOT NULL,
+             status TEXT NOT NULL DEFAULT 'completed',
+             total_amount REAL NOT NULL DEFAULT 0,
+             FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
+         );
+         CREATE TABLE order_items (
+             item_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             order_id INTEGER NOT NULL,
+             product_id INTEGER NOT NULL,
+             quantity INTEGER NOT NULL DEFAULT 1,
+             unit_price REAL NOT NULL,
+             FOREIGN KEY (order_id) REFERENCES orders(order_id),
+             FOREIGN KEY (product_id) REFERENCES products(product_id)
+         );
+         CREATE TABLE returns (
+             return_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             order_id INTEGER NOT NULL,
+             product_id INTEGER NOT NULL,
+             return_date TEXT NOT NULL,
+             reason TEXT NOT NULL,
+             refund_amount REAL NOT NULL,
+             FOREIGN KEY (order_id) REFERENCES orders(order_id),
+             FOREIGN KEY (product_id) REFERENCES products(product_id)
+         );
+         CREATE TABLE promotions (
+             promo_id INTEGER PRIMARY KEY,
+             name TEXT NOT NULL,
+             start_date TEXT NOT NULL,
+             end_date TEXT NOT NULL,
+             discount_pct REAL NOT NULL,
+             applicable_category TEXT
+         );
+         CREATE TABLE price_changes (
+             change_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             product_id INTEGER NOT NULL,
+             old_price REAL NOT NULL,
+             new_price REAL NOT NULL,
+             change_date TEXT NOT NULL,
+             reason TEXT,
+             FOREIGN KEY (product_id) REFERENCES products(product_id)
+         );
+         CREATE TABLE shipping (
+             shipment_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             order_id INTEGER NOT NULL,
+             carrier TEXT NOT NULL,
+             ship_date TEXT NOT NULL,
+             delivery_date TEXT NOT NULL,
+             status TEXT NOT NULL DEFAULT 'delivered',
+             FOREIGN KEY (order_id) REFERENCES orders(order_id)
+         );
+         CREATE TABLE support_tickets (
+             ticket_id INTEGER PRIMARY KEY AUTOINCREMENT,
+             customer_id INTEGER NOT NULL,
+             product_id INTEGER,
+             created_date TEXT NOT NULL,
+             category TEXT NOT NULL,
185
+ priority TEXT NOT NULL DEFAULT 'medium',
186
+ resolution_status TEXT NOT NULL DEFAULT 'open',
187
+ FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
188
+ FOREIGN KEY (product_id) REFERENCES products(product_id)
189
+ );
190
+ CREATE TABLE inventory_log (
191
+ log_id INTEGER PRIMARY KEY AUTOINCREMENT,
192
+ product_id INTEGER NOT NULL,
193
+ log_date TEXT NOT NULL,
194
+ units_in_stock INTEGER NOT NULL,
195
+ units_ordered INTEGER NOT NULL DEFAULT 0,
196
+ warehouse_region TEXT NOT NULL,
197
+ FOREIGN KEY (product_id) REFERENCES products(product_id)
198
+ );
199
+ CREATE TABLE marketing_spend (
200
+ spend_id INTEGER PRIMARY KEY AUTOINCREMENT,
201
+ channel TEXT NOT NULL,
202
+ campaign_name TEXT NOT NULL,
203
+ region TEXT NOT NULL,
204
+ spend_date TEXT NOT NULL,
205
+ amount REAL NOT NULL
206
+ );
207
+ """)
208
+
209
+ c.executemany("INSERT INTO products VALUES (?,?,?,?,?,?)", PRODUCTS)
210
+ base_prices = {p[0]: p[3] for p in PRODUCTS}
211
+
212
+ segments_pool = ["Enterprise"] * 35 + ["SMB"] * 55 + ["Consumer"] * 60
213
+ rng.shuffle(segments_pool)
214
+ customers = []
215
+ for i in range(150):
216
+ first = rng.choice(_FIRST)
217
+ last = rng.choice(_LAST)
218
+ name = f"{first} {last}"
219
+ email = f"{first.lower()}.{last.lower()}{i}@techmart.com"
220
+ region = REGIONS[i % 4]
221
+ segment = segments_pool[i]
222
+ signup = (datetime(2023, 1, 1) + timedelta(days=rng.randint(0, 364))).strftime("%Y-%m-%d")
223
+ c.execute("INSERT INTO customers VALUES (?,?,?,?,?,?)",
224
+ (i + 1, name, email, region, segment, signup))
225
+ customers.append((i + 1, name, email, region, segment, signup))
226
+
227
+ ent_ne = [cu for cu in customers if cu[4] == "Enterprise" and cu[3] == "Northeast"]
228
+ ent_other = [cu for cu in customers if cu[4] == "Enterprise" and cu[3] != "Northeast"]
229
+ smb_all = [cu for cu in customers if cu[4] == "SMB"]
230
+ con_all = [cu for cu in customers if cu[4] == "Consumer"]
231
+
232
+ c.executemany("INSERT INTO promotions VALUES (?,?,?,?,?,?)", PROMOTIONS)
233
+ for pid, old_p, new_p, dt, reason in PRICE_CHANGES:
234
+ c.execute(
235
+ "INSERT INTO price_changes (product_id,old_price,new_price,change_date,reason) VALUES (?,?,?,?,?)",
236
+ (pid, old_p, new_p, dt, reason),
237
+ )
238
+ changes_by_pid: dict[int, list] = {}
239
+ for pid, _, new_p, dt, _ in PRICE_CHANGES:
240
+ changes_by_pid.setdefault(pid, []).append((new_p, dt))
241
+
242
+ START = datetime(2024, 1, 1)
243
+ END = datetime(2024, 3, 15)
244
+ PROMO_S = datetime(2024, 2, 15)
245
+ PROMO_E = datetime(2024, 3, 1)
246
+ PRICE_INC = datetime(2024, 2, 1)
247
+
248
+ product_weights_base = [1.0] * 20
249
+ product_weights_base[5] = 3.0
250
+
251
+ for day in _date_range(START, END):
252
+ date_str = day.strftime("%Y-%m-%d")
253
+ is_promo = PROMO_S <= day <= PROMO_E
254
+ after_price_inc = day >= PRICE_INC
255
+
256
+ daily_count = rng.randint(25, 35) if is_promo else rng.randint(12, 18)
257
+
258
+ for _ in range(daily_count):
259
+ roll = rng.random()
260
+ if roll < 0.08:
261
+ pool = ent_ne
262
+ if after_price_inc and rng.random() < 0.85:
263
+ continue
264
+ elif roll < 0.22:
265
+ pool = ent_other
266
+ if after_price_inc and rng.random() < 0.50:
267
+ continue
268
+ elif roll < 0.55:
269
+ pool = smb_all
270
+ if after_price_inc and rng.random() < 0.20:
271
+ continue
272
+ else:
273
+ pool = con_all
274
+
275
+ cust = rng.choice(pool)
276
+ cust_id, _, _, cust_region, _, _ = cust
277
+
278
+ weights = list(product_weights_base)
279
+ if cust_region == "West":
280
+ weights[5] = 7.0
281
+ if is_promo:
282
+ weights[3] = 0.1
283
+ num_items = rng.choices([1, 2, 3], weights=[0.6, 0.3, 0.1])[0]
284
+ pids = list(set(rng.choices(range(1, 21), weights=weights, k=num_items)))
285
+
286
+ c.execute(
287
+ "INSERT INTO orders (customer_id, order_date, status, total_amount) VALUES (?,?,?,?)",
288
+ (cust_id, date_str, "completed", 0),
289
+ )
290
+ order_id = c.lastrowid
291
+ total = 0.0
292
+ for pid in pids:
293
+ qty = rng.choices([1, 2, 3], weights=[0.75, 0.20, 0.05])[0]
294
+ price = _effective_price(base_prices, changes_by_pid, pid, date_str)
295
+ if is_promo:
296
+ price = round(price * 0.75, 2)
297
+ total += price * qty
298
+ c.execute(
299
+ "INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES (?,?,?,?)",
300
+ (order_id, pid, qty, round(price, 2)),
301
+ )
302
+ c.execute("UPDATE orders SET total_amount=? WHERE order_id=?",
303
+ (round(total, 2), order_id))
304
+
305
+ c.execute("""
306
+ SELECT oi.item_id, oi.order_id, oi.product_id, oi.unit_price, oi.quantity,
307
+ o.order_date, cu.region
308
+ FROM order_items oi
309
+ JOIN orders o ON oi.order_id = o.order_id
310
+ JOIN customers cu ON o.customer_id = cu.customer_id
311
+ """)
312
+ items = c.fetchall()
313
+
314
+ defect_reasons = ["defective_unit", "stopped_working", "poor_audio_quality", "battery_issue"]
315
+ normal_reasons = ["changed_mind", "wrong_size", "found_cheaper", "not_as_expected"]
316
+
317
+ speaker_defect_reasons = ["audio_distortion", "bluetooth_disconnect", "battery_issue", "stopped_working"]
318
+
319
+ for _, order_id, product_id, unit_price, qty, order_date, region in items:
320
+ if product_id == 6 and region == "West":
321
+ prob = 0.38
322
+ reasons = defect_reasons
323
+ elif product_id == 6:
324
+ prob = 0.08
325
+ reasons = defect_reasons + normal_reasons
326
+ elif product_id == 7:
327
+ prob = 0.12
328
+ reasons = speaker_defect_reasons
329
+ else:
330
+ prob = 0.04
331
+ reasons = normal_reasons
332
+
333
+ if rng.random() < prob:
334
+ ret_date = (datetime.strptime(order_date, "%Y-%m-%d")
335
+ + timedelta(days=rng.randint(3, 14))).strftime("%Y-%m-%d")
336
+ c.execute(
337
+ "INSERT INTO returns (order_id, product_id, return_date, reason, refund_amount) VALUES (?,?,?,?,?)",
338
+ (order_id, product_id, ret_date, rng.choice(reasons), round(unit_price * qty, 2)),
339
+ )
340
+
341
+ # -- Shipping records for every order ------------------------------------
342
+ QUICKSHIP_DELAY_START = datetime(2024, 2, 10)
343
+ c.execute("SELECT order_id, order_date, customer_id FROM orders")
344
+ all_orders = c.fetchall()
345
+ cust_region_map = {cu[0]: cu[3] for cu in customers}
346
+
347
+ for order_id, order_date_str, cust_id in all_orders:
348
+ order_dt = datetime.strptime(order_date_str, "%Y-%m-%d")
349
+ region = cust_region_map[cust_id]
350
+
351
+ if region == "Midwest":
352
+ carrier = rng.choices(CARRIERS, weights=[0.40, 0.35, 0.25])[0]
353
+ else:
354
+ carrier = rng.choices(CARRIERS, weights=[0.25, 0.40, 0.35])[0]
355
+
356
+ ship_dt = order_dt + timedelta(days=rng.randint(0, 1))
357
+ base_transit = rng.randint(2, 4)
358
+
359
+ if carrier == "QuickShip" and region == "Midwest" and order_dt >= QUICKSHIP_DELAY_START:
360
+ extra_delay = rng.randint(5, 10)
361
+ status = "delayed"
362
+ elif carrier == "FastFreight":
363
+ extra_delay = rng.randint(0, 2)
364
+ status = "delivered"
365
+ else:
366
+ extra_delay = 0
367
+ status = "delivered"
368
+
369
+ delivery_dt = ship_dt + timedelta(days=base_transit + extra_delay)
370
+ if status == "delayed" and rng.random() < 0.7:
371
+ status = "delivered"
372
+
373
+ c.execute(
374
+ "INSERT INTO shipping (order_id, carrier, ship_date, delivery_date, status) "
375
+ "VALUES (?,?,?,?,?)",
376
+ (order_id, carrier, ship_dt.strftime("%Y-%m-%d"),
377
+ delivery_dt.strftime("%Y-%m-%d"), status),
378
+ )
379
+
380
+ # -- Support tickets -----------------------------------------------------
381
+ ticket_priorities = ["low", "medium", "high", "critical"]
382
+ ticket_resolutions = ["open", "resolved", "escalated"]
383
+
384
+ for day in _date_range(START, END):
385
+ date_str = day.strftime("%Y-%m-%d")
386
+ after_qs_issues = day >= QUICKSHIP_DELAY_START
387
+
388
+ for region_name in REGIONS:
389
+ region_custs = [cu for cu in customers if cu[3] == region_name]
390
+
391
+ # Delivery delay tickets: spike in Midwest after QuickShip issues
392
+ if region_name == "Midwest" and after_qs_issues:
393
+ n_delay = rng.randint(3, 6)
394
+ else:
395
+ n_delay = rng.randint(0, 1)
396
+
397
+ for _ in range(n_delay):
398
+ cu = rng.choice(region_custs)
399
+ pri = rng.choices(ticket_priorities, weights=[0.1, 0.3, 0.4, 0.2])[0]
400
+ res = rng.choices(ticket_resolutions, weights=[0.3, 0.5, 0.2])[0]
401
+ c.execute(
402
+ "INSERT INTO support_tickets "
403
+ "(customer_id, product_id, created_date, category, priority, resolution_status) "
404
+ "VALUES (?,?,?,?,?,?)",
405
+ (cu[0], None, date_str, "delivery_delay", pri, res),
406
+ )
407
+
408
+ # Product defect tickets: elevated for AudioTech products (6 in West, 7 everywhere)
409
+ if region_name == "West":
410
+ n_defect = rng.randint(1, 3)
411
+ else:
412
+ n_defect = 1 if rng.random() < 0.3 else 0
413
+
414
+ for _ in range(n_defect):
415
+ cu = rng.choice(region_custs)
416
+ pid = 6 if region_name == "West" or rng.random() < 0.4 else rng.randint(1, 20)
417
+ pri = rng.choices(ticket_priorities, weights=[0.1, 0.3, 0.4, 0.2])[0]
418
+ res = rng.choices(ticket_resolutions, weights=[0.4, 0.4, 0.2])[0]
419
+ c.execute(
420
+ "INSERT INTO support_tickets "
421
+ "(customer_id, product_id, created_date, category, priority, resolution_status) "
422
+ "VALUES (?,?,?,?,?,?)",
423
+ (cu[0], pid, date_str, "product_defect", pri, res),
424
+ )
425
+
426
+ # Product 7 (Bluetooth Speaker) defect tickets across all regions
427
+ if rng.random() < 0.45:
428
+ cu = rng.choice(region_custs)
429
+ pri = rng.choices(ticket_priorities, weights=[0.1, 0.4, 0.35, 0.15])[0]
430
+ res = rng.choices(ticket_resolutions, weights=[0.35, 0.45, 0.2])[0]
431
+ c.execute(
432
+ "INSERT INTO support_tickets "
433
+ "(customer_id, product_id, created_date, category, priority, resolution_status) "
434
+ "VALUES (?,?,?,?,?,?)",
435
+ (cu[0], 7, date_str, "product_defect", pri, res),
436
+ )
437
+
438
+ # Billing issue tickets: evenly spread (red herring / noise)
439
+ if rng.random() < 0.25:
440
+ cu = rng.choice(region_custs)
441
+ c.execute(
442
+ "INSERT INTO support_tickets "
443
+ "(customer_id, product_id, created_date, category, priority, resolution_status) "
444
+ "VALUES (?,?,?,?,?,?)",
445
+ (cu[0], None, date_str, "billing_issue",
446
+ rng.choice(ticket_priorities), rng.choice(ticket_resolutions)),
447
+ )
448
+
449
+ # General inquiry: background noise
450
+ if rng.random() < 0.35:
451
+ cu = rng.choice(region_custs)
452
+ c.execute(
453
+ "INSERT INTO support_tickets "
454
+ "(customer_id, product_id, created_date, category, priority, resolution_status) "
455
+ "VALUES (?,?,?,?,?,?)",
456
+ (cu[0], None, date_str, "general_inquiry", "low",
457
+ rng.choice(["resolved", "open"])),
458
+ )
459
+
460
+ # -- Inventory log -------------------------------------------------------
461
+ # Daily stock levels per product per warehouse region.
462
+ # Product 4 (Monitor 27-inch) stocks out in West during promo.
463
+ base_stock = {}
464
+ for p in PRODUCTS:
465
+ pid = p[0]
466
+ if pid in (1, 2, 3, 4, 5): # electronics — higher stock
467
+ base_stock[pid] = 200
468
+ elif pid <= 10: # accessories
469
+ base_stock[pid] = 350
470
+ elif pid <= 15: # software (digital)
471
+ base_stock[pid] = 9999
472
+ else: # peripherals
473
+ base_stock[pid] = 250
474
+
475
+ for day in _date_range(START, END):
476
+ date_str = day.strftime("%Y-%m-%d")
477
+ is_promo = PROMO_S <= day <= PROMO_E
478
+
479
+ for region_name in REGIONS:
480
+ for p in PRODUCTS:
481
+ pid = p[0]
482
+ stock = base_stock[pid]
483
+
484
+ daily_sold = rng.randint(2, 8)
485
+ if is_promo:
486
+ daily_sold = rng.randint(5, 15)
487
+
488
+ # Product 4 stockout in West during promo
489
+ if pid == 4 and region_name == "West" and is_promo:
490
+ stock = rng.randint(0, 2)
491
+ daily_sold = rng.randint(0, 1)
492
+ else:
493
+ stock = max(stock - daily_sold + rng.randint(1, 6), 10)
494
+
495
+ # Product 6 fluctuates in West but never stocks out (red herring)
496
+ if pid == 6 and region_name == "West":
497
+ stock = rng.randint(30, 80)
498
+
499
+ reorder = 0
500
+ if stock < 20 and pid <= 15:
501
+ reorder = rng.randint(50, 100)
502
+
503
+ c.execute(
504
+ "INSERT INTO inventory_log "
505
+ "(product_id, log_date, units_in_stock, units_ordered, warehouse_region) "
506
+ "VALUES (?,?,?,?,?)",
507
+ (pid, date_str, stock, reorder, region_name),
508
+ )
509
+
510
+ # -- Fraudulent accounts ---------------------------------------------------
511
+ # ~15 fake accounts in Southeast, Consumer, all signed up late Feb,
512
+ # placing high-value Electronics orders (products 1 & 2).
513
+ fraud_customers = []
514
+ for i in range(15):
515
+ cid = 151 + i
516
+ first = rng.choice(_FIRST)
517
+ last = rng.choice(_LAST)
518
+ name = f"{first} {last}"
519
+ email = f"{first.lower()}.{last.lower()}{cid}@techmart.com"
520
+ signup = (datetime(2024, 2, 20) + timedelta(days=rng.randint(0, 7))).strftime("%Y-%m-%d")
521
+ c.execute("INSERT INTO customers VALUES (?,?,?,?,?,?)",
522
+ (cid, name, email, "Southeast", "Consumer", signup))
523
+ fraud_customers.append(cid)
524
+ customers.append((cid, name, email, "Southeast", "Consumer", signup))
525
+
526
+ cust_region_map.update({cid: "Southeast" for cid in fraud_customers})
527
+
528
+ FRAUD_ORDER_START = datetime(2024, 2, 25)
529
+ FRAUD_ORDER_END = datetime(2024, 3, 10)
530
+ for cid in fraud_customers:
531
+ n_orders = rng.randint(3, 5)
532
+ for _ in range(n_orders):
533
+ order_day = FRAUD_ORDER_START + timedelta(
534
+ days=rng.randint(0, (FRAUD_ORDER_END - FRAUD_ORDER_START).days))
535
+ date_str = order_day.strftime("%Y-%m-%d")
536
+ fraud_pid = rng.choice([1, 2])
537
+ qty = rng.randint(1, 2)
538
+ price = _effective_price(base_prices, changes_by_pid, fraud_pid, date_str)
539
+ is_promo_day = PROMO_S <= order_day <= PROMO_E
540
+ if is_promo_day:
541
+ price = round(price * 0.75, 2)
542
+ total = round(price * qty, 2)
543
+ c.execute(
544
+ "INSERT INTO orders (customer_id, order_date, status, total_amount) VALUES (?,?,?,?)",
545
+ (cid, date_str, "completed", total),
546
+ )
547
+ oid = c.lastrowid
548
+ c.execute(
549
+ "INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES (?,?,?,?)",
550
+ (oid, fraud_pid, qty, round(price, 2)),
551
+ )
552
+ ship_dt = order_day + timedelta(days=rng.randint(0, 1))
553
+ delivery_dt = ship_dt + timedelta(days=rng.randint(2, 4))
554
+ c.execute(
555
+ "INSERT INTO shipping (order_id, carrier, ship_date, delivery_date, status) "
556
+ "VALUES (?,?,?,?,?)",
557
+ (oid, "FastFreight", ship_dt.strftime("%Y-%m-%d"),
558
+ delivery_dt.strftime("%Y-%m-%d"), "delivered"),
559
+ )
560
+
561
+ # -- Marketing spend -------------------------------------------------------
562
+ # Heavy acquisition spend (search_ads, social_media) in Feb/Mar.
563
+ # Email (retention) drops off after Jan. Southeast gets a big bump in Feb
564
+ # (red herring for fraud task).
565
+ acq_channels = ["search_ads", "social_media", "display_ads", "affiliate"]
566
+ for day in _date_range(START, END):
567
+ date_str = day.strftime("%Y-%m-%d")
568
+ month = day.month
569
+
570
+ for region_name in REGIONS:
571
+ # Retention channel: email
572
+ if month == 1:
573
+ email_spend = round(rng.uniform(200, 400), 2)
574
+ else:
575
+ email_spend = round(rng.uniform(20, 60), 2)
576
+ c.execute(
577
+ "INSERT INTO marketing_spend (channel, campaign_name, region, spend_date, amount) "
578
+ "VALUES (?,?,?,?,?)",
579
+ ("email", "Customer Retention", region_name, date_str, email_spend),
580
+ )
581
+
582
+ # Acquisition channels
583
+ for ch in acq_channels:
584
+ if month == 1:
585
+ base_spend = rng.uniform(100, 200)
586
+ else:
587
+ base_spend = rng.uniform(300, 600)
588
+
589
+ if region_name == "Southeast" and month >= 2:
590
+ base_spend *= 1.5
591
+
592
+ c.execute(
593
+ "INSERT INTO marketing_spend (channel, campaign_name, region, spend_date, amount) "
594
+ "VALUES (?,?,?,?,?)",
595
+ (ch, "New Customer Acquisition", region_name, date_str,
596
+ round(base_spend, 2)),
597
+ )
598
+
599
+ conn.commit()
600
+ return conn
601
+
602
+
603
+ def get_schema_info(conn: sqlite3.Connection) -> str:
604
+ """Human-readable database schema for the LLM agent."""
605
+ c = conn.cursor()
606
+ c.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
607
+ tables = [r[0] for r in c.fetchall()]
608
+
609
+ parts = ["DATABASE SCHEMA", "=" * 50, ""]
610
+ for table in tables:
611
+ c.execute(f"SELECT COUNT(*) FROM {table}")
612
+ count = c.fetchone()[0]
613
+ parts.append(f"Table: {table} ({count} rows)")
614
+
615
+ c.execute(f"PRAGMA table_info({table})")
616
+ for col in c.fetchall():
617
+ pk = " [PK]" if col[5] else ""
618
+ parts.append(f" - {col[1]} {col[2]}{pk}")
619
+
620
+ if table == "customers":
621
+ c.execute("SELECT DISTINCT region FROM customers ORDER BY region")
622
+ parts.append(f" Regions: {', '.join(r[0] for r in c.fetchall())}")
623
+ c.execute("SELECT DISTINCT segment FROM customers ORDER BY segment")
624
+ parts.append(f" Segments: {', '.join(r[0] for r in c.fetchall())}")
625
+ elif table == "products":
626
+ c.execute("SELECT DISTINCT category FROM products ORDER BY category")
627
+ parts.append(f" Categories: {', '.join(r[0] for r in c.fetchall())}")
628
+ c.execute("SELECT DISTINCT supplier FROM products ORDER BY supplier")
629
+ parts.append(f" Suppliers: {', '.join(r[0] for r in c.fetchall())}")
630
+ elif table == "shipping":
631
+ c.execute("SELECT DISTINCT carrier FROM shipping ORDER BY carrier")
632
+ parts.append(f" Carriers: {', '.join(r[0] for r in c.fetchall())}")
633
+ c.execute("SELECT DISTINCT status FROM shipping ORDER BY status")
634
+ parts.append(f" Statuses: {', '.join(r[0] for r in c.fetchall())}")
635
+ elif table == "support_tickets":
636
+ c.execute("SELECT DISTINCT category FROM support_tickets ORDER BY category")
637
+ parts.append(f" Categories: {', '.join(r[0] for r in c.fetchall())}")
638
+ c.execute("SELECT DISTINCT priority FROM support_tickets ORDER BY priority")
639
+ parts.append(f" Priorities: {', '.join(r[0] for r in c.fetchall())}")
640
+ elif table == "inventory_log":
641
+ c.execute("SELECT DISTINCT warehouse_region FROM inventory_log ORDER BY warehouse_region")
642
+ parts.append(f" Warehouse regions: {', '.join(r[0] for r in c.fetchall())}")
643
+ elif table == "marketing_spend":
644
+ c.execute("SELECT DISTINCT channel FROM marketing_spend ORDER BY channel")
645
+ parts.append(f" Channels: {', '.join(r[0] for r in c.fetchall())}")
646
+ c.execute("SELECT DISTINCT campaign_name FROM marketing_spend ORDER BY campaign_name")
647
+ parts.append(f" Campaigns: {', '.join(r[0] for r in c.fetchall())}")
648
+
649
+ parts.append("")
650
+
651
+ parts += [
652
+ "=" * 50,
653
+ "Data spans: 2024-01-01 to 2024-03-15",
654
+ "All dates stored as YYYY-MM-DD text.",
655
+ "Use standard SQLite functions (strftime, date, etc.).",
656
+ ]
657
+ return "\n".join(parts)
server/environment.py ADDED
@@ -0,0 +1,192 @@
+ """Core environment logic for DataDetective."""
+
+ import random
+ import uuid
+ from typing import Any, Optional
+
+ from openenv.core.env_server import Environment
+
+ try:
+     from ..models import DataDetectiveAction, DataDetectiveObservation, DataDetectiveState
+     from .database import create_database, get_schema_info
+     from .tasks import TASKS, grade_answer
+ except (ImportError, ModuleNotFoundError):
+     from models import DataDetectiveAction, DataDetectiveObservation, DataDetectiveState
+     from server.database import create_database, get_schema_info
+     from server.tasks import TASKS, grade_answer
+
+
+ class DataDetectiveEnvironment(
+     Environment[DataDetectiveAction, DataDetectiveObservation, DataDetectiveState]
+ ):
+     SUPPORTS_CONCURRENT_SESSIONS = True
+     MAX_STEPS = 30
+
+     def __init__(self):
+         super().__init__()
+         self._db = None
+         self._task_id: str = ""
+         self._step_count: int = 0
+         self._episode_id: str = ""
+         self._queries_executed: int = 0
+         self._state = DataDetectiveState()
+
+     def reset(
+         self,
+         seed: Optional[int] = None,
+         episode_id: Optional[str] = None,
+         task_id: Optional[str] = None,
+         **kwargs: Any,
+     ) -> DataDetectiveObservation:
+         if seed is not None:
+             random.seed(seed)
+
+         self._episode_id = episode_id or str(uuid.uuid4())
+         self._task_id = task_id if task_id in TASKS else random.choice(list(TASKS))
+         self._step_count = 0
+         self._queries_executed = 0
+
+         if self._db is not None:
+             self._db.close()
+         self._db = create_database()
+
+         task = TASKS[self._task_id]
+         schema = get_schema_info(self._db)
+
+         self._state = DataDetectiveState(
+             episode_id=self._episode_id,
+             step_count=0,
+             task_id=self._task_id,
+             queries_executed=0,
+             max_steps=self.MAX_STEPS,
+         )
+
+         return DataDetectiveObservation(
+             done=False,
+             reward=None,
+             output="Environment ready. Run SQL queries to investigate the issue, then submit your answer.",
+             task_description=task["description"],
+             schema_info=schema,
+             step_number=0,
+             max_steps=self.MAX_STEPS,
+             message=f"Investigation: {task['title']} [{task['difficulty'].upper()}] -- {self.MAX_STEPS} steps available.",
+         )
+
+     def step(
+         self,
+         action: DataDetectiveAction,
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> DataDetectiveObservation:
+         self._step_count += 1
+         self._state.step_count = self._step_count
+
+         remaining = self.MAX_STEPS - self._step_count
+
+         if self._step_count > self.MAX_STEPS:
+             return self._obs(
+                 done=True, reward=0.0,
+                 output="Maximum steps reached -- investigation ended with no answer submitted.",
+                 message="Out of steps.",
+             )
+
+         atype = (action.action_type or "").strip().lower()
+
+         if atype == "query":
+             return self._handle_query(action.content, remaining)
+         elif atype == "answer":
+             return self._handle_answer(action.content)
+         else:
+             return self._obs(
+                 done=False, reward=0.0,
+                 output="",
+                 message=f"Unknown action_type '{action.action_type}'. Use 'query' or 'answer'. ({remaining} steps left)",
+             )
+
+     @property
+     def state(self) -> DataDetectiveState:
+         return self._state
+
+     def close(self) -> None:
+         if self._db is not None:
+             self._db.close()
+         self._db = None
+
+     def _obs(self, *, done: bool, reward: float | None, output: str, message: str) -> DataDetectiveObservation:
+         return DataDetectiveObservation(
+             done=done,
+             reward=reward,
+             output=output,
+             task_description=TASKS[self._task_id]["description"],
+             schema_info="",
+             step_number=self._step_count,
+             max_steps=self.MAX_STEPS,
+             message=message,
+         )
+
+     def _handle_query(self, sql: str, remaining: int) -> DataDetectiveObservation:
+         self._queries_executed += 1
+         self._state.queries_executed = self._queries_executed
+
+         if not sql or not sql.strip():
+             return self._obs(
+                 done=False, reward=0.0,
+                 output="Empty query -- please provide a valid SQL statement.",
+                 message=f"{remaining} steps left.",
+             )
+
+         try:
+             cur = self._db.cursor()
+             cur.execute(sql)
+             columns = [d[0] for d in cur.description] if cur.description else []
+             rows = cur.fetchall()
+             output = _format_table(columns, rows) if rows else "Query returned 0 rows."
+         except Exception as exc:
+             output = f"SQL Error: {exc}"
+             return self._obs(
+                 done=False, reward=0.0,
+                 output=output,
+                 message=f"Query failed. Fix your SQL and retry. ({remaining} steps left)",
+             )
+
+         return self._obs(
+             done=False, reward=0.0,
+             output=output,
+             message=f"{len(rows)} row(s) returned. ({remaining} steps left)",
+         )
+
+     def _handle_answer(self, answer_text: str) -> DataDetectiveObservation:
+         reward = grade_answer(self._task_id, answer_text)
+         if reward >= 0.8:
+             verdict = "Excellent investigation!"
+         elif reward >= 0.5:
+             verdict = "Good findings, but some details missing."
+         else:
+             verdict = "Several key findings were missed."
+
+         return self._obs(
+             done=True,
+             reward=reward,
+             output=f"Score: {reward:.2f} / 1.00 -- {verdict}",
+             message=f"Investigation complete. Final score: {reward:.2f}",
+         )
+
+
+ def _format_table(columns: list[str], rows: list, max_rows: int = 100) -> str:
+     truncated = len(rows) > max_rows
+     display = rows[:max_rows]
+
+     widths = [len(str(c)) for c in columns]
+     for row in display:
+         for i, v in enumerate(row):
+             widths[i] = max(widths[i], min(len(str(v)), 60))
+
+     header = " | ".join(str(c).ljust(widths[i]) for i, c in enumerate(columns))
+     sep = "-+-".join("-" * w for w in widths)
+     lines = [header, sep]
+     for row in display:
+         lines.append(" | ".join(str(v).ljust(widths[i])[:60] for i, v in enumerate(row)))
+
+     if truncated:
+         lines.append(f"... (showing {max_rows} of {len(rows)} rows)")
+     return "\n".join(lines)
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv-core>=0.2.0
+ fastapi>=0.104.0
+ uvicorn>=0.24.0
+ pydantic>=2.0.0
+ websockets>=12.0
+ openai>=1.0.0
server/tasks.py ADDED
@@ -0,0 +1,408 @@
1
+ """
2
+ Task definitions and automated graders for the DataDetective environment.
3
+
4
+ Each task has:
5
+ - id, title, difficulty, description
6
+ - A grader function that scores the agent's final answer (0.0 - 1.0)
7
+ based on whether key findings are mentioned.
8
+ """
9
+
10
+ import re
11
+ from typing import Callable
12
+
13
+
14
+ def _has_any(text: str, keywords: list[str]) -> bool:
15
+ """Case-insensitive check: does *text* contain any of *keywords*?"""
16
+ low = text.lower()
17
+ return any(kw.lower() in low for kw in keywords)
18
+
19
+
20
+ def _has_pattern(text: str, pattern: str) -> bool:
21
+ return bool(re.search(pattern, text, re.IGNORECASE))
22
+
23
+
24
+ def _grade_orders_drop(answer: str) -> float:
25
+ score = 0.0
26
+ if _has_any(answer, ["drop", "decrease", "decline", "fell", "fewer", "reduction", "lower"]):
27
+ score += 0.20
28
+ if _has_any(answer, ["spring mega sale", "spring sale", "mega sale"]) or (
29
+ _has_any(answer, ["promotion", "promo", "sale", "discount", "campaign"])
30
+ ):
31
+ score += 0.20
32
+ if _has_any(answer, ["ended", "expired", "over", "concluded", "stopped"]) or _has_pattern(
33
+ answer, r"march\s*0?1"
34
+ ):
35
+ score += 0.20
36
+ if _has_any(answer, [
37
+ "caused", "because", "due to", "result of", "led to",
38
+ "when the", "after the", "ending of", "end of the",
39
+ "correlated", "explains",
40
+ ]):
41
+ score += 0.20
42
+     if _has_pattern(answer, r"\d+\s*(orders|transactions)") or _has_pattern(
+         answer, r"\d+\s*%"
+     ) or _has_pattern(answer, r"from\s+\d+.*to\s+\d+"):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_returns_spike(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, ["wireless headphones", "headphones pro", "headphone"]):
+         score += 0.20
+     if _has_any(answer, ["west"]):
+         score += 0.20
+     if _has_any(answer, ["audiotech", "audio tech"]):
+         score += 0.20
+     if _has_any(answer, [
+         "defect", "defective", "faulty", "quality",
+         "high return", "return rate", "abnormal",
+         "stopped working", "battery issue", "poor audio",
+     ]):
+         score += 0.20
+     if _has_pattern(answer, r"\d+\s*%") or _has_pattern(
+         answer, r"\d+\s*(returns|returned|units)"
+     ) or _has_any(answer, ["return rate", "compared to"]):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_customer_churn(answer: str) -> float:
+     score = 0.0
+     if _has_pattern(answer, r"\d+\s*%") or _has_any(answer, [
+         "decline", "decrease", "drop", "churn", "fewer active",
+         "lost customers", "stopped ordering",
+     ]):
+         score += 0.20
+     if _has_any(answer, ["enterprise"]):
+         score += 0.20
+     if _has_any(answer, ["northeast", "north east", "north-east"]):
+         score += 0.20
+     if _has_any(answer, [
+         "price increase", "price change", "price hike", "pricing",
+         "more expensive", "raised price", "cost increase",
+     ]):
+         score += 0.20
+     if _has_any(answer, [
+         "laptop pro", "desktop workstation", "office suite",
+         "devtools", "external ssd",
+     ]) or _has_pattern(answer, r"product.*(1|2|11|15|19)"):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_shipping_delay(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, ["midwest"]):
+         score += 0.20
+     if _has_any(answer, ["quickship", "quick ship"]):
+         score += 0.20
+     if _has_any(answer, [
+         "delivery delay", "late delivery", "delayed shipment",
+         "shipping delay", "late shipment", "delivery time",
+         "delayed delivery", "slow delivery",
+     ]):
+         score += 0.20
+     if _has_pattern(answer, r"feb(ruary)?\s*(10|mid|middle)") or _has_any(answer, [
+         "mid-february", "mid february", "around february",
+         "starting in february", "beginning of february",
+     ]):
+         score += 0.20
+     if _has_any(answer, [
+         "support ticket", "complaint", "ticket volume",
+         "customer satisfaction", "support request",
+     ]) and _has_any(answer, [
+         "delivery", "shipping", "carrier", "quickship",
+     ]):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_revenue_paradox(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, [
+         "spring mega sale", "mega sale", "25%", "25 percent",
+     ]) or (
+         _has_any(answer, ["promotion", "promo", "discount", "sale"])
+         and _has_any(answer, ["margin", "profit", "cost"])
+     ):
+         score += 0.20
+     if _has_any(answer, [
+         "product mix", "category mix", "mix shift", "shifted toward",
+         "higher proportion", "more electronics", "low-margin",
+         "composition changed",
+     ]):
+         score += 0.20
+     if _has_any(answer, ["enterprise"]) and _has_any(answer, [
+         "price increase", "price change", "price hike",
+         "lost", "churn", "left", "fewer", "decline",
+     ]):
+         score += 0.20
+     if _has_any(answer, ["return", "refund"]) and _has_any(answer, [
+         "cost", "expense", "profit", "margin", "loss", "erode",
+     ]):
+         score += 0.20
+     if _has_pattern(answer, r"\$\s*[\d,]+") or _has_pattern(
+         answer, r"\d+\s*%"
+     ) or _has_pattern(answer, r"from\s+\$?[\d,]+.*to\s+\$?[\d,]+"):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_supplier_quality(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, ["audiotech", "audio tech"]):
+         score += 0.20
+     if _has_any(answer, ["wireless headphones", "headphones pro", "product 6"]):
+         score += 0.20
+     if _has_any(answer, ["bluetooth speaker", "product 7"]):
+         score += 0.20
+     if _has_any(answer, ["return rate", "refund", "return volume"]) or _has_pattern(
+         answer, r"\d+\s*%.*return"
+     ) or _has_pattern(answer, r"return.*\d+\s*%") or _has_pattern(
+         answer, r"\$\s*[\d,]+"
+     ):
+         score += 0.20
+     if _has_any(answer, [
+         "support ticket", "defect", "complaint", "product_defect",
+         "quality issue", "customer complaint",
+     ]):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_inventory_stockout(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, ["west"]):
+         score += 0.20
+     if _has_any(answer, ["monitor", "product 4", "monitor 27"]):
+         score += 0.20
+     if _has_any(answer, [
+         "inventory", "stock", "out of stock", "stockout", "stock-out",
+         "zero units", "no inventory", "warehouse",
+     ]):
+         score += 0.20
+     if _has_any(answer, [
+         "spring mega sale", "mega sale", "promo", "promotion",
+         "february 15", "feb 15", "during the sale",
+     ]):
+         score += 0.20
+     if _has_pattern(answer, r"\d+\s*(units|orders|sales)") or _has_pattern(
+         answer, r"\d+\s*%"
+     ) or _has_pattern(answer, r"from\s+\d+.*to\s+\d+"):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_fraud_detection(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, ["southeast"]):
+         score += 0.20
+     if _has_any(answer, [
+         "new account", "recent signup", "recently created",
+         "new customer", "account creation", "registered in feb",
+         "signed up",
+     ]):
+         score += 0.20
+     if _has_any(answer, [
+         "high-value", "high value", "expensive", "laptop pro",
+         "desktop workstation", "large order", "electronics",
+     ]):
+         score += 0.20
+     if _has_pattern(answer, r"1[0-5]\s*(account|customer|user)") or _has_pattern(
+         answer, r"\$\s*[\d,]+"
+     ) or _has_pattern(answer, r"\d+\s*(order|transaction)"):
+         score += 0.20
+     if _has_any(answer, [
+         "pattern", "cluster", "coordinated", "suspicious",
+         "same product", "no return", "never returned",
+         "concentrated", "anomal", "fraud ring",
+     ]):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ def _grade_repeat_purchase_decline(answer: str) -> float:
+     score = 0.0
+     if _has_any(answer, [
+         "repeat purchase", "repeat rate", "returning customer",
+         "repeat buyer", "repurchase", "order frequency",
+         "second order", "came back",
+     ]) and (_has_pattern(answer, r"\d+\s*%") or _has_any(answer, [
+         "decline", "drop", "decrease", "fell", "collapsed",
+     ])):
+         score += 0.20
+     if _has_any(answer, ["enterprise"]) and _has_any(answer, [
+         "price", "increase", "hike", "stopped", "left", "churn",
+     ]):
+         score += 0.20
+     if (_has_any(answer, ["midwest"]) or _has_any(answer, [
+         "shipping", "delivery", "quickship",
+     ])) and _has_any(answer, [
+         "repeat", "return", "reorder", "come back", "second order",
+     ]):
+         score += 0.20
+     if _has_any(answer, ["marketing", "acquisition", "spend"]) and _has_any(answer, [
+         "retention", "email", "loyalty", "re-engage", "lapsed",
+         "shifted", "new customer",
+     ]):
+         score += 0.20
+     if _has_any(answer, [
+         "segment", "cohort", "by region", "by segment",
+         "enterprise vs", "consumer vs", "smb vs",
+     ]) or _has_pattern(answer, r"(enterprise|smb|consumer).*\d+\s*%"):
+         score += 0.20
+     return min(score, 1.0)
+
+
+ TASKS: dict[str, dict] = {
+     "orders_drop": {
+         "id": "orders_drop",
+         "difficulty": "easy",
+         "title": "Weekly Orders Drop Investigation",
+         "description": (
+             "URGENT -- Our order volume dropped sharply in the first two weeks "
+             "of March compared to the last two weeks of February. Leadership "
+             "needs to know why.\n\n"
+             "Investigate the database, identify the root cause of the drop, "
+             "and submit a clear summary of your findings."
+         ),
+     },
+     "returns_spike": {
+         "id": "returns_spike",
+         "difficulty": "medium",
+         "title": "Product Returns Spike Investigation",
+         "description": (
+             "ALERT -- Our return rate has spiked significantly in recent weeks, "
+             "with particular concentration in one geographic region. This is "
+             "eating into margins.\n\n"
+             "Use the database to identify which product(s) are driving the "
+             "spike, which region is most affected, and what the likely root "
+             "cause is. Include the supplier if relevant."
+         ),
+     },
+     "customer_churn": {
+         "id": "customer_churn",
+         "difficulty": "hard",
+         "title": "Customer Churn Root Cause Analysis",
+         "description": (
+             "CRITICAL -- Our monthly active customer count has declined "
+             "significantly from January to March. The executive team wants a "
+             "full root-cause analysis.\n\n"
+             "Determine which customer segments and regions are most affected, "
+             "quantify the decline, and identify the most likely causes. "
+             "Check all available tables for clues."
+         ),
+     },
+     "shipping_delay": {
+         "id": "shipping_delay",
+         "difficulty": "medium-hard",
+         "title": "Customer Satisfaction Crisis Investigation",
+         "description": (
+             "ESCALATION -- Customer satisfaction scores have plummeted in one "
+             "of our regions. The support team is overwhelmed with complaints "
+             "and escalations are piling up.\n\n"
+             "Investigate what operational issue is driving the complaints, "
+             "identify the responsible party (carrier, warehouse, etc.), "
+             "determine when the problem started, and quantify the impact. "
+             "Cross-reference multiple data sources for a complete picture."
+         ),
+     },
+     "revenue_paradox": {
+         "id": "revenue_paradox",
+         "difficulty": "hard",
+         "title": "Revenue vs. Profit Paradox Investigation",
+         "description": (
+             "CRITICAL -- Revenue in February was our highest month ever, yet "
+             "gross profit actually *decreased* compared to January. The CFO "
+             "wants a full breakdown of why we are selling more but earning "
+             "less.\n\n"
+             "Analyze revenue, costs, margins, discounts, product mix, customer "
+             "segments, and any other relevant factors. This is likely multi-"
+             "causal -- identify ALL contributing factors and quantify their "
+             "impact. Use the products.cost column to compute margins."
+         ),
+     },
+     "supplier_quality": {
+         "id": "supplier_quality",
+         "difficulty": "medium",
+         "title": "Supplier Quality Crisis Investigation",
+         "description": (
+             "ESCALATION -- The VP of Merchandising has received escalating "
+             "complaints about product quality across multiple SKUs. Quality "
+             "Assurance wants a supplier-level analysis.\n\n"
+             "Determine which supplier(s) have systemic quality issues, which "
+             "of their products are affected, and quantify the total business "
+             "impact in returns, refunds, and support ticket volume. Include "
+             "return rates by supplier to support a contract renegotiation."
+         ),
+     },
+     "inventory_stockout": {
+         "id": "inventory_stockout",
+         "difficulty": "medium-hard",
+         "title": "Regional Sales Underperformance Investigation",
+         "description": (
+             "INVESTIGATION -- Our West region was projected to be the top "
+             "performer during the Spring Mega Sale based on historical trends "
+             "and marketing investment, but actual sales came in significantly "
+             "below the other regions.\n\n"
+             "The Regional VP demands an explanation. Investigate what caused "
+             "the West to underperform during our biggest promotional event. "
+             "Check product-level sales, inventory data, and any operational "
+             "issues that may have limited fulfillment."
+         ),
+     },
+     "fraud_detection": {
+         "id": "fraud_detection",
+         "difficulty": "hard",
+         "title": "Suspicious Order Pattern Investigation",
+         "description": (
+             "ALERT -- The Finance team has flagged a suspicious spike in "
+             "high-value orders from recently created accounts. Several of "
+             "these orders have already shipped.\n\n"
+             "Investigate the pattern: identify the suspicious accounts, "
+             "determine the scope of potential fraud, estimate the financial "
+             "exposure, and describe the behavioral signatures that "
+             "distinguish these accounts from legitimate customers. Look at "
+             "signup dates, order values, product choices, and geographic "
+             "concentration."
+         ),
+     },
+     "repeat_purchase_decline": {
+         "id": "repeat_purchase_decline",
+         "difficulty": "hard",
+         "title": "Customer Retention Crisis Investigation",
+         "description": (
+             "CRITICAL -- Monthly unique buyer count has held steady around "
+             "100, but the Customer Success team reports that repeat purchase "
+             "rates have collapsed. In January, roughly 40% of orders came "
+             "from returning customers; by March, it appears to be under 20%."
+             "\n\n"
+             "The CEO asks: are we becoming a one-time-purchase business? "
+             "Diagnose which customer segments and regions lost repeat buyers, "
+             "identify the root causes, and determine whether our marketing "
+             "spend strategy is masking a retention problem. Check the "
+             "marketing_spend table for clues about acquisition vs. retention "
+             "investment."
+         ),
+     },
+ }
+
+ _GRADERS: dict[str, Callable[[str], float]] = {
+     "orders_drop": _grade_orders_drop,
+     "returns_spike": _grade_returns_spike,
+     "customer_churn": _grade_customer_churn,
+     "shipping_delay": _grade_shipping_delay,
+     "revenue_paradox": _grade_revenue_paradox,
+     "supplier_quality": _grade_supplier_quality,
+     "inventory_stockout": _grade_inventory_stockout,
+     "fraud_detection": _grade_fraud_detection,
+     "repeat_purchase_decline": _grade_repeat_purchase_decline,
+ }
+
+
+ def grade_answer(task_id: str, answer: str) -> float:
+     grader = _GRADERS.get(task_id)
+     if grader is None:
+         return 0.0
+     return grader(answer)
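
Every `_grade_*` function above follows the same additive rubric: five independent keyword or regex checks, each worth 0.20, capped at 1.0. A minimal self-contained sketch of that pattern, with stand-in implementations of the `_has_any` / `_has_pattern` helpers this diff assumes are defined earlier in the module (their exact signatures here are an assumption):

```python
import re


def _has_any(answer: str, phrases: list[str]) -> bool:
    # Case-insensitive substring match against any phrase.
    low = answer.lower()
    return any(p in low for p in phrases)


def _has_pattern(answer: str, pattern: str) -> bool:
    # Case-insensitive regex search anywhere in the answer.
    return re.search(pattern, answer, re.IGNORECASE) is not None


def grade_sketch(answer: str) -> float:
    # Two independent findings, 0.20 each, capped at 1.0 -- the same
    # additive scoring used by the graders above (which have five checks).
    score = 0.0
    if _has_any(answer, ["west"]):        # names the affected region
        score += 0.20
    if _has_pattern(answer, r"\d+\s*%"):  # quantifies the change
        score += 0.20
    return min(score, 1.0)
```

Because the checks are independent, partial answers earn partial credit: an answer naming only the region scores 0.20, one that also quantifies the drop scores 0.40.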