---
title: DataDetective
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# DataDetective — Business Incident Investigation Environment

An [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment where AI agents investigate realistic business incidents by querying a SQL database, analysing patterns, and submitting root-cause findings.

## What It Does

The agent is given a realistic company database (TechMart — a mid-size B2B+B2C electronics retailer) and a business problem to investigate. It can execute SQL queries to explore the data, then submit a final written analysis. The environment automatically grades the analysis based on whether key findings were identified. Each task has 5 grading criteria worth 0.20 each, enabling meaningful partial credit.

## Tasks (Easy → Hard)

| # | Task ID | Difficulty | Scenario |
|---|---------|------------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down — multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |

Each task is scored 0.0–1.0 based on specific findings the agent must discover.
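The scoring scheme above (5 criteria worth 0.20 each) can be sketched as a simple keyword grader. This is only an illustration of the partial-credit idea, not the environment's actual grader, and the criterion keywords below are invented for the example:

```python
def grade(answer: str, criteria: list[list[str]]) -> float:
    """Score an analysis: each criterion (a list of acceptable
    keywords) earns 0.20 if any of its keywords appears in the answer."""
    text = answer.lower()
    hits = sum(any(kw in text for kw in group) for group in criteria)
    return round(hits * 0.20, 2)

# Hypothetical criteria for a task like `orders_drop`
criteria = [
    ["promotion ended", "promo ended"],
    ["order volume", "orders dropped"],
    ["discount"],
    ["specific date"],
    ["revenue impact"],
]

score = grade(
    "Order volume fell after the promotion ended; the discount expired.",
    criteria,
)  # 3 of 5 criteria matched → 0.6
```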
## Action / Observation Spaces

### Action (`DataDetectiveAction`)

| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |

### Observation (`DataDetectiveObservation`)

| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |

## Database Schema (11 Tables)

The TechMart database includes:

| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |

All data is synthetic, generated in-memory (no external databases required).

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### 3. Health Check

```bash
curl http://localhost:7860/health
```

### 4. Run the Baseline Agent

```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```

### 5. Docker

```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```

## Environment Variables

| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |

## How Grading Works

Each task has an automated grader that checks the agent's final answer for specific key findings (keywords, patterns, named entities). Each task has 5 grading criteria worth 0.20 each, for a maximum score of 1.0. Partial credit is awarded for each finding discovered.

## Setup Requirements

- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)