---
title: DataDetective
emoji: πŸ”
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# DataDetective β€” Business Incident Investigation Environment

An OpenEnv environment where AI agents investigate real-world business incidents by querying a SQL database, analysing patterns, and submitting root-cause findings.

## What It Does

The agent is given a realistic company database (TechMart β€” a mid-size B2B+B2C electronics retailer) and a business problem to investigate. It can execute SQL queries to explore the data, then submit a final written analysis. The environment automatically grades the analysis based on whether key findings were identified. Each task has 5 grading criteria worth 0.20 each, enabling meaningful partial credit.

## Tasks (Easy β†’ Hard)

| # | Task ID | Difficulty | Scenario |
|---|---------|------------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down β€” multi-causal margin erosion |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |

Each task is scored 0.0 – 1.0 based on specific findings the agent must discover.

## Action / Observation Spaces

### Action (`DataDetectiveAction`)

| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
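Assuming the two fields above map one-to-one onto a JSON body, an action payload can be sketched as below. The `make_action` helper is hypothetical, not part of the environment API:

```python
def make_action(action_type: str, content: str) -> dict:
    """Build a DataDetectiveAction-style payload; only 'query' and 'answer' are valid."""
    if action_type not in ("query", "answer"):
        raise ValueError(f"unsupported action_type: {action_type}")
    return {"action_type": action_type, "content": content}

# A query step during the investigation, then the final answer step:
query_action = make_action("query", "SELECT COUNT(*) FROM orders;")
answer_action = make_action("answer", "Order volume fell because the promotion ended.")
```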

### Observation (`DataDetectiveObservation`)

| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
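An observation with these fields might look like the following illustrative payload (all values here are invented); an agent can use `step_number` and `max_steps` to budget its remaining queries:

```python
# Illustrative DataDetectiveObservation payload, assuming a direct JSON mapping.
obs = {
    "output": "region | n_returns\nWest   | 412",
    "task_description": "Investigate the spike in product returns.",
    "schema_info": "",          # only populated at reset
    "step_number": 3,
    "max_steps": 30,
    "message": "query ok",
}

remaining_steps = obs["max_steps"] - obs["step_number"]
```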

## Database Schema (11 Tables)

The TechMart database includes:

| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |

All data is synthetic, generated in-memory (no external databases required).
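As a sketch of the kind of exploration query an agent might run, here is an aggregation over a toy in-memory SQLite copy of two of the tables. The column names (`product_id`, `supplier`, `reason`) and the sample data are assumptions; check `schema_info` at reset for the real columns:

```python
import sqlite3

# Toy in-memory stand-in for the products and returns tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, supplier TEXT);
CREATE TABLE returns  (return_id INTEGER PRIMARY KEY, product_id INTEGER, reason TEXT);
INSERT INTO products VALUES (1, 'Acme Components'), (2, 'Globex Parts');
INSERT INTO returns  VALUES (10, 1, 'defective'), (11, 1, 'defective'), (12, 2, 'wrong item');
""")

# Which supplier is behind the defective returns?
rows = conn.execute("""
    SELECT p.supplier, COUNT(*) AS n_returns
    FROM returns r JOIN products p ON p.product_id = r.product_id
    WHERE r.reason = 'defective'
    GROUP BY p.supplier
    ORDER BY n_returns DESC
""").fetchall()
```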

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### 3. Health Check

```bash
curl http://localhost:7860/health
```

### 4. Run the Baseline Agent

```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```

### 5. Docker

```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```

## Environment Variables

| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
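How `inference.py` actually reads these is not shown here, but a defensive loader with the documented `ENV_URL` default might look like this. The `load_config` helper and the placeholder values are illustrative; in real use you would pass `dict(os.environ)`:

```python
def load_config(env: dict) -> dict:
    """Resolve the variables above; ENV_URL falls back to its documented default."""
    missing = [k for k in ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN") if k not in env]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {
        "api_base_url": env["API_BASE_URL"],
        "model_name": env["MODEL_NAME"],
        "hf_token": env["HF_TOKEN"],
        "env_url": env.get("ENV_URL", "http://localhost:7860"),
    }

cfg = load_config({
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "gpt-4.1-mini",
    "HF_TOKEN": "hf_placeholder",  # illustrative value, not a real token
})
```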

## How Grading Works

Each task has an automated grader that checks the agent's final answer for specific key findings (keywords, patterns, named entities). Each task has 5 grading criteria worth 0.20 each, for a maximum score of 1.0. Partial credit is awarded for each finding discovered.
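A minimal sketch of how such a keyword-based grader could work, assuming each criterion accepts any of several keywords. The criteria below are invented for illustration, not the real rubric for any task:

```python
# Each criterion is a list of accepted keywords; matching any one earns 0.20.
CRITERIA = [
    ["promo", "promotion"],
    ["ended", "expired"],
    ["order volume", "orders dropped"],
    ["discount"],
    ["march"],
]

def grade(answer: str, criteria=CRITERIA) -> float:
    """Score the final answer: 0.20 per criterion matched, up to 1.0."""
    text = answer.lower()
    hits = sum(any(keyword in text for keyword in group) for group in criteria)
    return round(hits * 0.20, 2)

score = grade("Orders dropped after the spring promotion ended; the discount drove volume.")
```

This matches 4 of the 5 criteria above, so partial credit of 0.8 is awarded.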

## Setup Requirements

- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)