Support Triage OpenEnv

A complete, real-world OpenEnv environment for training/evaluating agents on customer support ticket triage. The environment simulates what support teams actually do: read inbox tickets, classify urgency/category, draft safe responses, and resolve the right ticket.

Why this environment

Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations with:

  • Multi-ticket inbox context selection
  • Policy-compliant communication
  • Priority + escalation decisions
  • Deterministic graders and dense reward shaping

OpenEnv API compliance

The environment exposes:

  • reset(task_id?: str) -> Observation
  • step(action: Action) -> (Observation, Reward, done, info)
  • state() -> dict
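
The reset/step contract above can be sketched with a stub. `StubEnv` below is a hypothetical stand-in for illustration only; the real implementation lives in src/support_triage_openenv/env.py and its observations/rewards differ.

```python
# StubEnv illustrates the OpenEnv loop shape, not the real environment.
class StubEnv:
    def __init__(self):
        self.steps = 0

    def reset(self, task_id=None):
        self.steps = 0
        return {"task_id": task_id or "easy_password_reset", "step_count": 0}

    def step(self, action):
        self.steps += 1
        obs = {"step_count": self.steps}
        reward = 0.1  # placeholder for the dense shaped signal
        done = self.steps >= 3
        return obs, reward, done, {}

    def state(self):
        return {"step_count": self.steps}

env = StubEnv()
obs = env.reset(task_id="easy_password_reset")
done, total = False, 0.0
while not done:
    obs, reward, done, info = env.step(
        {"action_type": "read_ticket", "ticket_id": "T-1"}
    )
    total += reward
```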

Typed Pydantic models for actions and observations live in src/support_triage_openenv/models.py.

Metadata:

  • openenv.yaml

Action space

Action model fields:

  • action_type: one of read_ticket | classify_ticket | draft_reply | resolve_ticket
  • ticket_id: required for read_ticket, classify_ticket, resolve_ticket
  • priority: optional enum low | medium | high | urgent
  • category: optional enum account | billing | technical | abuse | general
  • needs_escalation: optional bool
  • message: text for draft_reply
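
As an illustration, the field constraints above can be mirrored with stdlib dataclasses and enums (the real models are Pydantic, in src/support_triage_openenv/models.py, and may differ in detail):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ActionType(str, Enum):
    READ_TICKET = "read_ticket"
    CLASSIFY_TICKET = "classify_ticket"
    DRAFT_REPLY = "draft_reply"
    RESOLVE_TICKET = "resolve_ticket"

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"

# Actions that must name a ticket, per the field rules above.
TICKET_REQUIRED = {
    ActionType.READ_TICKET,
    ActionType.CLASSIFY_TICKET,
    ActionType.RESOLVE_TICKET,
}

@dataclass
class Action:
    action_type: ActionType
    ticket_id: Optional[str] = None
    priority: Optional[Priority] = None
    category: Optional[str] = None
    needs_escalation: Optional[bool] = None
    message: Optional[str] = None

    def __post_init__(self):
        if self.action_type in TICKET_REQUIRED and not self.ticket_id:
            raise ValueError(f"{self.action_type.value} requires ticket_id")

action = Action(
    action_type=ActionType.CLASSIFY_TICKET,
    ticket_id="T-1",
    priority=Priority.URGENT,
    category="technical",
    needs_escalation=True,
)
```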

Observation space

Observation includes:

  • task_id, objective, step_count, max_steps
  • inbox: ticket metadata list (ticket_id, subject, tier, age, read flag)
  • current_ticket_content: visible only after reading the selected ticket
  • latest_system_note: feedback from last step
  • score_hint: partial grader components (read, classify, reply, resolve)
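
For example, an agent can use the inbox metadata to decide which ticket to read next. The payload shape below is an assumption based on the field list above; `next_unread` is a hypothetical helper, not part of the environment.

```python
# Illustrative observation payload; exact shapes and values are assumptions.
obs = {
    "task_id": "medium_billing_dispute",
    "step_count": 1,
    "max_steps": 12,
    "inbox": [
        {"ticket_id": "T-1", "subject": "Duplicate charge", "tier": "pro",
         "age": 2, "read": True},
        {"ticket_id": "T-2", "subject": "Refund timeline?", "tier": "free",
         "age": 5, "read": False},
        {"ticket_id": "T-3", "subject": "Login help", "tier": "free",
         "age": 1, "read": False},
    ],
    "current_ticket_content": None,  # populated only after a read_ticket step
}

def next_unread(inbox):
    """Pick the oldest unread ticket, a simple triage heuristic."""
    unread = [t for t in inbox if not t["read"]]
    if not unread:
        return None
    return max(unread, key=lambda t: t["age"])["ticket_id"]
```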

Tasks and difficulty

  1. easy_password_reset (Easy)
  • Correctly process an account lockout and send secure reset guidance.
  2. medium_billing_dispute (Medium)
  • Investigate a duplicate billing charge using a context ticket and provide a policy-compliant refund timeline.
  3. hard_outage_incident (Hard)
  • Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.

Each task is graded deterministically by support_triage_openenv.graders.grade_task, which returns a score from 0.0 to 1.0.

Reward design

Reward is shaped so that every step of the trajectory carries signal:

  • Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
  • Penalties for invalid actions, repeated loops, and malformed steps
  • The final step's reward is guaranteed to match the deterministic grader output
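
One plausible shaping scheme consistent with the bullets above is a weighted sum of partial grader components minus penalties. The weights and penalty sizes below are assumptions for illustration, not the environment's actual coefficients:

```python
def shaped_reward(components, invalid_actions=0, repeated_loops=0):
    """Combine partial grader components with penalties.

    components maps 'read', 'classify', 'reply', 'resolve' to values in
    [0, 1]. All coefficients here are illustrative, not the real ones.
    """
    weights = {"read": 0.2, "classify": 0.3, "reply": 0.3, "resolve": 0.2}
    progress = sum(w * components.get(k, 0.0) for k, w in weights.items())
    penalty = 0.05 * invalid_actions + 0.05 * repeated_loops
    return max(0.0, progress - penalty)
```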

Project structure

  • src/support_triage_openenv/env.py - environment implementation
  • src/support_triage_openenv/models.py - typed OpenEnv models
  • src/support_triage_openenv/tasks.py - task specs (easy/medium/hard)
  • src/support_triage_openenv/graders.py - deterministic grader logic
  • scripts/run_baseline.py - OpenAI baseline inference runner
  • scripts/validate_env.py - tests + optional openenv validate
  • app.py - FastAPI app for HF Space runtime
  • Dockerfile - containerized deployment

Setup

cd support-triage-openenv
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run tests

python -m pytest -q

Run baseline

OpenAI-compatible API baseline:

export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json

Deterministic heuristic baseline:

python inference.py --mode heuristic --output scores/inference_scores.json

Both modes write a JSON report to scores/inference_scores.json and emit structured stdout logs with [START], [STEP], and [END] markers.

Run API locally

uvicorn app:app --host 0.0.0.0 --port 7860

Endpoints:

  • GET /health
  • POST /reset
  • POST /step
  • GET /state
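
A quick client-side smoke test against the endpoints above could look like this. The request body shape follows the Action model described earlier but is an assumption; start the uvicorn server before uncommenting the send:

```python
import json
import urllib.request

BASE = "http://localhost:7860"

# Build a POST /step request; urlopen(req) sends it once the server is up.
step_body = {"action": {"action_type": "read_ticket", "ticket_id": "T-1"}}
req = urllib.request.Request(
    f"{BASE}/step",
    data=json.dumps(step_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```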

Docker

docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv

Hugging Face Space deployment

  • Create a Docker Space.
  • Push this repository to the Space.
  • Keep README.md frontmatter tags including openenv.
  • Space serves the API on port 7860.

One-command remote bootstrap

If you want this local repo to automatically create and push to both GitHub and the Hugging Face Hub:

export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv

Baseline scores (heuristic, reproducible)

Generated with:

python inference.py --mode heuristic --output scores/inference_scores.json

  • easy_password_reset: grader 1.0, reward 1.0
  • medium_billing_dispute: grader 1.0, reward 1.0
  • hard_outage_incident: grader 1.0, reward 1.0
  • Overall average grader score: 1.0
  • Tracked reference artifact: baseline_expected_scores.json

Pre-submission validator

Run full strict validation (all disqualification gates):

python pre_submission_validate.py --space-url https://your-space-name.hf.space

Local-only run while iterating (skips Docker daemon + remote space ping):

python pre_submission_validate.py --skip-docker --skip-space

Run organizer-provided script directly (integrated path):

bash scripts/pre_validation_script.sh https://your-space-name.hf.space .

Notes:

  • scripts/sample_inference_script.sh is kept as organizer reference.
  • Root inference.py is aligned to the required [START], [STEP], [END] line format.