# Support Triage OpenEnv
A complete, real-world OpenEnv environment for training and evaluating agents on customer support ticket triage. The environment simulates what support teams actually do: read inbox tickets, classify urgency and category, draft safe responses, and resolve the right ticket.
## Why this environment
Most agent benchmarks under-model production support workflows. This environment focuses on practical support operations with:
- Multi-ticket inbox context selection
- Policy-compliant communication
- Priority + escalation decisions
- Deterministic graders and dense reward shaping
## OpenEnv API compliance
The environment exposes:

- `reset(task_id?: str) -> Observation`
- `step(action: Action) -> (Observation, Reward, done, info)`
- `state() -> dict`

Typed Pydantic models:

- `Observation`: `src/support_triage_openenv/models.py`
- `Action`: `src/support_triage_openenv/models.py`
- `Reward`: `src/support_triage_openenv/models.py`

Metadata: `openenv.yaml`
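The `reset`/`step`/`state` contract above can be sketched with a toy stand-in. `EchoEnv` and everything inside it are hypothetical; the real implementation lives in `src/support_triage_openenv/env.py` and returns the typed models rather than plain dicts.

```python
# Toy environment mirroring the OpenEnv method signatures described above.
# All names and return shapes here are illustrative stand-ins.
from typing import Optional, Tuple


class EchoEnv:
    """Minimal sketch of the reset/step/state contract."""

    def __init__(self) -> None:
        self.step_count = 0
        self.task_id: Optional[str] = None

    def reset(self, task_id: Optional[str] = None) -> dict:
        self.step_count = 0
        self.task_id = task_id
        return {"task_id": task_id, "step_count": 0}  # stands in for Observation

    def step(self, action: dict) -> Tuple[dict, float, bool, dict]:
        self.step_count += 1
        obs = {"task_id": self.task_id, "step_count": self.step_count}
        # Toy termination rule: finish after three steps.
        return obs, 0.0, self.step_count >= 3, {}

    def state(self) -> dict:
        return {"task_id": self.task_id, "step_count": self.step_count}


# Typical driver loop against the contract:
env = EchoEnv()
obs = env.reset(task_id="easy_password_reset")
done = False
while not done:
    obs, reward, done, info = env.step({"action_type": "read_ticket"})
print(env.state())
```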
## Action space
Action model fields:

- `action_type`: one of `read_ticket | classify_ticket | draft_reply | resolve_ticket`
- `ticket_id`: required for `read_ticket`, `classify_ticket`, `resolve_ticket`
- `priority`: optional enum `low | medium | high | urgent`
- `category`: optional enum `account | billing | technical | abuse | general`
- `needs_escalation`: optional bool
- `message`: text for `draft_reply`
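Two example action payloads matching those fields, written as plain dicts for illustration. The ticket id `T-1001` and the message text are made up; the real typed model lives in `src/support_triage_openenv/models.py`.

```python
# Hypothetical action payloads; field names follow the Action model fields
# listed above, but values (ticket id, message) are invented for illustration.
classify = {
    "action_type": "classify_ticket",
    "ticket_id": "T-1001",       # required for classify_ticket
    "priority": "urgent",        # low | medium | high | urgent
    "category": "technical",     # account | billing | technical | abuse | general
    "needs_escalation": True,
}

reply = {
    "action_type": "draft_reply",
    "message": "We are investigating the issue and will update you shortly.",
}
```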
## Observation space
Observation includes:

- `task_id`, `objective`, `step_count`, `max_steps`
- `inbox`: ticket metadata list (ticket_id, subject, tier, age, read flag)
- `current_ticket_content`: only visible after reading the selected ticket
- `latest_system_note`: feedback from the last step
- `score_hint`: partial grader components (`read`, `classify`, `reply`, `resolve`)
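A snapshot of what one observation might look like, with the fields above. All concrete values (objective text, step limits, ticket contents) are invented for illustration; the real field types are defined in `src/support_triage_openenv/models.py`.

```python
# Hypothetical observation snapshot; field names come from the list above,
# values are illustrative only.
observation = {
    "task_id": "easy_password_reset",
    "objective": "Resolve the account lockout ticket.",
    "step_count": 2,
    "max_steps": 12,
    "inbox": [
        {"ticket_id": "T-1001", "subject": "Locked out of account",
         "tier": "free", "age": "2h", "read": True},
    ],
    "current_ticket_content": "I can't log in after three failed attempts.",
    "latest_system_note": "Ticket T-1001 read.",
    "score_hint": {"read": 1.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0},
}
```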
## Tasks and difficulty
### `easy_password_reset` (Easy)

- Correctly process an account lockout and send secure reset guidance.

### `medium_billing_dispute` (Medium)

- Investigate duplicate billing with a context ticket and provide a policy-compliant refund timeline.

### `hard_outage_incident` (Hard)

- Handle a high-stakes outage report requiring multi-ticket context, urgent escalation, and careful incident messaging.

Each task has deterministic grading in `support_triage_openenv.graders.grade_task`, returning a score in the range 0.0-1.0.
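The aggregation step of such a grader might look like the sketch below. The component weights and the function name `grade` are assumptions for illustration; the actual deterministic logic lives in `support_triage_openenv.graders.grade_task`.

```python
# Sketch of combining the four partial grader components into one 0.0-1.0
# score. Weights are illustrative, not the environment's actual numbers.
def grade(components: dict) -> float:
    weights = {"read": 0.2, "classify": 0.3, "reply": 0.3, "resolve": 0.2}
    score = sum(weights[k] * components.get(k, 0.0) for k in weights)
    # Clamp so the returned score always stays inside [0.0, 1.0].
    return max(0.0, min(1.0, score))

print(grade({"read": 1.0, "classify": 1.0, "reply": 1.0, "resolve": 1.0}))
```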
## Reward design
Reward is shaped and meaningful across the trajectory:
- Positive dense signal from partial grader progress (read/context, classification fields, reply quality, resolve correctness)
- Penalties for invalid actions, repeated loops, and malformed steps
- Final step guarantees score alignment with deterministic grader output
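The dense-signal idea above can be sketched as rewarding each step by the *increase* in partial grader progress, minus a penalty for invalid actions. The weights, penalty value, and function name here are assumptions, not the environment's actual numbers.

```python
# Illustrative shaped-reward sketch: pay out only newly earned grader
# progress, and penalize invalid steps. Values are assumptions.
def shaped_reward(prev_hint: dict, new_hint: dict, invalid: bool) -> float:
    components = ("read", "classify", "reply", "resolve")
    # Dense positive signal: the delta in partial grader progress this step.
    progress = sum(new_hint[c] - prev_hint[c] for c in components)
    penalty = 0.1 if invalid else 0.0
    return progress - penalty

# A valid step that completes classification earns its progress delta:
prev = {"read": 1.0, "classify": 0.0, "reply": 0.0, "resolve": 0.0}
new = {"read": 1.0, "classify": 1.0, "reply": 0.0, "resolve": 0.0}
print(shaped_reward(prev, new, invalid=False))  # → 1.0
```

Rewarding deltas rather than absolute progress keeps the total trajectory reward aligned with the final grader score while still giving useful per-step signal.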
## Project structure
- `src/support_triage_openenv/env.py` - environment implementation
- `src/support_triage_openenv/models.py` - typed OpenEnv models
- `src/support_triage_openenv/tasks.py` - task specs (easy/medium/hard)
- `src/support_triage_openenv/graders.py` - deterministic grader logic
- `scripts/run_baseline.py` - OpenAI baseline inference runner
- `scripts/validate_env.py` - tests + optional `openenv validate`
- `app.py` - FastAPI app for HF Space runtime
- `Dockerfile` - containerized deployment
## Setup
```shell
cd /home/ai24mtech14005/meta_hackathon
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Run tests

```shell
python -m pytest -q
```
## Run baseline
OpenAI model baseline:
```shell
export API_BASE_URL=https://your-openai-compatible-endpoint/v1
export MODEL_NAME=your-model-id
export HF_TOKEN=your-api-key
python inference.py --mode openai --output scores/inference_scores.json
```
Deterministic heuristic baseline:
```shell
python inference.py --mode heuristic --output scores/inference_scores.json
```
Both modes write a JSON report to `scores/inference_scores.json` and emit structured stdout logs with `[START]`, `[STEP]`, and `[END]` markers.
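A small sketch of consuming that structured stdout log. Only the `[START]`/`[STEP]`/`[END]` markers come from the runner's documented format; the payload text after each marker is invented for illustration.

```python
# Group log lines by their leading [START]/[STEP]/[END] marker.
def split_log(lines):
    buckets = {"START": [], "STEP": [], "END": []}
    for line in lines:
        for marker in buckets:
            tag = f"[{marker}]"
            if line.startswith(tag):
                buckets[marker].append(line[len(tag):].strip())
    return buckets

# Hypothetical sample of what the runner's stdout might contain:
sample = [
    "[START] task=easy_password_reset",
    "[STEP] action=read_ticket reward=0.2",
    "[STEP] action=resolve_ticket reward=0.8",
    "[END] score=1.0",
]
print(split_log(sample)["STEP"])
```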
## Run API locally

```shell
uvicorn app:app --host 0.0.0.0 --port 7860
```
Endpoints:

- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
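A minimal client sketch against those endpoints. The JSON body shapes are assumptions based on the `reset`/`step` signatures; check `app.py` for the actual request schemas. Requests are only constructed here, not sent.

```python
# Build (but do not send) requests for the FastAPI endpoints above.
# The payload shapes are assumptions, not the app's verified schemas.
import json
import urllib.request

BASE = "http://localhost:7860"

def build_post(path: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

reset_req = build_post("/reset", {"task_id": "easy_password_reset"})
step_req = build_post("/step", {"action": {"action_type": "read_ticket",
                                           "ticket_id": "T-1001"}})
# urllib.request.urlopen(reset_req) would issue the call once the server is up.
print(reset_req.full_url, reset_req.get_method())
```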
## Docker

```shell
docker build -t support-triage-openenv .
docker run --rm -p 7860:7860 support-triage-openenv
```
## Hugging Face Space deployment
- Create a Docker Space.
- Push this repository to the Space.
- Keep the `README.md` frontmatter tags, including `openenv`.
- The Space serves the API on port `7860`.
## One-command remote bootstrap

To have this local repo automatically create and push to both GitHub and HF:
```shell
export GITHUB_USERNAME=your_github_user
export GITHUB_TOKEN=your_github_pat
export HF_USERNAME=your_hf_user
export HF_TOKEN=your_hf_token
bash scripts/bootstrap_remotes.sh support-triage-openenv
```
## Baseline scores (heuristic, reproducible)

Generated with:

```shell
python inference.py --mode heuristic --output scores/inference_scores.json
```
- `easy_password_reset`: grader `1.0`, reward `1.0`
- `medium_billing_dispute`: grader `1.0`, reward `1.0`
- `hard_outage_incident`: grader `1.0`, reward `1.0`
- Overall average grader score: `1.0`
- Tracked reference artifact: `baseline_expected_scores.json`
## Pre-submission validator

Run full strict validation (all disqualification gates):

```shell
python pre_submission_validate.py --space-url https://your-space-name.hf.space
```
Local-only run while iterating (skips Docker daemon + remote space ping):
```shell
python pre_submission_validate.py --skip-docker --skip-space
```
Run organizer-provided script directly (integrated path):
```shell
bash scripts/pre_validation_script.sh https://your-space-name.hf.space .
```
Notes:

- `scripts/sample_inference_script.sh` is kept as an organizer reference.
- The root `inference.py` is aligned to the required `[START]`, `[STEP]`, `[END]` line format.