HyperBrickCaseOps Agent Guide
This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.
Quick Start (Agent Strategy)
Recommended action order:
- classify: set queue, priority, issue_type
- request_info if required_next_actions includes it
- wait if the customer follow-up is pending
- draft_reply
- add_internal_note
- submit
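The recommended order can be sketched as a small selection function. This is only a minimal sketch: it assumes the observation is available as a dict, and the awaiting_customer flag and the done_ops bookkeeping set are hypothetical names introduced for illustration, not fields confirmed by the environment.

```python
from typing import Any

# Order taken from the recommended action order in this guide.
ACTION_ORDER = [
    "classify",
    "request_info",
    "wait",
    "draft_reply",
    "add_internal_note",
    "submit",
]


def next_operation(observation: dict[str, Any], done_ops: set[str]) -> str:
    """Return the next operation, skipping optional steps that do not apply."""
    required = observation.get("required_next_actions", [])
    for op in ACTION_ORDER:
        if op in done_ops:
            continue
        # request_info only applies when the environment asks for it.
        if op == "request_info" and "request_info" not in required:
            continue
        # "awaiting_customer" is a hypothetical flag for a pending follow-up.
        if op == "wait" and not observation.get("awaiting_customer", False):
            continue
        return op
    return "submit"
```

Keeping the order in a single list makes it easy to adjust if the environment turns out to expect a different sequence.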
Environment API
The environment follows the standard OpenEnv API:
- reset() -> initial observation
- step(action) -> next observation, reward, done
- state() -> internal state snapshot
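A reset/step loop over this interface might look like the sketch below. It assumes step returns a plain (observation, reward, done) tuple, as the listing suggests; the real OpenEnv client may wrap these values in a result object instead. ToyEnv is a stand-in written for this example, not the actual environment.

```python
from typing import Any, Callable, Protocol


class Env(Protocol):
    """Assumed shape of the environment interface."""

    def reset(self) -> Any: ...
    def step(self, action: Any) -> tuple[Any, float, bool]: ...


def run_episode(env: Env, policy: Callable[[Any], Any], max_steps: int = 20) -> list[float]:
    """Run one episode and return the per-step rewards."""
    obs = env.reset()
    rewards: list[float] = []
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        rewards.append(reward)
        if done:
            break
    return rewards


class ToyEnv:
    """Illustrative stand-in that ends the episode after three steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"t": self.t}

    def step(self, action: Any) -> tuple[dict, float, bool]:
        self.t += 1
        return {"t": self.t}, 0.1, self.t >= 3
```

The max_steps cap is a safety guard for the sketch so that a policy that never reaches submit cannot loop forever.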
Server entrypoint:
server.app:app
Action Schema
Each step takes a typed SupportDeskAction:
- operation: classify | request_info | draft_reply | add_internal_note | submit | wait
- queue: string or null
- priority: string or null
- issue_type: string or null
- status: string or null
- resolution_code: string or null
- requested_fields: list of strings
- reply: string or null
- internal_note: string or null
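The fields above map naturally onto a dataclass. The class below is a sketch that mirrors the listed schema for illustration; it is named SupportDeskActionSketch to make clear it is not the environment's own SupportDeskAction type, and the validation in __post_init__ is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Operations listed in the action schema.
OPERATIONS = {
    "classify",
    "request_info",
    "draft_reply",
    "add_internal_note",
    "submit",
    "wait",
}


@dataclass
class SupportDeskActionSketch:
    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: list[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject operations outside the documented set (assumed behavior).
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")
```

For example, SupportDeskActionSketch(operation="classify", queue="billing_ops", priority="high", issue_type="duplicate_charge") leaves every other field at null or an empty list.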
Observation Highlights
The observation includes:
- task_id, difficulty, objective
- ticket (customer, tier, region, business impact)
- knowledge_base (policy snippets)
- case (current triage state)
- workflow_stage, required_next_actions, risk_flags
Tasks and Difficulty
There are 4 tasks with increasing difficulty:
- billing_refund_easy (easy)
- account_takeover_medium (medium)
- api_incident_hard (hard)
- regulated_export_exception_hard (hard)
Grading and Reward
- Deterministic graders score task completion
- Final scores are clamped to (0.01, 0.99)
- Reward provides dense progress signals across the episode
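Clamping a final score can be illustrated in one function. This sketch assumes the bounds 0.01 and 0.99 are inclusive; the guide does not say how the graders treat the endpoints, and clamp_score is a hypothetical helper name.

```python
def clamp_score(raw: float, low: float = 0.01, high: float = 0.99) -> float:
    """Clamp a raw grader score into [low, high] (inclusive bounds assumed)."""
    return max(low, min(high, raw))
```

Under this reading, a perfect raw score of 1.0 is reported as 0.99 and a raw score of 0.0 as 0.01.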
Routing Guide (High-Level)
- Duplicate charge -> billing_ops, high, duplicate_charge
- Suspicious login -> trust_and_safety, urgent, account_compromise
- Production 500s -> platform_engineering, urgent, production_incident
- Export policy bypass -> compliance_ops, high, regulated_exception
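The routing guide can be encoded as a lookup table keyed by issue type. The queue, priority, and issue_type values below come from the guide; the ROUTING name, the classify_payload helper, and the plain-dict payload shape are assumptions made for this sketch.

```python
# issue_type -> (queue, priority), taken from the routing guide.
ROUTING = {
    "duplicate_charge": ("billing_ops", "high"),
    "account_compromise": ("trust_and_safety", "urgent"),
    "production_incident": ("platform_engineering", "urgent"),
    "regulated_exception": ("compliance_ops", "high"),
}


def classify_payload(issue_type: str) -> dict:
    """Build the fields for a classify action from the routing table."""
    queue, priority = ROUTING[issue_type]  # KeyError for unknown issue types
    return {
        "operation": "classify",
        "queue": queue,
        "priority": priority,
        "issue_type": issue_type,
    }
```

Deciding which issue type a ticket belongs to is still up to the agent; the table only fixes the queue and priority once that decision is made.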
Required Environment Variables
Baseline inference uses:
- API_BASE_URL
- MODEL_NAME
- HF_TOKEN
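One way to read these variables and fail early when one is missing is sketched below. The load_config helper and its error behavior are assumptions for illustration; the baseline script may handle defaults differently.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting a mapping argument keeps the function testable without touching the real process environment.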
Mandatory Stdout Format
The inference script must emit exactly:
[START] task=<task_name> env=<benchmark> model=<model_name>
[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
Rules:
- One [START] at episode begin
- One [STEP] per env step
- One [END] after episode close
- reward and rewards formatted to 2 decimals
- done / success are lowercase booleans
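The three line types can be produced by small formatting helpers, sketched below. The function names are hypothetical, and because the guide does not state a precision for score, this sketch emits it exactly as passed; reward and rewards use two decimals and booleans are lowercased, as the rules require.

```python
from typing import Optional, Sequence


def fmt_bool(value: bool) -> str:
    """Lowercase boolean, as required for done and success."""
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(n: int, action: str, reward: float, done: bool,
              error: Optional[str] = None) -> str:
    err = error if error else "null"
    return (f"[STEP] step={n} action={action} reward={reward:.2f} "
            f"done={fmt_bool(done)} error={err}")


def end_line(success: bool, steps: int, score, rewards: Sequence[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # score precision is unspecified in the guide, so it is printed as given.
    return (f"[END] success={fmt_bool(success)} steps={steps} "
            f"score={score} rewards={joined}")
```

Printing each returned string with print(..., flush=True) is one way to keep the lines in order when stdout is piped.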