---
title: HyperBrickCaseOps
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - customer-support
base_path: /web
---
# HyperBrickCaseOps
HyperBrickCaseOps is an OpenEnv environment for enterprise support operations. The agent gets a real support ticket, a few policy snippets, and the current case state. From there it has to do the same kind of work a human support or operations teammate would do: route the case, set urgency, ask for missing details, write the customer reply, leave an internal note, and decide whether the case should stay open, be resolved, or be escalated.
The main idea is simple: good support work is not just writing a polite reply. It also means making the right operational decision.
## Agent quickstart
If you are a generic agent being evaluated on this environment, the safest default strategy is:
- Read `objective`, `ticket`, `knowledge_base`, `workflow_stage`, and `required_next_actions`.
- Classify the case first by setting `queue`, `priority`, and `issue_type`.
- If the task requires missing details, use `request_info` before drafting a final answer.
- If customer follow-up is pending, use `wait` before assuming the missing fields arrived.
- Draft the customer-facing reply only after the routing and verification logic are correct.
- Add the internal note before final submission.
- Use `submit` only when the workflow really is complete.
High-level rules:
- primary issue first, secondary concerns second
- safe workflow over fast workflow
- do not resolve or unlock cases early just because the customer sounds urgent
## Agent playbook
The environment is easiest to solve if the agent follows this action order:
1. `classify`
2. `request_info` if `required_next_actions` includes it
3. `wait` if customer follow-up is pending
4. `draft_reply`
5. `add_internal_note`
6. `submit`
Common failure modes:
- asking for unnecessary information on the easy billing task
- resolving a security or compliance case before required verification is complete
- routing the task based on a distracting secondary issue instead of the primary issue
- using `submit` while `required_next_actions` is still non-empty
Quick routing guide:
- duplicate charge after cancellation -> `billing_ops`, `high`, `duplicate_charge`
- suspicious login / locked out -> `trust_and_safety`, `urgent`, `account_compromise`
- production 500s / outage -> `platform_engineering`, `urgent`, `production_incident`
- export restriction / policy bypass request -> `compliance_ops`, `high`, `regulated_exception`
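As a rough sketch, the routing guide above can be expressed as a lookup table. The scenario labels here are illustrative stand-ins, not fields from the environment; only the queue/priority/issue-type triples come from this README.

```python
# Hypothetical lookup table for the routing guide above.
# Scenario labels are illustrative; the triples come from the README.
ROUTING_GUIDE = {
    "duplicate_charge_after_cancellation": ("billing_ops", "high", "duplicate_charge"),
    "suspicious_login_locked_out": ("trust_and_safety", "urgent", "account_compromise"),
    "production_outage": ("platform_engineering", "urgent", "production_incident"),
    "export_restriction_bypass": ("compliance_ops", "high", "regulated_exception"),
}

def classify_action(scenario: str) -> dict:
    """Build a classify action for a known scenario label."""
    queue, priority, issue_type = ROUTING_GUIDE[scenario]
    return {
        "operation": "classify",
        "queue": queue,
        "priority": priority,
        "issue_type": issue_type,
    }
```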
## Environment description and motivation
This environment was built around a gap that shows up in a lot of support benchmarks. Many benchmarks check whether a model can produce a plausible response, but real support work also needs correct routing, escalation, information gathering, and final case handling.
HyperBrickCaseOps is meant to test that full workflow.
It is not a toy game and it is not a chat-only task. The cases include things like:
- SLA pressure
- affected user counts
- customer tier
- secondary concerns that should not distract the agent from the main issue
- delayed customer follow-up turns
- unsafe requests that should not be approved just because the customer sounds urgent
## OpenEnv interface
The environment uses the standard OpenEnv flow:
- `reset()` starts a new case and returns the first observation
- `step(action)` applies one typed action and returns the next observation
- `state()` returns the current typed internal state
The metadata is defined in `openenv.yaml`, and the HTTP app is created through `create_app(...)`.
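A minimal episode loop against this interface might look like the following sketch. `StubEnv` is a stand-in written for this example; in the real repo, the client class lives in `client.py` and talks to the HTTP app.

```python
class StubEnv:
    """Stand-in environment; illustrates the reset/step contract only."""
    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"done": False, "required_next_actions": ["classify"], "reward": 0.0}

    def step(self, action):
        self.steps += 1
        done = action["operation"] == "submit" or self.steps >= 8
        return {"done": done, "reward": 0.1, "required_next_actions": []}

def run_episode(env, policy):
    """Drive one episode: reset, then step until the observation says done."""
    obs = env.reset()
    total = 0.0
    while not obs["done"]:
        obs = env.step(policy(obs))
        total += obs["reward"]
    return total

# Trivial policy that submits immediately.
total = run_episode(StubEnv(), lambda obs: {"operation": "submit"})
```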
## Action space
Each step takes a typed `SupportDeskAction`.
Fields:
- `operation`
- `queue`
- `priority`
- `issue_type`
- `status`
- `resolution_code`
- `requested_fields`
- `reply`
- `internal_note`
Supported operations:
- `classify`: sets `queue`, `priority`, and `issue_type`.
- `request_info`: requests missing fields from the customer.
- `draft_reply`: writes the customer-facing reply.
- `add_internal_note`: writes the internal note for handoff or auditability.
- `submit`: sets the final `status` and `resolution_code`.
- `wait`: advances the environment when a customer follow-up is pending.
Example action:
```json
{
  "operation": "classify",
  "queue": "trust_and_safety",
  "priority": "urgent",
  "issue_type": "account_compromise",
  "status": null,
  "resolution_code": null,
  "requested_fields": [],
  "reply": null,
  "internal_note": null
}
```
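The same action can also be built as a typed object. The following is a hedged dataclass sketch based on the field list above; the real models in `models.py` may use pydantic, enums, or stricter validation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hedged sketch of SupportDeskAction from the README's field list.
# The actual model in models.py may differ in types and validation.
@dataclass
class SupportDeskAction:
    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: List[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

action = SupportDeskAction(
    operation="classify",
    queue="trust_and_safety",
    priority="urgent",
    issue_type="account_compromise",
)
```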
## Observation space
Each observation is a typed `SupportDeskObservation`.
Main fields:
- `task_id`
- `difficulty`
- `objective`
- `ticket`
- `knowledge_base`
- `available_queues`
- `available_priorities`
- `available_statuses`
- `available_issue_types`
- `case`
- `current_sla_minutes_remaining`
- `workflow_stage`
- `required_next_actions`
- `risk_flags`
- `action_history`
- `feedback`
- `remaining_steps`
- `reward`
- `done`
The `case` object is the mutable operational state. It contains:
- current queue, priority, and issue type
- requested fields
- reply draft
- internal note
- final status and resolution code
- customer follow-up state
Customer follow-up can move through:
- `none`
- `pending`
- `partial`
- `complete`
- `incorrect`
The observation is designed to help the agent reason about process, not just text:
- `workflow_stage` shows whether the agent is still classifying, waiting on a reply, drafting communication, or ready to submit
- `required_next_actions` tells the agent which steps are still missing
- `risk_flags` surfaces urgency and safety issues like SLA risk, unsafe unlock pressure, and irrelevant customer follow-up
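A process-aware policy can lean on these fields directly. The following is a minimal sketch that picks the next operation from `required_next_actions` using the playbook order; the tie-breaking order itself is an assumption taken from the quickstart, not from the repo's code.

```python
# Playbook order from the README's agent playbook; assumed tie-breaking order.
PLAYBOOK_ORDER = [
    "classify",
    "request_info",
    "wait",
    "draft_reply",
    "add_internal_note",
    "submit",
]

def next_operation(required_next_actions: list) -> str:
    """Return the earliest outstanding playbook step."""
    for op in PLAYBOOK_ORDER:
        if op in required_next_actions:
            return op
    # Nothing left to do: the workflow is complete.
    return "submit"
```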
## State space
`state()` returns the typed `SupportDeskState`.
Main fields:
- `episode_id`
- `task_id`
- `difficulty`
- `step_count`
- `reward`
- `done`
- `current_score`
- `max_steps`
- `case`
- `current_sla_minutes_remaining`
- `workflow_stage`
- `required_next_actions`
- `risk_flags`
- `action_history`
- `completed_milestones`
- `last_feedback`
## Task descriptions
There are four deterministic tasks in a fixed order.
### 1. `billing_refund_easy`
Difficulty: easy
A customer was charged twice after cancellation. The right workflow is to route the case to billing, confirm the refund path, leave a useful note, and resolve the case without asking for unnecessary extra information.
Best action pattern:
- classify to billing first
- do not request extra fields
- confirm refund timing in the reply
- add a note that the duplicate charge was verified
- resolve the case with the refund resolution code
### 2. `account_takeover_medium`
Difficulty: medium
This is a suspicious-login recovery case. The agent has to route it to trust and safety, request verification details, handle a delayed partial follow-up from the customer, and keep the case open until the missing information is provided. Unlocking the account immediately would be unsafe.
Best action pattern:
- classify to trust and safety with urgent priority
- request `workspace_id`, `last_successful_login`, and `billing_email`
- wait for the partial follow-up
- reply with safe security steps
- keep the case open with `waiting_on_customer`
### 3. `api_incident_hard`
Difficulty: hard
This task simulates a live enterprise API incident. The ticket includes a secondary compliance concern, but the primary issue is the outage. The agent needs to escalate to engineering, request the right diagnostics, communicate clearly, and keep the incident open rather than marking it resolved.
Best action pattern:
- classify to platform engineering with urgent priority
- request `request_ids`, `timestamp_utc`, and `region`
- make clear that engineering is engaged
- do not resolve the case
- submit as an open incident / escalated case
### 4. `regulated_export_exception_hard`
Difficulty: hard
This is a regulated exception request. The customer wants a shortcut around an export restriction, but the correct workflow is to route the case to compliance, request legal approval details, and keep the case open pending review. Sending it straight to engineering for a workaround is the wrong move.
Best action pattern:
- classify to compliance operations
- request `tenant_region`, `dpa_amendment_id`, and `legal_contact_email`
- explicitly say no temporary bypass can be granted yet
- keep the case open pending legal/compliance review
## Reward and grader design
Each task has a deterministic grader that returns a score in (0.01, 0.99) for submission compatibility.
The grader checks:
- queue correctness
- priority correctness
- issue type correctness
- requested fields
- reply coverage
- internal note coverage
- final status
- resolution code
The environment uses the grader score delta as the main dense reward signal. On top of that, it adds smaller process-aware bonuses and penalties so that the full trajectory matters, not just the final snapshot.
Important:
- step rewards may go slightly negative when the agent makes a clearly suboptimal or unsafe move
- final deterministic grader outputs are clamped strictly inside `(0.01, 0.99)`
- `inference.py` also clamps the final emitted submission score to `(0.01, 0.99)`
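The strict clamping into (0.01, 0.99) can be sketched like this; the exact epsilon is an illustrative assumption, not the value the repo uses.

```python
def clamp_score(score: float, lo: float = 0.01, hi: float = 0.99,
                eps: float = 1e-6) -> float:
    """Clamp a score strictly inside (lo, hi).

    The repo clamps grader and submission scores into (0.01, 0.99); the
    epsilon margin here is an assumption for illustration.
    """
    return min(max(score, lo + eps), hi - eps)
```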
Examples:
- bonus for early correct routing on urgent tasks
- bonus for moving through the workflow in the right order
- bonus when `wait` correctly reveals a scripted customer follow-up
- penalty for premature submit
- penalty for over-escalation
- penalty for mixed or sloppy actions
- penalty when the SLA gets critically low
## Project layout
```
.
|-- inference.py
|-- openenv.yaml
|-- pyproject.toml
|-- Dockerfile
|-- uv.lock
|-- __init__.py
|-- client.py
|-- graders.py
|-- models.py
|-- openenv_compat.py
|-- policies.py
|-- tasks.py
|-- server
|   |-- __init__.py
|   |-- app.py
|   `-- supportdesk_environment.py
|-- tests
|   `-- test_supportdesk.py
`-- examples
    `-- rl
        `-- train_q_agent.py
```
## Setup instructions
### Option 1: pip
```shell
pip install -r requirements.txt
```
### Option 2: uv
```shell
uv sync
```
## Usage instructions
Validate the repo:
```shell
python -m openenv.cli validate .
```
Start the local server:
```shell
python -m server.app
```
Or use the entrypoint:
```shell
server
```
Run the baseline:
```shell
python inference.py
```
There is also a small local RL example:
```shell
python examples/rl/train_q_agent.py
```
## Baseline and environment variables
`inference.py` uses the OpenAI Python client when model configuration is supplied externally at runtime.
Supported variables:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
- `OPENAI_API_KEY`
- `MAX_STEPS`
- `TEMPERATURE`
Example:
```shell
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="your-token-here"
python inference.py
```
Important:
- the repo does not depend on hardcoded credentials
- the expected evaluation setup is environment-variable driven
- if credentials are missing or the model call fails, the baseline falls back to a deterministic heuristic policy so the script still completes
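The variable-driven setup with heuristic fallback might look roughly like this sketch; the default values and the exact fallback trigger are assumptions, only the variable names come from this README.

```python
import os

def load_config(env=os.environ) -> dict:
    """Read the documented environment variables.

    The defaults for MAX_STEPS and TEMPERATURE are illustrative assumptions.
    """
    return {
        "api_base_url": env.get("API_BASE_URL"),
        "model_name": env.get("MODEL_NAME"),
        "max_steps": int(env.get("MAX_STEPS", "12")),
        "temperature": float(env.get("TEMPERATURE", "0.2")),
    }

def use_model_backend(cfg: dict) -> bool:
    # Fall back to the deterministic heuristic policy when the model
    # configuration is incomplete (assumed trigger condition).
    return bool(cfg["api_base_url"] and cfg["model_name"])
```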
## Docker
Build:
```shell
docker build -t supportdesk-env .
```
Run:
```shell
docker run -p 8000:8000 supportdesk-env
```
## Hugging Face Space deployment
This repo is meant to run as a Docker Space. Keep both the GitHub repository and the Hugging Face Space public for submission.
If you have the OpenEnv CLI installed, a typical deployment command is:
```shell
openenv push --repo-id your-username/HyperBrickCaseOps
```
## Validation
Local validation:
```shell
openenv validate .
```
Validation against a running environment:
```shell
openenv validate http://127.0.0.1:8000
```
Pre-submission script:
```shell
./scripts/validate-submission.sh https://your-space.hf.space .
```
## Submission checklist
- real-world environment, not a toy or game
- typed OpenEnv action, observation, and state models
- working `reset`, `step`, and `state`
- at least 3 tasks with deterministic graders
- meaningful reward over the trajectory
- root `inference.py`
- working `Dockerfile`
- `openenv.yaml` present
- README includes environment description, motivation, action space, observation space, task descriptions, setup instructions, and baseline scores
## Baseline scores
Current deterministic fallback baseline:
- `billing_refund_easy`: 0.99
- `account_takeover_medium`: 0.99
- `api_incident_hard`: 0.99
- `regulated_export_exception_hard`: 0.99
- average: 0.99
These scores are intentionally reproducible. The fallback policy exists to show that the environment, reward shaping, and graders all work end to end. Model-backed runs can be lower, which is useful for evaluation.