---

title: HyperBrickCaseOps
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - customer-support
base_path: /web
---


# HyperBrickCaseOps

HyperBrickCaseOps is an OpenEnv environment for enterprise support operations. The agent gets a real support ticket, a few policy snippets, and the current case state. From there it has to do the same kind of work a human support or operations teammate would do: route the case, set urgency, ask for missing details, write the customer reply, leave an internal note, and decide whether the case should stay open, be resolved, or be escalated.

The main idea is simple: good support work is not just writing a polite reply. It also means making the right operational decision.

## Agent quickstart

If you are a generic agent being evaluated on this environment, the safest default strategy is:

1. Read `objective`, `ticket`, `knowledge_base`, `workflow_stage`, and `required_next_actions`.
2. Classify the case first by setting `queue`, `priority`, and `issue_type`.
3. If the task requires missing details, use `request_info` before drafting a final answer.
4. If customer follow-up is pending, use `wait` before assuming the missing fields arrived.
5. Draft the customer-facing reply only after the routing and verification logic are correct.
6. Add the internal note before final submission.
7. Use `submit` only when the workflow really is complete.

High-level rule:

- primary issue first, secondary concerns second
- safe workflow over fast workflow
- do not resolve or unlock cases early just because the customer sounds urgent
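The quickstart order above can be sketched as a small decision helper. This is an illustrative sketch, not code from the repo; the observation field names match those documented below, but the helper itself and the `customer_followup` key are assumptions:

```python
def choose_next_operation(obs: dict) -> str:
    """Pick the next operation from an observation-like dict.

    Illustrative only: mirrors the recommended order
    (classify -> request_info -> wait -> draft_reply ->
    add_internal_note -> submit).
    """
    required = obs.get("required_next_actions", [])
    case = obs.get("case", {})

    if "classify" in required:
        return "classify"
    if "request_info" in required:
        return "request_info"
    # Do not assume requested fields arrived; wait while follow-up is pending.
    if case.get("customer_followup") == "pending":
        return "wait"
    if "draft_reply" in required:
        return "draft_reply"
    if "add_internal_note" in required:
        return "add_internal_note"
    # Submit only when nothing is outstanding.
    if not required:
        return "submit"
    return required[0]
```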

## Agent playbook

The environment is easiest to solve if the agent follows this action order:

- `classify`
- `request_info` if `required_next_actions` includes it
- `wait` if customer follow-up is pending
- `draft_reply`
- `add_internal_note`
- `submit`

Common failure modes:

- asking for unnecessary information on the easy billing task
- resolving a security or compliance case before required verification is complete
- routing the task based on a distracting secondary issue instead of the primary issue
- using `submit` while `required_next_actions` is still non-empty

Quick routing guide:

- duplicate charge after cancellation -> `billing_ops`, `high`, `duplicate_charge`
- suspicious login / locked out -> `trust_and_safety`, `urgent`, `account_compromise`
- production 500s / outage -> `platform_engineering`, `urgent`, `production_incident`
- export restriction / policy bypass request -> `compliance_ops`, `high`, `regulated_exception`

## Environment description and motivation

This environment was built around a gap that shows up in a lot of support benchmarks. Many benchmarks check whether a model can produce a plausible response, but real support work also needs correct routing, escalation, information gathering, and final case handling.

HyperBrickCaseOps is meant to test that full workflow.

It is not a toy game and it is not a chat-only task. The cases include things like:

- SLA pressure
- affected user counts
- customer tier
- secondary concerns that should not distract the agent from the main issue
- delayed customer follow-up turns
- unsafe requests that should not be approved just because the customer sounds urgent

## OpenEnv interface

The environment uses the standard OpenEnv flow:

- `reset()` starts a new case and returns the first observation
- `step(action)` applies one typed action and returns the next observation
- `state()` returns the current typed internal state

The metadata is defined in `openenv.yaml`, and the HTTP app is created through `create_app(...)`.
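The reset/step/state contract can be exercised with a tiny stand-in. This stub is purely illustrative of the call pattern; it is not the real environment, which serves these methods over HTTP:

```python
class StubSupportDeskEnv:
    """Minimal in-process stand-in for the reset/step/state contract.

    Illustrative only: the real environment is served via create_app(...)
    and accessed through the client in client.py.
    """

    def __init__(self):
        self._state = {"step_count": 0, "done": False}

    def reset(self) -> dict:
        self._state = {"step_count": 0, "done": False}
        return {"workflow_stage": "classify", "reward": 0.0, "done": False}

    def step(self, action: dict) -> dict:
        self._state["step_count"] += 1
        done = action.get("operation") == "submit"
        self._state["done"] = done
        return {"workflow_stage": "done" if done else "in_progress",
                "reward": 0.0, "done": done}

    def state(self) -> dict:
        return dict(self._state)

env = StubSupportDeskEnv()
obs = env.reset()
obs = env.step({"operation": "classify"})
obs = env.step({"operation": "submit"})
```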

## Action space

Each step takes a typed `SupportDeskAction`.

Fields:

- `operation`
- `queue`
- `priority`
- `issue_type`
- `status`
- `resolution_code`
- `requested_fields`
- `reply`
- `internal_note`

Supported operations:

- `classify`
  Sets `queue`, `priority`, and `issue_type`.
- `request_info`
  Requests missing fields from the customer.
- `draft_reply`
  Writes the customer-facing reply.
- `add_internal_note`
  Writes the internal note for handoff or auditability.
- `submit`
  Sets the final `status` and `resolution_code`.
- `wait`
  Advances the environment when a customer follow-up is pending.

Example action:

```json
{
  "operation": "classify",
  "queue": "trust_and_safety",
  "priority": "urgent",
  "issue_type": "account_compromise",
  "status": null,
  "resolution_code": null,
  "requested_fields": [],
  "reply": null,
  "internal_note": null
}
```

## Observation space

Each observation is a typed `SupportDeskObservation`.

Main fields:

- `task_id`
- `difficulty`
- `objective`
- `ticket`
- `knowledge_base`
- `available_queues`
- `available_priorities`
- `available_statuses`
- `available_issue_types`
- `case`
- `current_sla_minutes_remaining`
- `workflow_stage`
- `required_next_actions`
- `risk_flags`
- `action_history`
- `feedback`
- `remaining_steps`
- `reward`
- `done`

The `case` object is the mutable operational state. It contains:

- current queue, priority, and issue type
- requested fields
- reply draft
- internal note
- final status and resolution code
- customer follow-up state

Customer follow-up can move through:

- `none`
- `pending`
- `partial`
- `complete`
- `incorrect`
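These states imply a simple rule for when `wait` is the right move. A hypothetical helper, not part of the repo:

```python
FOLLOWUP_STATES = ("none", "pending", "partial", "complete", "incorrect")

def should_wait(followup_state: str) -> bool:
    """`wait` is appropriate only while a customer reply is still pending."""
    if followup_state not in FOLLOWUP_STATES:
        raise ValueError(f"unknown follow-up state: {followup_state}")
    return followup_state == "pending"
```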

The observation is designed to help the agent reason about process, not just text:

- `workflow_stage` shows whether the agent is still classifying, waiting on a reply, drafting communication, or ready to submit
- `required_next_actions` tells the agent which steps are still missing
- `risk_flags` surfaces urgency and safety issues like SLA risk, unsafe unlock pressure, and irrelevant customer follow-up

## State space

`state()` returns the typed `SupportDeskState`.

Main fields:

- `episode_id`
- `task_id`
- `difficulty`
- `step_count`
- `reward`
- `done`
- `current_score`
- `max_steps`
- `case`
- `current_sla_minutes_remaining`
- `workflow_stage`
- `required_next_actions`
- `risk_flags`
- `action_history`
- `completed_milestones`
- `last_feedback`

## Task descriptions

There are four deterministic tasks in a fixed order.

### 1. `billing_refund_easy`

Difficulty: easy

A customer was charged twice after cancellation. The right workflow is to route the case to billing, confirm the refund path, leave a useful note, and resolve the case without asking for unnecessary extra information.

Best action pattern:

- classify to billing first
- do not request extra fields
- confirm refund timing in the reply
- add a note that the duplicate charge was verified
- resolve the case with the refund resolution code
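Written out as a sequence of action payloads, the pattern looks like this. The reply text, note text, and `resolution_code` value are illustrative assumptions, not values taken from the graders:

```python
# Illustrative action sequence for billing_refund_easy.
# resolution_code and free-text fields are assumptions for this sketch.
BILLING_SEQUENCE = [
    {"operation": "classify", "queue": "billing_ops",
     "priority": "high", "issue_type": "duplicate_charge"},
    {"operation": "draft_reply",
     "reply": "We confirmed the duplicate charge and issued a refund."},
    {"operation": "add_internal_note",
     "internal_note": "Duplicate charge verified against the cancelled plan."},
    {"operation": "submit", "status": "resolved",
     "resolution_code": "refund_issued"},
]

operations = [a["operation"] for a in BILLING_SEQUENCE]
```

Note what is absent: no `request_info` step, matching the guidance not to ask for unnecessary fields on this task.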

### 2. `account_takeover_medium`

Difficulty: medium

This is a suspicious-login recovery case. The agent has to route it to trust and safety, request verification details, handle a delayed partial follow-up from the customer, and keep the case open until the missing information is provided. Unlocking the account immediately would be unsafe.

Best action pattern:

- classify to trust and safety with urgent priority
- request `workspace_id`, `last_successful_login`, and `billing_email`
- wait for the partial follow-up
- reply with safe security steps
- keep the case open with `waiting_on_customer`

### 3. `api_incident_hard`

Difficulty: hard

This task simulates a live enterprise API incident. The ticket includes a secondary compliance concern, but the primary issue is the outage. The agent needs to escalate to engineering, request the right diagnostics, communicate clearly, and keep the incident open rather than marking it resolved.

Best action pattern:

- classify to platform engineering with urgent priority
- request `request_ids`, `timestamp_utc`, and `region`
- make clear that engineering is engaged
- do not resolve the case
- submit as an open incident / escalated case

### 4. `regulated_export_exception_hard`

Difficulty: hard

This is a regulated exception request. The customer wants a shortcut around an export restriction, but the correct workflow is to route the case to compliance, request legal approval details, and keep the case open pending review. Sending it straight to engineering for a workaround is the wrong move.

Best action pattern:

- classify to compliance operations
- request `tenant_region`, `dpa_amendment_id`, and `legal_contact_email`
- explicitly say no temporary bypass can be granted yet
- keep the case open pending legal/compliance review

## Reward and grader design

Each task has a deterministic grader that returns a score in `(0.01, 0.99)` for submission compatibility.

The grader checks:

- queue correctness
- priority correctness
- issue type correctness
- requested fields
- reply coverage
- internal note coverage
- final status
- resolution code

The environment uses the grader score delta as the main dense reward signal. On top of that, it adds smaller process-aware bonuses and penalties so that the full trajectory matters, not just the final snapshot.

Important:

- step rewards may go slightly negative when the agent makes a clearly suboptimal or unsafe move
- final deterministic grader outputs are clamped strictly inside `(0.01, 0.99)`
- `inference.py` also clamps the final emitted submission score to `(0.01, 0.99)`

Examples:

- bonus for early correct routing on urgent tasks
- bonus for moving through the workflow in the right order
- bonus when `wait` correctly reveals a scripted customer follow-up
- penalty for premature submit
- penalty for over-escalation
- penalty for mixed or sloppy actions
- penalty when the SLA gets critically low
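A minimal sketch of the clamping and delta-based shaping described above. The bonus/penalty magnitudes are invented for illustration; the real values live in the environment code:

```python
def clamp_score(score: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Clamp a grader score to the [0.01, 0.99] submission band."""
    return max(lo, min(hi, score))

def shaped_step_reward(prev_score: float, new_score: float,
                       bonus: float = 0.0, penalty: float = 0.0) -> float:
    """Dense reward = grader score delta plus process bonuses/penalties.

    Magnitudes are illustrative; a step can go slightly negative when a
    penalty outweighs the score delta.
    """
    return (new_score - prev_score) + bonus - penalty
```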

## Project layout

```text
.
|-- inference.py
|-- openenv.yaml
|-- pyproject.toml
|-- Dockerfile
|-- uv.lock
|-- __init__.py
|-- client.py
|-- graders.py
|-- models.py
|-- openenv_compat.py
|-- policies.py
|-- tasks.py
|-- server
|   |-- __init__.py
|   |-- app.py
|   `-- supportdesk_environment.py
|-- tests
|   `-- test_supportdesk.py
`-- examples
    `-- rl
        `-- train_q_agent.py
```

## Setup instructions

### Option 1: pip

```bash
pip install -r requirements.txt
```

### Option 2: uv

```bash
uv sync
```

## Usage instructions

Validate the repo:

```bash
python -m openenv.cli validate .
```

Start the local server:

```bash
python -m server.app
```

Or use the entrypoint:

```bash
server
```

Run the baseline:

```bash
python inference.py
```

There is also a small local RL example:

```bash
python examples/rl/train_q_agent.py
```

## Baseline and environment variables

`inference.py` drives the environment with the OpenAI Python client when model configuration is supplied through environment variables at runtime.

Supported variables:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
- `OPENAI_API_KEY`
- `MAX_STEPS`
- `TEMPERATURE`

Example:

```bash
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="your-token-here"
python inference.py
```

Important:

- the repo does not depend on hardcoded credentials
- the expected evaluation setup is environment-variable driven
- if credentials are missing or the model call fails, the baseline falls back to a deterministic heuristic policy so the script still completes
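The fallback selection can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual logic in `inference.py`:

```python
import os

def pick_policy() -> str:
    """Return which policy a run would use, based on environment variables.

    Hypothetical sketch of the env-var-driven fallback: a model-backed run
    needs an endpoint, a model name, and some credential; otherwise the
    deterministic heuristic policy is used.
    """
    has_endpoint = bool(os.environ.get("API_BASE_URL")) and bool(os.environ.get("MODEL_NAME"))
    has_credential = bool(os.environ.get("HF_TOKEN") or os.environ.get("OPENAI_API_KEY"))
    if has_endpoint and has_credential:
        return "model"
    return "heuristic"
```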

## Docker

Build:

```bash
docker build -t supportdesk-env .
```

Run:

```bash
docker run -p 8000:8000 supportdesk-env
```

## Hugging Face Space deployment

This repo is meant to run as a Docker Space. Keep both the GitHub repository and the Hugging Face Space public for submission.

If you have the OpenEnv CLI installed, a typical deployment command is:

```bash
openenv push --repo-id your-username/HyperBrickCaseOps
```

## Validation

Local validation:

```bash
openenv validate .
```

Validation against a running environment:

```bash
openenv validate http://127.0.0.1:8000
```

Pre-submission script:

```bash
./scripts/validate-submission.sh https://your-space.hf.space .
```

## Submission checklist

- real-world environment, not a toy or game
- typed OpenEnv action, observation, and state models
- working `reset`, `step`, and `state`
- at least 3 tasks with deterministic graders
- meaningful reward over the trajectory
- root `inference.py`
- working `Dockerfile`
- `openenv.yaml` present
- README includes environment description, motivation, action space, observation space, task descriptions, setup instructions, and baseline scores

## Baseline scores

Current deterministic fallback baseline:

- `billing_refund_easy`: `0.99`
- `account_takeover_medium`: `0.99`
- `api_incident_hard`: `0.99`
- `regulated_export_exception_hard`: `0.99`
- average: `0.99`

These scores are intentionally reproducible. The fallback policy exists to show that the environment, reward shaping, and graders all work end to end. Model-backed runs may score lower, which is exactly the kind of spread evaluation needs.