modelbuilderhq committed on
Commit 9ffc733 · verified · 1 Parent(s): f67a4e9

Upload folder using huggingface_hub

  1. agents.md +102 -0
agents.md ADDED
@@ -0,0 +1,102 @@
+ ---
+ title: HyperBrickCaseOps Agent Guide
+ ---
+
+ # HyperBrickCaseOps Agent Guide
+
+ This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.
+
+ ## Quick Start (Agent Strategy)
+
+ Recommended action order:
+
+ 1. `classify`: set `queue`, `priority`, `issue_type`
+ 2. `request_info` if `required_next_actions` includes it
+ 3. `wait` if the customer follow-up is pending
+ 4. `draft_reply`
+ 5. `add_internal_note`
+ 6. `submit`
+
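The recommended order above can be sketched as a small selection function. This is only an illustration: the `completed` and `follow_up_pending` arguments are hypothetical bookkeeping kept by the agent, not fields documented by the environment.

```python
# Recommended order from the guide; `request_info` and `wait` are conditional.
ACTION_ORDER = [
    "classify",
    "request_info",
    "wait",
    "draft_reply",
    "add_internal_note",
    "submit",
]


def next_operation(completed, required_next_actions, follow_up_pending=False):
    """Return the next operation in the recommended order, or None when done.

    `completed` and `follow_up_pending` are assumptions made for this sketch;
    `required_next_actions` mirrors the observation field of the same name.
    """
    for op in ACTION_ORDER:
        if op in completed:
            continue
        # Only ask for more info when the environment says it is required.
        if op == "request_info" and op not in required_next_actions:
            continue
        # Only wait while a customer follow-up is actually pending.
        if op == "wait" and not follow_up_pending:
            continue
        return op
    return None
```

For example, `next_operation(["classify"], [])` skips both conditional steps and returns `"draft_reply"`.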
+ ## Environment API
+
+ The environment follows the standard OpenEnv API:
+
+ - `reset()` -> initial observation
+ - `step(action)` -> next observation, reward, done
+ - `state()` -> internal state snapshot
+
+ Server entrypoint:
+
+ - `server.app:app`
+
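A minimal sketch of an episode loop built on the three calls listed above. `StubEnv` is a hypothetical stand-in written for this example; it assumes `step` returns an `(observation, reward, done)` tuple as the list suggests, and the real client may wrap these values differently.

```python
class StubEnv:
    """Hypothetical stand-in that mimics reset(), step(action), and state()."""

    def __init__(self, max_steps=3):
        self._max_steps = max_steps
        self._steps = 0

    def reset(self):
        self._steps = 0
        return {"workflow_stage": "start"}

    def step(self, action):
        self._steps += 1
        done = self._steps >= self._max_steps
        return {"workflow_stage": f"step_{self._steps}"}, 0.1, done

    def state(self):
        return {"steps": self._steps}


def run_episode(env, policy, max_steps=20):
    """Call reset() once, then step() until done or the step cap is reached."""
    observation = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards
```

The step cap guards against a policy that never reaches `submit`.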
+ ## Action Schema
+
+ Each step takes a typed `SupportDeskAction`:
+
+ - `operation`: `classify|request_info|draft_reply|add_internal_note|submit|wait`
+ - `queue`: string or null
+ - `priority`: string or null
+ - `issue_type`: string or null
+ - `status`: string or null
+ - `resolution_code`: string or null
+ - `requested_fields`: list of strings
+ - `reply`: string or null
+ - `internal_note`: string or null
+
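The field list above could be mirrored locally with a dataclass like the one below. The class name `SupportDeskActionSketch` and its validation are assumptions for illustration; the environment's own `SupportDeskAction` type is the authoritative definition.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Operation names copied from the schema above.
OPERATIONS = {
    "classify",
    "request_info",
    "draft_reply",
    "add_internal_note",
    "submit",
    "wait",
}


@dataclass
class SupportDeskActionSketch:
    """Hypothetical local mirror of the documented action fields."""

    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: List[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self):
        # Reject operation names that the schema does not list.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")
```

Calling `asdict(...)` on an instance yields a plain dictionary with `None` for unset fields and an empty list for `requested_fields`.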
+ ## Observation Highlights
+
+ The observation includes:
+
+ - `task_id`, `difficulty`, `objective`
+ - `ticket` (customer, tier, region, business impact)
+ - `knowledge_base` (policy snippets)
+ - `case` (current triage state)
+ - `workflow_stage`, `required_next_actions`, `risk_flags`
+
+ ## Tasks and Difficulty
+
+ There are 4 tasks, listed by increasing difficulty (the last two are both rated hard):
+
+ - `billing_refund_easy` (easy)
+ - `account_takeover_medium` (medium)
+ - `api_incident_hard` (hard)
+ - `regulated_export_exception_hard` (hard)
+
+ ## Grading and Reward
+
+ - Deterministic graders score task completion
+ - Final scores are clamped to `(0.01, 0.99)`
+ - Reward provides dense progress signals across the episode
+
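The clamping rule above amounts to a one-line bound check. This sketch assumes the endpoints themselves are valid scores; the graders' actual implementation is not shown in this guide.

```python
def clamp_score(raw, low=0.01, high=0.99):
    """Bound a raw grader score to the documented range (endpoints assumed inclusive)."""
    return max(low, min(high, raw))
```

Under this assumption a perfect raw score of `1.0` is reported as `0.99`, and a raw `0.0` as `0.01`.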
+ ## Routing Guide (High-Level)
+
+ - Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
+ - Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
+ - Production 500s -> `platform_engineering`, `urgent`, `production_incident`
+ - Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
+
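The routing guide above can be kept as a lookup table keyed by issue type. The helper name `classify_payload` is hypothetical, and how a ticket is mapped to an issue type in the first place is left to the agent.

```python
# issue_type -> (queue, priority), copied from the routing guide.
ROUTING = {
    "duplicate_charge": ("billing_ops", "high"),
    "account_compromise": ("trust_and_safety", "urgent"),
    "production_incident": ("platform_engineering", "urgent"),
    "regulated_exception": ("compliance_ops", "high"),
}


def classify_payload(issue_type):
    """Build the fields for a `classify` action from a known issue type."""
    queue, priority = ROUTING[issue_type]  # raises KeyError for unknown types
    return {
        "operation": "classify",
        "queue": queue,
        "priority": priority,
        "issue_type": issue_type,
    }
```

An unknown issue type raises `KeyError` rather than silently routing to a default queue, which keeps misclassifications visible during development.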
+ ## Required Environment Variables
+
+ Baseline inference uses:
+
+ - `API_BASE_URL`
+ - `MODEL_NAME`
+ - `HF_TOKEN`
+
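A sketch of how an inference script might read the three variables above and fail early when one is missing. The function name `load_inference_config` and the fail-fast behavior are assumptions, not part of the baseline script.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_inference_config(environ=None):
    """Return the required variables as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary as `environ` makes the function easy to exercise without touching the real process environment.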
+ ## Mandatory Stdout Format
+
+ The inference script must emit exactly:
+
+ ```
+ [START] task=<task_name> env=<benchmark> model=<model_name>
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+ [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+ ```
+
+ Rules:
+
+ - One `[START]` at episode begin
+ - One `[STEP]` per env step
+ - One `[END]` after episode close
+ - `reward` and `rewards` formatted to 2 decimals
+ - `done`/`success` are lowercase booleans
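
The format and rules above could be produced with helpers like these. The function names are hypothetical, and `score` is printed as given because the guide does not state a precision for it.

```python
def fmt_bool(value):
    """Render a Python bool as the lowercase form the log format expects."""
    return "true" if value else "false"


def start_line(task, env, model):
    return f"[START] task={task} env={env} model={model}"


def step_line(step, action, reward, done, error=None):
    # A missing error is written as the literal string "null".
    err = "null" if error is None else error
    return (
        f"[STEP] step={step} action={action} "
        f"reward={reward:.2f} done={fmt_bool(done)} error={err}"
    )


def end_line(success, steps, score, rewards):
    # Every reward is formatted to two decimals and comma-joined without spaces.
    joined = ",".join(f"{r:.2f}" for r in rewards)
    return (
        f"[END] success={fmt_bool(success)} steps={steps} "
        f"score={score} rewards={joined}"
    )
```

A script would print `start_line(...)` once, one `step_line(...)` per environment step, and `end_line(...)` once after the episode closes.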