modelbuilderhq
/

HyperBrickCaseOps

modelbuilderhq committed 30 days ago

Commit

9ffc733

1 Parent(s): f67a4e9

Upload folder using huggingface_hub

agents.md +102 -0

agents.md ADDED

	@@ -0,0 +1,102 @@

+---
+title: HyperBrickCaseOps Agent Guide
+---
+# HyperBrickCaseOps Agent Guide
+This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.
+## Quick Start (Agent Strategy)
+Recommended action order:
+1. `classify` — set `queue`, `priority`, `issue_type`
+2. `request_info` if `required_next_actions` includes it
+3. `wait` if the customer follow-up is pending
+4. `draft_reply`
+5. `add_internal_note`
+6. `submit`
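
The recommended order above can be sketched as a small selection function. This is only an illustration: it assumes `required_next_actions` is a list of operation names, that the environment drops `request_info` and `wait` from that list once they no longer apply, and that the caller tracks which one-time operations it has already performed. The names `ORDER`, `CONDITIONAL`, and `next_operation` are hypothetical and not part of the environment.

```python
from typing import Iterable, Set

# Recommended order from the guide; helper names here are hypothetical.
ORDER = ["classify", "request_info", "wait", "draft_reply", "add_internal_note", "submit"]
# Steps that run only when the environment asks for them (assumption).
CONDITIONAL = {"request_info", "wait"}


def next_operation(required_next_actions: Iterable[str], completed: Set[str]) -> str:
    """Return the next operation to attempt, following the recommended order."""
    required = set(required_next_actions)
    for op in ORDER:
        if op in CONDITIONAL:
            # Conditional steps are taken only while the environment lists them.
            if op in required:
                return op
            continue
        if op not in completed:
            return op
    # Everything has been done once; submitting is the only step left.
    return "submit"
```

For example, `next_operation(["request_info"], {"classify"})` selects `request_info`, while an empty requirement list after classification moves straight to `draft_reply`.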
+## Environment API
+The environment follows the standard OpenEnv API:
+- `reset()` -> initial observation
+- `step(action)` -> next observation, reward, done
+- `state()` -> internal state snapshot
+Server entrypoint:
+- `server.app:app`
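
A minimal episode loop over this API might look like the sketch below. It assumes `step(action)` returns a plain `(observation, reward, done)` tuple; the real client may wrap these values in a result object instead. `StubEnv` is a hypothetical stand-in used only so the loop can run without the server, and `run_episode` is an illustrative helper name.

```python
from typing import Any, Callable, List, Tuple


def run_episode(env: Any, policy: Callable[[Any], Any], max_steps: int = 20) -> Tuple[List[float], bool]:
    """Drive one episode with reset()/step() and collect per-step rewards."""
    observation = env.reset()
    rewards: List[float] = []
    done = False
    for _ in range(max_steps):
        action = policy(observation)
        # Assumption: step() returns a 3-tuple, as the guide's summary suggests.
        observation, reward, done = env.step(action)
        rewards.append(float(reward))
        if done:
            break
    return rewards, done


class StubEnv:
    """Hypothetical stand-in environment that ends after three steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> dict:
        self.t = 0
        return {"step": self.t}

    def step(self, action: Any) -> Tuple[dict, float, bool]:
        self.t += 1
        return {"step": self.t}, 0.1, self.t >= 3

    def state(self) -> dict:
        return {"t": self.t}
```

Calling `run_episode(StubEnv(), lambda obs: "noop")` runs three steps and stops as soon as `done` is reported.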
+## Action Schema
+Each step takes a typed `SupportDeskAction`:
+- `operation`: `classify|request_info|draft_reply|add_internal_note|submit|wait`
+- `queue`: string or null
+- `priority`: string or null
+- `issue_type`: string or null
+- `status`: string or null
+- `resolution_code`: string or null
+- `requested_fields`: list of strings
+- `reply`: string or null
+- `internal_note`: string or null
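
To make the schema concrete, here is a sketch of the action as a Python dataclass. The field names and the operation list come from the schema above; the class name `SupportDeskActionSketch`, the defaults, and the validation are assumptions, and the environment's own `SupportDeskAction` type may be defined differently.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Operation names as listed in the schema.
OPERATIONS = {"classify", "request_info", "draft_reply", "add_internal_note", "submit", "wait"}


@dataclass
class SupportDeskActionSketch:
    """Illustrative mirror of the action schema (not the environment's class)."""

    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: List[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject operations that the schema does not list.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")


# Example: a classify action using values from the routing guide.
example = SupportDeskActionSketch(
    operation="classify",
    queue="billing_ops",
    priority="high",
    issue_type="duplicate_charge",
)
payload = asdict(example)  # plain dict, e.g. for JSON serialization
```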
+## Observation Highlights
+The observation includes:
+- `task_id`, `difficulty`, `objective`
+- `ticket` (customer, tier, region, business impact)
+- `knowledge_base` (policy snippets)
+- `case` (current triage state)
+- `workflow_stage`, `required_next_actions`, `risk_flags`
+## Tasks and Difficulty
+There are 4 tasks across three difficulty levels:
+- `billing_refund_easy` (easy)
+- `account_takeover_medium` (medium)
+- `api_incident_hard` (hard)
+- `regulated_export_exception_hard` (hard)
+## Grading and Reward
+- Deterministic graders score task completion
+- Final scores are clamped to `(0.01, 0.99)`
+- Reward provides dense progress signals across the episode
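
The clamping step can be illustrated in one line. This sketch assumes the bounds are inclusive, so a raw score of 0 maps to 0.01 and anything above 0.99 maps to 0.99; the grader's actual implementation is not shown in this guide, and `clamp_score` is a hypothetical name.

```python
def clamp_score(raw: float, low: float = 0.01, high: float = 0.99) -> float:
    """Clamp a raw grader score into [low, high] (inclusive bounds assumed)."""
    return max(low, min(high, raw))
```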
+## Routing Guide (High-Level)
+- Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
+- Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
+- Production 500s -> `platform_engineering`, `urgent`, `production_incident`
+- Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
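
One way to encode the routing guide is a lookup table that yields the three fields a `classify` action sets. The values are taken from the list above; the scenario keys and the helper name `classify_fields` are hypothetical labels chosen for this sketch, and a real agent would still need to decide which scenario a ticket matches.

```python
from typing import Dict

# Scenario keys are illustrative labels; values come from the routing guide.
ROUTING: Dict[str, Dict[str, str]] = {
    "duplicate_charge": {
        "queue": "billing_ops", "priority": "high", "issue_type": "duplicate_charge",
    },
    "suspicious_login": {
        "queue": "trust_and_safety", "priority": "urgent", "issue_type": "account_compromise",
    },
    "production_500s": {
        "queue": "platform_engineering", "priority": "urgent", "issue_type": "production_incident",
    },
    "export_policy_bypass": {
        "queue": "compliance_ops", "priority": "high", "issue_type": "regulated_exception",
    },
}


def classify_fields(scenario: str) -> Dict[str, str]:
    """Return a copy of the classify fields for a known scenario label."""
    return dict(ROUTING[scenario])
```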
+## Required Environment Variables
+Baseline inference uses:
+- `API_BASE_URL`
+- `MODEL_NAME`
+- `HF_TOKEN`
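
A script can fail fast when any of these variables is missing. The sketch below reads them from `os.environ`; the helper name `load_settings` and the choice to treat empty values as missing are assumptions, not behavior documented for the baseline script.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the required variables, raising if any is unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing an explicit mapping instead of relying on the process environment keeps the function easy to exercise in tests.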
+## Mandatory Stdout Format
+The inference script must emit exactly:
+```
+[START] task=<task_name> env=<benchmark> model=<model_name>
+[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
+```
+Rules:
+- One `[START]` at episode start
+- One `[STEP]` per env step
+- One `[END]` after episode close
+- `reward` and `rewards` formatted to 2 decimals
+- `done`/`success` are lowercase booleans
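
The format rules above can be captured in three small formatting helpers. This is a sketch: the function names are hypothetical, and formatting `score` to two decimals is an assumption, since the rules only specify two decimals for `reward` and `rewards`.

```python
from typing import Optional, Sequence


def fmt_bool(value: bool) -> str:
    """Render a boolean as the lowercase form the log format requires."""
    return "true" if value else "false"


def start_line(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def step_line(step: int, action: str, reward: float, done: bool, error: Optional[str] = None) -> str:
    err = "null" if error is None else error
    return f"[STEP] step={step} action={action} reward={reward:.2f} done={fmt_bool(done)} error={err}"


def end_line(success: bool, steps: int, score: float, rewards: Sequence[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: score uses the same two-decimal formatting as rewards.
    return f"[END] success={fmt_bool(success)} steps={steps} score={score:.2f} rewards={joined}"
```

A script would print `start_line(...)` once, `step_line(...)` after every environment step, and `end_line(...)` once the episode has closed.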