	---
	title: HyperBrickCaseOps Agent Guide
	---

	# HyperBrickCaseOps Agent Guide

	This environment evaluates real-world customer support triage. Agents must classify the ticket, request missing info when required, draft the customer reply, add an internal note, and submit only when the workflow is complete.

	## Quick Start (Agent Strategy)

	Recommended action order:

	1. `classify` — set `queue`, `priority`, `issue_type`
	2. `request_info` if `required_next_actions` includes it
	3. `wait` if the customer follow-up is pending
	4. `draft_reply`
	5. `add_internal_note`
	6. `submit`
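
The recommended order above can be sketched as a small decision function. This is an illustrative sketch, not part of the environment: `next_operation`, `completed`, and `follow_up_pending` are hypothetical names, and how an agent detects a pending customer follow-up from the observation is left to the caller.

```python
from typing import Iterable


def next_operation(
    completed: Iterable[str],
    required_next_actions: Iterable[str],
    follow_up_pending: bool,
) -> str:
    """Return the next operation, following the recommended action order.

    `completed` and `follow_up_pending` are hypothetical bookkeeping kept by
    the agent; only `required_next_actions` is named in the observation.
    """
    done = set(completed)
    required = set(required_next_actions)

    if "classify" not in done:
        return "classify"
    # Only request info when the environment asks for it.
    if "request_info" in required and "request_info" not in done:
        return "request_info"
    # Hold off on drafting while the customer has not answered yet.
    if follow_up_pending:
        return "wait"
    for op in ("draft_reply", "add_internal_note"):
        if op not in done:
            return op
    return "submit"
```

For example, `next_operation(["classify"], ["request_info"], False)` yields `request_info`, while the same call with an empty `required_next_actions` moves straight to `draft_reply`.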

	## Environment API

	The environment follows the standard OpenEnv API:

	- `reset()` -> initial observation
	- `step(action)` -> next observation, reward, done
	- `state()` -> internal state snapshot
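
A minimal sketch of an episode loop over these three calls is shown below. It assumes `step` returns a plain `(observation, reward, done)` tuple, which the real client may wrap in a result object instead. `StubEnv`, `run_episode`, and the observation contents are hypothetical stand-ins so the loop can run without the server.

```python
from typing import Any, Callable, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in exposing reset/step/state; not the real environment."""

    def __init__(self, horizon: int = 3) -> None:
        self._horizon = horizon
        self._steps = 0

    def reset(self) -> Dict[str, Any]:
        self._steps = 0
        return {"workflow_stage": "new"}

    def step(self, action: Dict[str, Any]) -> Tuple[Dict[str, Any], float, bool]:
        self._steps += 1
        done = self._steps >= self._horizon
        return {"workflow_stage": f"step_{self._steps}"}, 0.1, done

    def state(self) -> Dict[str, Any]:
        return {"steps_taken": self._steps}


def run_episode(
    env: Any,
    policy: Callable[[Dict[str, Any]], Dict[str, Any]],
    max_steps: int = 20,
) -> List[float]:
    """Reset the env, then step until `done` or `max_steps`; return the rewards."""
    obs = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


if __name__ == "__main__":
    print(run_episode(StubEnv(), lambda obs: {"operation": "wait"}))
```

The `max_steps` cap is a design choice for the sketch: it keeps a misbehaving policy from looping forever if the episode never reports `done`.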

	Server entrypoint:

	- `server.app:app`

	## Action Schema

	Each step takes a typed `SupportDeskAction`:

	- `operation`: `classify|request_info|draft_reply|add_internal_note|submit|wait`
	- `queue`: string or null
	- `priority`: string or null
	- `issue_type`: string or null
	- `status`: string or null
	- `resolution_code`: string or null
	- `requested_fields`: list of strings
	- `reply`: string or null
	- `internal_note`: string or null
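
To make the field list concrete, here is a dataclass that mirrors it. This is a sketch under assumptions: the environment's own `SupportDeskAction` may be defined differently (for example as a validated model class), and the `OPERATIONS` constant and the `__post_init__` check are illustrative additions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Allowed values for `operation`, taken from the schema above.
OPERATIONS = frozenset(
    {"classify", "request_info", "draft_reply", "add_internal_note", "submit", "wait"}
)


@dataclass
class SupportDeskAction:
    """Illustrative mirror of the action schema; not the environment's own class."""

    operation: str
    queue: Optional[str] = None
    priority: Optional[str] = None
    issue_type: Optional[str] = None
    status: Optional[str] = None
    resolution_code: Optional[str] = None
    requested_fields: List[str] = field(default_factory=list)
    reply: Optional[str] = None
    internal_note: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown operations early instead of sending them to the env.
        if self.operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {self.operation!r}")


if __name__ == "__main__":
    action = SupportDeskAction(
        operation="classify",
        queue="billing_ops",
        priority="high",
        issue_type="duplicate_charge",
    )
    print(asdict(action))
```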

	## Observation Highlights

	The observation includes:

	- `task_id`, `difficulty`, `objective`
	- `ticket` (customer, tier, region, business impact)
	- `knowledge_base` (policy snippets)
	- `case` (current triage state)
	- `workflow_stage`, `required_next_actions`, `risk_flags`

	## Tasks and Difficulty

	There are 4 tasks across three difficulty tiers, listed from easiest to hardest:

	- `billing_refund_easy` (easy)
	- `account_takeover_medium` (medium)
	- `api_incident_hard` (hard)
	- `regulated_export_exception_hard` (hard)

	## Grading and Reward

	- Deterministic graders score task completion
	- Final scores are clamped to `(0.01, 0.99)`
	- Reward provides dense progress signals across the episode
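
The clamping rule can be illustrated with a one-line helper. This is only a sketch: `clamp_score` is a hypothetical name, the bounds are treated as inclusive, and the actual grader may apply the clamp differently.

```python
def clamp_score(raw: float, low: float = 0.01, high: float = 0.99) -> float:
    """Clamp a raw grader score into the documented range (bounds assumed inclusive)."""
    return max(low, min(high, raw))
```

Under this reading, a perfect raw score of `1.0` is reported as `0.99` and a raw score of `0.0` as `0.01`, while values already inside the range pass through unchanged.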

	## Routing Guide (High-Level)

	- Duplicate charge -> `billing_ops`, `high`, `duplicate_charge`
	- Suspicious login -> `trust_and_safety`, `urgent`, `account_compromise`
	- Production 500s -> `platform_engineering`, `urgent`, `production_incident`
	- Export policy bypass -> `compliance_ops`, `high`, `regulated_exception`
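
The routing guide above can be expressed as a lookup table keyed by issue type. The helper name `classify_fields` and the plain-dict action shape are assumptions made for illustration. The queue, priority, and issue-type values come from the list above.

```python
from typing import Dict, Tuple

# issue_type -> (queue, priority), copied from the routing guide.
ROUTING: Dict[str, Tuple[str, str]] = {
    "duplicate_charge": ("billing_ops", "high"),
    "account_compromise": ("trust_and_safety", "urgent"),
    "production_incident": ("platform_engineering", "urgent"),
    "regulated_exception": ("compliance_ops", "high"),
}


def classify_fields(issue_type: str) -> Dict[str, str]:
    """Build the fields for a `classify` action from a known issue type."""
    if issue_type not in ROUTING:
        raise KeyError(f"no routing rule for issue type: {issue_type!r}")
    queue, priority = ROUTING[issue_type]
    return {
        "operation": "classify",
        "queue": queue,
        "priority": priority,
        "issue_type": issue_type,
    }
```

Deciding which issue type a ticket belongs to is still the agent's job. The table only fixes the queue and priority once that decision is made.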

	## Required Environment Variables

	Baseline inference uses:

	- `API_BASE_URL`
	- `MODEL_NAME`
	- `HF_TOKEN`
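
A small sketch of reading these variables and failing fast when one is missing is given below. `load_config` is a hypothetical helper, and the baseline inference script may handle configuration differently.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required variables, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```

Accepting the mapping as a parameter keeps the helper easy to test with a plain dict instead of the real process environment.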

	## Mandatory Stdout Format

	The inference script must emit exactly:

	```
	[START] task=<task_name> env=<benchmark> model=<model_name>
	[STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
	[END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>
	```

	Rules:

	- One `[START]` at episode begin
	- One `[STEP]` per env step
	- One `[END]` after episode close
	- `reward` and `rewards` formatted to 2 decimals
	- `done`/`success` are lowercase booleans
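
The three line formats can be produced with small formatting helpers like the ones sketched here. The function names are hypothetical. Formatting `score` to 2 decimals is an assumption, since the rules above only fix the precision of `reward` and `rewards`.

```python
from typing import Optional, Sequence


def _bool(flag: bool) -> str:
    """Render a Python bool as a lowercase `true`/`false` token."""
    return "true" if flag else "false"


def format_start(task: str, env: str, model: str) -> str:
    return f"[START] task={task} env={env} model={model}"


def format_step(
    step: int, action: str, reward: float, done: bool, error: Optional[str] = None
) -> str:
    err = error if error is not None else "null"
    return (
        f"[STEP] step={step} action={action} "
        f"reward={reward:.2f} done={_bool(done)} error={err}"
    )


def format_end(success: bool, steps: int, score: float, rewards: Sequence[float]) -> str:
    joined = ",".join(f"{r:.2f}" for r in rewards)
    # Assumption: score is printed with 2 decimals, like the rewards.
    return f"[END] success={_bool(success)} steps={steps} score={score:.2f} rewards={joined}"


if __name__ == "__main__":
    print(format_start("billing_refund_easy", "HyperBrickCaseOps", "example-model"))
    print(format_step(1, "classify", 0.1, False))
    print(format_end(True, 1, 0.99, [0.1]))
```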