Upload folder using huggingface_hub
- README.md +39 -6
- main.py +11 -0
- pyproject.toml +1 -1
- supportdesk_env.egg-info/PKG-INFO +14 -0
- supportdesk_env.egg-info/SOURCES.txt +19 -0
- supportdesk_env.egg-info/dependency_links.txt +1 -0
- supportdesk_env.egg-info/entry_points.txt +2 -0
- supportdesk_env.egg-info/requires.txt +9 -0
- supportdesk_env.egg-info/top_level.txt +1 -0
- supportdesk_env/server/app.py +36 -0
- tests/test_supportdesk.py +17 -0
README.md CHANGED

````diff
@@ -1,5 +1,5 @@
 ---
-title:
+title: HyperBrickCaseOps
 sdk: docker
 app_port: 8000
 tags:
@@ -9,12 +9,14 @@ tags:
 base_path: /web
 ---
 
-#
+# HyperBrickCaseOps
 
 SupportDesk is best thought of as an enterprise operations-desk environment, not a generic support classifier.
 
 SupportDesk is a real-world RL environment for enterprise support operations. The agent receives a realistic inbound ticket, a small internal knowledge base, and the live case state. It must route the case, set the right priority, decide whether to request more information, draft the customer response, add an internal note, and submit the case with the correct final status.
 
+One-sentence summary: HyperBrickCaseOps is a deterministic OpenEnv customer-support operations environment that evaluates whether an agent can triage, communicate, escalate, and resolve enterprise cases correctly end to end.
+
 This environment is intentionally built around work humans actually do every day in B2B SaaS support queues. It is not a toy chat task and it is not a game. The environment includes enterprise mechanics such as SLA countdowns, business-impact context, and distracting secondary concerns, so the agent has to prioritize the primary operational issue instead of just pattern-matching keywords.
 
 ## Environment Description and Motivation
@@ -34,6 +36,37 @@ This makes the environment useful for both:
 - Reproducible baseline: `inference.py` runs all tasks in a fixed order and falls back to a deterministic heuristic policy if model credentials are unavailable.
 - Novel mechanics: observations expose SLA pressure, business impact, and secondary concerns, which makes the environment closer to an enterprise operations desk than a plain support classifier.
 
+## Architecture Diagram
+
+```text
+Inbound Task Spec + Ticket + KB
+        |
+        v
+SupportDeskEnvironment
+  - reset()
+  - step(action)
+  - state()
+        |
+        +--> SupportDeskObservation
+        +--> dense reward shaping
+        +--> episode termination
+        |
+        v
+Deterministic Grader
+  - queue correctness
+  - priority correctness
+  - issue type correctness
+  - requested fields
+  - reply coverage
+  - internal note coverage
+  - status / resolution
+        |
+        v
+Baseline in inference.py
+  - OpenAI-compatible client path
+  - deterministic fallback path
+```
+
 ## Why this is more novel than a standard support benchmark
 
 - It is not just routing or intent classification. The agent has to combine queueing, urgency, customer communication, internal notes, and final disposition in one trajectory.
@@ -86,11 +119,11 @@ The implementation uses typed Pydantic models for action, observation, and state
 
 ## Task Descriptions with Expected Difficulty
 
-1. `billing_refund_easy`
+1. `billing_refund_easy` - Expected difficulty: easy
 Duplicate-charge billing ticket. The correct path is immediate billing routing, a refund confirmation, and case resolution.
-2. `account_takeover_medium`
+2. `account_takeover_medium` - Expected difficulty: medium
 Suspicious-login security ticket. The agent must escalate to trust and safety, request verification details, and keep the case waiting on the customer.
-3. `api_incident_hard`
+3. `api_incident_hard` - Expected difficulty: hard
 Enterprise production API incident with a distracting compliance mention. The agent must escalate to platform engineering, request the right diagnostics, and open the incident instead of resolving it.
 
 What makes these tasks less generic than ordinary support-routing demos:
@@ -218,7 +251,7 @@ Deploy this repo as a Docker Space and keep it public for submission. The Space
 If the OpenEnv CLI is installed, deployment can be done with:
 
 ```bash
-openenv push --repo-id your-username/
+openenv push --repo-id your-username/HyperBrickCaseOps
 ```
 
 ## Validation
````
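The architecture diagram lists the criteria the deterministic grader checks. The following is a minimal sketch of how such a checklist can produce a score that always stays in [0, 1], which is the property `test_grade_is_bounded_between_zero_and_one` asserts. Everything here is an assumption for illustration and is not taken from `supportdesk_env/graders.py`. That includes the `CaseState` and `GoldSpec` classes, their field names, the keyword-coverage rule, the equal weighting, and the sample labels.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the submitted case and the task's gold answer;
# the real supportdesk_env models may use different names and types.
@dataclass
class CaseState:
    queue: str = ""
    priority: str = ""
    issue_type: str = ""
    requested_fields: set[str] = field(default_factory=set)
    reply: str = ""
    internal_note: str = ""
    status: str = ""


@dataclass
class GoldSpec:
    queue: str
    priority: str
    issue_type: str
    requested_fields: set[str]
    reply_keywords: list[str]
    note_keywords: list[str]
    status: str


def keyword_coverage(text: str, keywords: list[str]) -> float:
    """Fraction of required keywords found in the text (case-insensitive)."""
    if not keywords:
        return 1.0
    lowered = text.lower()
    return sum(kw.lower() in lowered for kw in keywords) / len(keywords)


def set_overlap(actual: set[str], expected: set[str]) -> float:
    """Fraction of expected fields that the agent actually requested."""
    if not expected:
        # Assumption: asking for information when none is needed counts as a miss.
        return 1.0 if not actual else 0.0
    return len(actual & expected) / len(expected)


def grade(case: CaseState, gold: GoldSpec) -> dict[str, float]:
    # One sub-score per criterion from the diagram, each already in [0, 1].
    parts = {
        "queue": float(case.queue == gold.queue),
        "priority": float(case.priority == gold.priority),
        "issue_type": float(case.issue_type == gold.issue_type),
        "requested_fields": set_overlap(case.requested_fields, gold.requested_fields),
        "reply": keyword_coverage(case.reply, gold.reply_keywords),
        "internal_note": keyword_coverage(case.internal_note, gold.note_keywords),
        "status": float(case.status == gold.status),
    }
    # Equal weights (an assumption): the mean of [0, 1] values stays in [0, 1].
    parts["total"] = sum(parts.values()) / len(parts)
    return parts


# Illustrative gold spec; these labels are made up, not the real task data.
gold = GoldSpec(
    queue="billing",
    priority="high",
    issue_type="duplicate_charge",
    requested_fields=set(),
    reply_keywords=["refund"],
    note_keywords=["duplicate"],
    status="resolved",
)
perfect = CaseState(
    queue="billing",
    priority="high",
    issue_type="duplicate_charge",
    reply="We have issued a refund for the second charge.",
    internal_note="Duplicate charge confirmed in the billing system.",
    status="resolved",
)
print(grade(perfect, gold)["total"])  # 1.0
```

Equal weights keep the sketch simple. A real grader might instead weight routing and final status more heavily than text coverage.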
main.py ADDED

```diff
@@ -0,0 +1,11 @@
+"""Root server entrypoint wrapper for validator-friendly packaging."""
+
+from __future__ import annotations
+
+from supportdesk_env.server.app import app, main as _run_server
+
+
+def main() -> None:
+    """Launch the local OpenEnv HTTP server."""
+
+    _run_server()
```
pyproject.toml CHANGED

```diff
@@ -19,7 +19,7 @@ dev = [
 ]
 
 [project.scripts]
-server = "
+server = "main:main"
 
 [build-system]
 requires = ["setuptools"]
```
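The `[project.scripts]` entry `server = "main:main"` uses the standard `module:attribute` console-script syntax. Under that syntax, the installed `server` command imports the `main` module and calls its `main()` function, which delegates to `supportdesk_env.server.app.main`. The sketch below mimics that resolution step with `importlib`. It illustrates the reference format only and is not how an installer's generated launcher is literally written. The helper name `resolve_entry_point` is hypothetical, and a standard-library target is used so the sketch runs without the package installed.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a 'module:attr' (or 'module:obj.attr') string to a callable."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    target: Any = importlib.import_module(module_name)
    # Walk dotted attributes after the colon, e.g. 'pkg.mod:Class.method'.
    for part in attr_path.split("."):
        target = getattr(target, part)
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target


# With the project root importable, resolve_entry_point("main:main") would return
# the wrapper defined in main.py; a stdlib target stands in for it here.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```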
supportdesk_env.egg-info/PKG-INFO ADDED

```diff
@@ -0,0 +1,14 @@
+Metadata-Version: 2.4
+Name: supportdesk-env
+Version: 0.1.0
+Summary: A real-world OpenEnv environment for customer support triage and escalation.
+Author: HyperBrick
+Requires-Python: >=3.10
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: openai>=1.54.0
+Requires-Dist: openenv-core>=0.2.0
+Requires-Dist: pydantic>=2.9.0
+Requires-Dist: requests>=2.32.0
+Requires-Dist: uvicorn>=0.30.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.3.0; extra == "dev"
```
supportdesk_env.egg-info/SOURCES.txt ADDED

```diff
@@ -0,0 +1,19 @@
+README.md
+pyproject.toml
+supportdesk_env/__init__.py
+supportdesk_env/client.py
+supportdesk_env/graders.py
+supportdesk_env/models.py
+supportdesk_env/openenv_compat.py
+supportdesk_env/policies.py
+supportdesk_env/tasks.py
+supportdesk_env.egg-info/PKG-INFO
+supportdesk_env.egg-info/SOURCES.txt
+supportdesk_env.egg-info/dependency_links.txt
+supportdesk_env.egg-info/entry_points.txt
+supportdesk_env.egg-info/requires.txt
+supportdesk_env.egg-info/top_level.txt
+supportdesk_env/server/__init__.py
+supportdesk_env/server/app.py
+supportdesk_env/server/supportdesk_environment.py
+tests/test_supportdesk.py
```
supportdesk_env.egg-info/dependency_links.txt ADDED

```diff
@@ -0,0 +1 @@
+
```
supportdesk_env.egg-info/entry_points.txt ADDED

```diff
@@ -0,0 +1,2 @@
+[console_scripts]
+server = supportdesk_env.server.app:main
```
supportdesk_env.egg-info/requires.txt ADDED

```diff
@@ -0,0 +1,9 @@
+fastapi>=0.115.0
+openai>=1.54.0
+openenv-core>=0.2.0
+pydantic>=2.9.0
+requests>=2.32.0
+uvicorn>=0.30.0
+
+[dev]
+pytest>=8.3.0
```
supportdesk_env.egg-info/top_level.txt ADDED

```diff
@@ -0,0 +1 @@
+supportdesk_env
```
supportdesk_env/server/app.py CHANGED

```diff
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 import os
+from typing import Any
 
 import uvicorn
 
@@ -13,6 +14,7 @@ except ImportError: # pragma: no cover - package name differs across releases
 
 from supportdesk_env.models import SupportDeskAction, SupportDeskObservation
 from supportdesk_env.server.supportdesk_environment import SupportDeskEnvironment
+from supportdesk_env.tasks import TASKS
 
 app = create_app(
     SupportDeskEnvironment,
@@ -22,6 +24,40 @@ app = create_app(
 )
 
 
+@app.get("/tasks")
+def list_tasks() -> dict[str, Any]:
+    """Expose a stable task catalog for UI, debugging, and pre-submit checks."""
+
+    return {
+        "environment": {
+            "name": "supportdesk_env",
+            "version": "0.1.0",
+            "grader_type": "deterministic",
+            "score_range": [0.0, 1.0],
+        },
+        "total_tasks": len(TASKS),
+        "tasks": [
+            {
+                "task_id": task.task_id,
+                "title": task.title,
+                "difficulty": task.difficulty,
+                "objective": task.objective,
+                "max_steps": task.max_steps,
+                "gold_issue_type": task.gold_issue_type,
+                "gold_queue": task.gold_queue,
+                "gold_priority": task.gold_priority,
+                "ticket_context": {
+                    "customer_tier": task.ticket.customer_tier,
+                    "region": task.ticket.region,
+                    "affected_users": task.ticket.affected_users,
+                    "sla_minutes_remaining": task.ticket.sla_minutes_remaining,
+                },
+            }
+            for task in TASKS.values()
+        ],
+    }
+
+
 def main() -> None:
     """Run the local HTTP server."""
 
```
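The new `GET /tasks` route returns an environment header plus one entry per task. As a pre-submit check, a client could fetch that catalog and verify a few invariants. The sketch below relies only on the response shape built in `list_tasks()`. A few things in it are assumptions. The base URL `http://localhost:8000` is assumed (it matches `app_port` in the README front matter, but the local port may differ). The helper names `fetch_tasks` and `check_catalog` are hypothetical, and the specific checks are illustrative. `fetch_tasks` needs a running server, so the usage below only exercises `check_catalog` on a made-up payload.

```python
import json
from typing import Any
from urllib.request import urlopen


def fetch_tasks(base_url: str = "http://localhost:8000") -> dict[str, Any]:
    """GET {base_url}/tasks and decode the JSON body (base URL is an assumption)."""
    with urlopen(f"{base_url}/tasks", timeout=10) as resp:
        return json.load(resp)


def check_catalog(payload: dict[str, Any]) -> list[str]:
    """Return the problems found in a /tasks payload; an empty list means OK."""
    problems: list[str] = []
    tasks = payload.get("tasks", [])
    if payload.get("total_tasks") != len(tasks):
        problems.append("total_tasks does not match the number of task entries")
    low, high = payload.get("environment", {}).get("score_range", [0.0, 1.0])
    if not low < high:
        problems.append("score_range is not an increasing pair")
    ids = [t.get("task_id") for t in tasks]
    if len(ids) != len(set(ids)):
        problems.append("duplicate task_id values")
    for t in tasks:
        # Every task needs a positive integer step budget.
        if not isinstance(t.get("max_steps"), int) or t["max_steps"] <= 0:
            problems.append(f"{t.get('task_id')}: max_steps must be a positive int")
    return problems


# Made-up payload in the same shape as list_tasks() output; values are illustrative.
sample = {
    "environment": {"name": "supportdesk_env", "score_range": [0.0, 1.0]},
    "total_tasks": 2,
    "tasks": [
        {"task_id": "billing_refund_easy", "max_steps": 6},
        {"task_id": "api_incident_hard", "max_steps": 0},
    ],
}
print(check_catalog(sample))
```

Against a live server, `check_catalog(fetch_tasks())` would run the same checks on the real catalog.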
tests/test_supportdesk.py CHANGED

```diff
@@ -56,3 +56,20 @@ def test_perfect_solution_grades_full_score():
 
     breakdown = grade_case(task, env.state.case)
     assert breakdown.total_score == 1.0
+
+
+def test_max_steps_ends_episode():
+    env = SupportDeskEnvironment(task_id="billing_refund_easy")
+    observation = env.reset()
+    for _ in range(6):
+        observation = env.step(SupportDeskAction(operation="classify"))
+    assert observation.done is True
+    assert env.state.step_count == 6
+
+
+def test_grade_is_bounded_between_zero_and_one():
+    task = get_task("api_incident_hard")
+    env = SupportDeskEnvironment(task_id=task.task_id)
+    env.reset()
+    breakdown = grade_case(task, env.state.case)
+    assert 0.0 <= breakdown.total_score <= 1.0
```
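`test_max_steps_ends_episode` asserts that repeating a non-terminal action ends the episode once the step budget is used up. The sketch below reproduces that contract with a self-contained, hypothetical `ToyDeskEnv`. It is not `SupportDeskEnvironment`, and its class, field, and operation names are illustrative. The budget of 6 mirrors the test's loop count, and treating `"submit"` as an early terminal action is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    done: bool
    reward: float


class ToyDeskEnv:
    """Hypothetical environment that enforces a per-episode step budget."""

    def __init__(self, max_steps: int = 6) -> None:
        self.max_steps = max_steps
        self.step_count = 0
        self.done = False

    def reset(self) -> Observation:
        self.step_count = 0
        self.done = False
        return Observation(done=False, reward=0.0)

    def step(self, operation: str) -> Observation:
        if self.done:
            raise RuntimeError("episode is over; call reset() first")
        self.step_count += 1
        # 'submit' ends the episode early; otherwise the step budget forces termination.
        self.done = operation == "submit" or self.step_count >= self.max_steps
        # Reward shaping is omitted; every step returns 0.0 in this sketch.
        return Observation(done=self.done, reward=0.0)


env = ToyDeskEnv(max_steps=6)
obs = env.reset()
for _ in range(6):
    obs = env.step("classify")
print(obs.done, env.step_count)  # True 6
```

Raising an error on a step after termination is one design choice. An environment could instead keep returning the terminal observation.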