modelbuilderhq committed on
Commit d33da97 · verified · 1 Parent(s): 181758b

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: SupportDesk OpenEnv Environment
+ title: HyperBrickCaseOps
  sdk: docker
  app_port: 8000
  tags:
@@ -9,12 +9,14 @@ tags:
  base_path: /web
  ---
 
- # SupportDesk OpenEnv Environment
+ # HyperBrickCaseOps
 
  SupportDesk is best thought of as an enterprise operations-desk environment, not a generic support classifier.
 
  SupportDesk is a real-world RL environment for enterprise support operations. The agent receives a realistic inbound ticket, a small internal knowledge base, and the live case state. It must route the case, set the right priority, decide whether to request more information, draft the customer response, add an internal note, and submit the case with the correct final status.
 
+ One-sentence summary: HyperBrickCaseOps is a deterministic OpenEnv customer-support operations environment that evaluates whether an agent can triage, communicate, escalate, and resolve enterprise cases correctly end to end.
+
  This environment is intentionally built around work humans actually do every day in B2B SaaS support queues. It is not a toy chat task and it is not a game. The environment includes enterprise mechanics such as SLA countdowns, business-impact context, and distracting secondary concerns, so the agent has to prioritize the primary operational issue instead of just pattern-matching keywords.
 
  ## Environment Description and Motivation
@@ -34,6 +36,37 @@ This makes the environment useful for both:
  - Reproducible baseline: `inference.py` runs all tasks in a fixed order and falls back to a deterministic heuristic policy if model credentials are unavailable.
  - Novel mechanics: observations expose SLA pressure, business impact, and secondary concerns, which makes the environment closer to an enterprise operations desk than a plain support classifier.
 
+ ## Architecture Diagram
+
+ ```text
+ Inbound Task Spec + Ticket + KB
+ |
+ v
+ SupportDeskEnvironment
+ - reset()
+ - step(action)
+ - state()
+ |
+ +--> SupportDeskObservation
+ +--> dense reward shaping
+ +--> episode termination
+ |
+ v
+ Deterministic Grader
+ - queue correctness
+ - priority correctness
+ - issue type correctness
+ - requested fields
+ - reply coverage
+ - internal note coverage
+ - status / resolution
+ |
+ v
+ Baseline in inference.py
+ - OpenAI-compatible client path
+ - deterministic fallback path
+ ```
+
  ## Why this is more novel than a standard support benchmark
 
  - It is not just routing or intent classification. The agent has to combine queueing, urgency, customer communication, internal notes, and final disposition in one trajectory.
@@ -86,11 +119,11 @@ The implementation uses typed Pydantic models for action, observation, and state
 
  ## Task Descriptions with Expected Difficulty
 
- 1. `billing_refund_easy` Expected difficulty: easy
+ 1. `billing_refund_easy` - Expected difficulty: easy
  Duplicate-charge billing ticket. The correct path is immediate billing routing, a refund confirmation, and case resolution.
- 2. `account_takeover_medium` Expected difficulty: medium
+ 2. `account_takeover_medium` - Expected difficulty: medium
  Suspicious-login security ticket. The agent must escalate to trust and safety, request verification details, and keep the case waiting on the customer.
- 3. `api_incident_hard` Expected difficulty: hard
+ 3. `api_incident_hard` - Expected difficulty: hard
  Enterprise production API incident with a distracting compliance mention. The agent must escalate to platform engineering, request the right diagnostics, and open the incident instead of resolving it.
 
  What makes these tasks less generic than ordinary support-routing demos:
@@ -218,7 +251,7 @@ Deploy this repo as a Docker Space and keep it public for submission. The Space
  If the OpenEnv CLI is installed, deployment can be done with:
 
  ```bash
- openenv push --repo-id your-username/hyperbrick-support-ops-env
+ openenv push --repo-id your-username/HyperBrickCaseOps
  ```
 
  ## Validation
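
Reviewer sketch (not part of the commit): the Architecture Diagram added to the README lists seven checks under "Deterministic Grader". One way such checks could combine into a single score in the advertised 0.0 to 1.0 range is a clamped weighted sum. The weights, the `coverage` helper, and the `total_score` function below are illustrative assumptions; the real grader in `supportdesk_env/graders.py` is not shown in this diff and may work differently.

```python
# Hypothetical weights (they sum to 1.0); the real weighting is not in this diff.
WEIGHTS = {
    "queue": 0.20,
    "priority": 0.15,
    "issue_type": 0.15,
    "requested_fields": 0.15,
    "reply_coverage": 0.15,
    "internal_note": 0.10,
    "status": 0.10,
}


def coverage(required: set[str], provided: set[str]) -> float:
    """Fraction of required items that were provided; 1.0 if nothing is required."""
    if not required:
        return 1.0
    return len(required & provided) / len(required)


def total_score(components: dict[str, float]) -> float:
    """Weighted sum of per-check scores, each clamped to [0, 1]."""
    score = sum(
        WEIGHTS[name] * min(1.0, max(0.0, components.get(name, 0.0)))
        for name in WEIGHTS
    )
    # Round away float noise, then clamp so the bound holds exactly.
    return min(1.0, max(0.0, round(score, 4)))


perfect = {name: 1.0 for name in WEIGHTS}
untouched: dict[str, float] = {}  # a freshly reset case with nothing filled in
print(total_score(perfect), total_score(untouched))
```

Because every component is clamped before weighting and the weights sum to one, the total cannot leave the unit interval regardless of what the agent submits, and the same inputs always give the same score.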
main.py ADDED
@@ -0,0 +1,11 @@
+ """Root server entrypoint wrapper for validator-friendly packaging."""
+
+ from __future__ import annotations
+
+ from supportdesk_env.server.app import app, main as _run_server
+
+
+ def main() -> None:
+     """Launch the local OpenEnv HTTP server."""
+
+     _run_server()
pyproject.toml CHANGED
@@ -19,7 +19,7 @@ dev = [
  ]
 
  [project.scripts]
- server = "supportdesk_env.server.app:main"
+ server = "main:main"
 
  [build-system]
  requires = ["setuptools"]
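
Reviewer sketch (not part of the commit): a `[project.scripts]` value is a `module:attribute` reference, so the generated `server` command imports the named module and calls the named attribute. The helper below, `resolve_entry_point`, is a hypothetical illustration of that lookup using only the standard library. It is demonstrated against a stdlib target so it runs without this repository installed.

```python
from importlib import import_module
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'module:attr' entry-point string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = import_module(module_name)
    # Walk dotted attribute paths such as "pkg.mod:Class.method".
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj


# Stand-in target; in this commit the spec would be "main:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

For `main:main` the same lookup only succeeds if a top-level `main` module is importable in the installed environment, so the new `main.py` has to be included in the built distribution.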
supportdesk_env.egg-info/PKG-INFO ADDED
@@ -0,0 +1,14 @@
+ Metadata-Version: 2.4
+ Name: supportdesk-env
+ Version: 0.1.0
+ Summary: A real-world OpenEnv environment for customer support triage and escalation.
+ Author: HyperBrick
+ Requires-Python: >=3.10
+ Requires-Dist: fastapi>=0.115.0
+ Requires-Dist: openai>=1.54.0
+ Requires-Dist: openenv-core>=0.2.0
+ Requires-Dist: pydantic>=2.9.0
+ Requires-Dist: requests>=2.32.0
+ Requires-Dist: uvicorn>=0.30.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.3.0; extra == "dev"
supportdesk_env.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,19 @@
+ README.md
+ pyproject.toml
+ supportdesk_env/__init__.py
+ supportdesk_env/client.py
+ supportdesk_env/graders.py
+ supportdesk_env/models.py
+ supportdesk_env/openenv_compat.py
+ supportdesk_env/policies.py
+ supportdesk_env/tasks.py
+ supportdesk_env.egg-info/PKG-INFO
+ supportdesk_env.egg-info/SOURCES.txt
+ supportdesk_env.egg-info/dependency_links.txt
+ supportdesk_env.egg-info/entry_points.txt
+ supportdesk_env.egg-info/requires.txt
+ supportdesk_env.egg-info/top_level.txt
+ supportdesk_env/server/__init__.py
+ supportdesk_env/server/app.py
+ supportdesk_env/server/supportdesk_environment.py
+ tests/test_supportdesk.py
supportdesk_env.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
+
supportdesk_env.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
+ [console_scripts]
+ server = supportdesk_env.server.app:main
supportdesk_env.egg-info/requires.txt ADDED
@@ -0,0 +1,9 @@
+ fastapi>=0.115.0
+ openai>=1.54.0
+ openenv-core>=0.2.0
+ pydantic>=2.9.0
+ requests>=2.32.0
+ uvicorn>=0.30.0
+
+ [dev]
+ pytest>=8.3.0
supportdesk_env.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ supportdesk_env
supportdesk_env/server/app.py CHANGED
@@ -3,6 +3,7 @@
  from __future__ import annotations
 
  import os
+ from typing import Any
 
  import uvicorn
 
@@ -13,6 +14,7 @@ except ImportError: # pragma: no cover - package name differs across releases
 
  from supportdesk_env.models import SupportDeskAction, SupportDeskObservation
  from supportdesk_env.server.supportdesk_environment import SupportDeskEnvironment
+ from supportdesk_env.tasks import TASKS
 
  app = create_app(
      SupportDeskEnvironment,
@@ -22,6 +24,40 @@ app = create_app(
  )
 
 
+ @app.get("/tasks")
+ def list_tasks() -> dict[str, Any]:
+     """Expose a stable task catalog for UI, debugging, and pre-submit checks."""
+
+     return {
+         "environment": {
+             "name": "supportdesk_env",
+             "version": "0.1.0",
+             "grader_type": "deterministic",
+             "score_range": [0.0, 1.0],
+         },
+         "total_tasks": len(TASKS),
+         "tasks": [
+             {
+                 "task_id": task.task_id,
+                 "title": task.title,
+                 "difficulty": task.difficulty,
+                 "objective": task.objective,
+                 "max_steps": task.max_steps,
+                 "gold_issue_type": task.gold_issue_type,
+                 "gold_queue": task.gold_queue,
+                 "gold_priority": task.gold_priority,
+                 "ticket_context": {
+                     "customer_tier": task.ticket.customer_tier,
+                     "region": task.ticket.region,
+                     "affected_users": task.ticket.affected_users,
+                     "sla_minutes_remaining": task.ticket.sla_minutes_remaining,
+                 },
+             }
+             for task in TASKS.values()
+         ],
+     }
+
+
  def main() -> None:
      """Run the local HTTP server."""
 
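
Reviewer sketch (not part of the commit): the new `/tasks` handler flattens the `TASKS` registry into a JSON-friendly catalog. The stdlib-only example below reproduces a reduced version of that payload shape so it can be inspected without starting the server. The `Ticket` and `Task` dataclasses, the `build_catalog` helper, and every sample value are hypothetical stand-ins; the real models live in `supportdesk_env` and carry more fields.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ticket:  # hypothetical stand-in for the real ticket model
    customer_tier: str
    sla_minutes_remaining: int


@dataclass(frozen=True)
class Task:  # hypothetical stand-in for the real task spec
    task_id: str
    difficulty: str
    max_steps: int
    ticket: Ticket


# Sample values are made up for illustration only.
TASKS = {
    "billing_refund_easy": Task("billing_refund_easy", "easy", 6, Ticket("standard", 240)),
    "api_incident_hard": Task("api_incident_hard", "hard", 6, Ticket("enterprise", 30)),
}


def build_catalog(tasks: dict[str, Task]) -> dict[str, Any]:
    """Flatten a task registry into a catalog shaped like the /tasks response."""
    return {
        "total_tasks": len(tasks),
        "tasks": [
            {
                "task_id": task.task_id,
                "difficulty": task.difficulty,
                "max_steps": task.max_steps,
                "ticket_context": {
                    "customer_tier": task.ticket.customer_tier,
                    "sla_minutes_remaining": task.ticket.sla_minutes_remaining,
                },
            }
            for task in tasks.values()
        ],
    }


catalog = build_catalog(TASKS)
print([entry["task_id"] for entry in catalog["tasks"]])
```

Since Python dictionaries preserve insertion order, the catalog lists tasks in the order they were registered, which keeps the output stable between calls.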
tests/test_supportdesk.py CHANGED
@@ -56,3 +56,20 @@ def test_perfect_solution_grades_full_score():
 
      breakdown = grade_case(task, env.state.case)
      assert breakdown.total_score == 1.0
+
+
+ def test_max_steps_ends_episode():
+     env = SupportDeskEnvironment(task_id="billing_refund_easy")
+     observation = env.reset()
+     for _ in range(6):
+         observation = env.step(SupportDeskAction(operation="classify"))
+     assert observation.done is True
+     assert env.state.step_count == 6
+
+
+ def test_grade_is_bounded_between_zero_and_one():
+     task = get_task("api_incident_hard")
+     env = SupportDeskEnvironment(task_id=task.task_id)
+     env.reset()
+     breakdown = grade_case(task, env.state.case)
+     assert 0.0 <= breakdown.total_score <= 1.0
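
Reviewer sketch (not part of the commit): `test_max_steps_ends_episode` relies on the environment flagging `done` once the step budget is used up. The toy class below shows one minimal way to get that behavior. `CappedEnv` and `StepResult` are hypothetical names, and the budget of 6 is taken from the loop count in the test, not from the real `SupportDeskEnvironment`, whose termination logic is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class StepResult:  # hypothetical, minimal observation
    done: bool
    step_count: int


class CappedEnv:
    """Toy environment that ends an episode once max_steps is reached."""

    def __init__(self, max_steps: int = 6) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> StepResult:
        """Start a fresh episode with a zeroed step counter."""
        self.step_count = 0
        return StepResult(done=False, step_count=0)

    def step(self, action: str) -> StepResult:
        """Consume one step; report done when the budget is exhausted."""
        if self.step_count >= self.max_steps:
            raise RuntimeError("episode already finished; call reset()")
        self.step_count += 1
        done = self.step_count >= self.max_steps
        return StepResult(done=done, step_count=self.step_count)


env = CappedEnv(max_steps=6)
result = env.reset()
for _ in range(6):
    result = env.step("classify")
print(result.done, env.step_count)
```

Raising on a step after termination is a design choice in this sketch; an environment could instead keep returning a terminal observation.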