Spaces: modelbuilderhq / HyperBrickCaseOps

modelbuilderhq committed on Mar 31

Commit d33da97 (verified) · 1 Parent(s): 181758b

Upload folder using huggingface_hub

Files changed (11)

README.md +39 -6
main.py +11 -0
pyproject.toml +1 -1
supportdesk_env.egg-info/PKG-INFO +14 -0
supportdesk_env.egg-info/SOURCES.txt +19 -0
supportdesk_env.egg-info/dependency_links.txt +1 -0
supportdesk_env.egg-info/entry_points.txt +2 -0
supportdesk_env.egg-info/requires.txt +9 -0
supportdesk_env.egg-info/top_level.txt +1 -0
supportdesk_env/server/app.py +36 -0
tests/test_supportdesk.py +17 -0

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: SupportDesk OpenEnv Environment
+title: HyperBrickCaseOps
 sdk: docker
 app_port: 8000
 tags:
@@ -9,12 +9,14 @@ tags:
 base_path: /web
 ---
-# SupportDesk OpenEnv Environment
+# HyperBrickCaseOps
 SupportDesk is best thought of as an enterprise operations-desk environment, not a generic support classifier.
 SupportDesk is a real-world RL environment for enterprise support operations. The agent receives a realistic inbound ticket, a small internal knowledge base, and the live case state. It must route the case, set the right priority, decide whether to request more information, draft the customer response, add an internal note, and submit the case with the correct final status.
+One-sentence summary: HyperBrickCaseOps is a deterministic OpenEnv customer-support operations environment that evaluates whether an agent can triage, communicate, escalate, and resolve enterprise cases correctly end to end.
 This environment is intentionally built around work humans actually do every day in B2B SaaS support queues. It is not a toy chat task and it is not a game. The environment includes enterprise mechanics such as SLA countdowns, business-impact context, and distracting secondary concerns, so the agent has to prioritize the primary operational issue instead of just pattern-matching keywords.
 ## Environment Description and Motivation
@@ -34,6 +36,37 @@ This makes the environment useful for both:
 - Reproducible baseline: `inference.py` runs all tasks in a fixed order and falls back to a deterministic heuristic policy if model credentials are unavailable.
 - Novel mechanics: observations expose SLA pressure, business impact, and secondary concerns, which makes the environment closer to an enterprise operations desk than a plain support classifier.
+## Architecture Diagram
+```text
+Inbound Task Spec + Ticket + KB
+            |
+            v
+  SupportDeskEnvironment
+  - reset()
+  - step(action)
+  - state()
+            |
+            +--> SupportDeskObservation
+            +--> dense reward shaping
+            +--> episode termination
+            |
+            v
+     Deterministic Grader
+     - queue correctness
+     - priority correctness
+     - issue type correctness
+     - requested fields
+     - reply coverage
+     - internal note coverage
+     - status / resolution
+            |
+            v
+   Baseline in inference.py
+   - OpenAI-compatible client path
+   - deterministic fallback path
+```
 ## Why this is more novel than a standard support benchmark
 - It is not just routing or intent classification. The agent has to combine queueing, urgency, customer communication, internal notes, and final disposition in one trajectory.
@@ -86,11 +119,11 @@ The implementation uses typed Pydantic models for action, observation, and state
 ## Task Descriptions with Expected Difficulty
-1. `billing_refund_easy` — Expected difficulty: easy
+1. `billing_refund_easy` - Expected difficulty: easy
    Duplicate-charge billing ticket. The correct path is immediate billing routing, a refund confirmation, and case resolution.
-2. `account_takeover_medium` — Expected difficulty: medium
+2. `account_takeover_medium` - Expected difficulty: medium
    Suspicious-login security ticket. The agent must escalate to trust and safety, request verification details, and keep the case waiting on the customer.
-3. `api_incident_hard` — Expected difficulty: hard
+3. `api_incident_hard` - Expected difficulty: hard
    Enterprise production API incident with a distracting compliance mention. The agent must escalate to platform engineering, request the right diagnostics, and open the incident instead of resolving it.
 What makes these tasks less generic than ordinary support-routing demos:
@@ -218,7 +251,7 @@ Deploy this repo as a Docker Space and keep it public for submission. The Space
 If the OpenEnv CLI is installed, deployment can be done with:
 ```bash
-openenv push --repo-id your-username/hyperbrick-support-ops-env
+openenv push --repo-id your-username/HyperBrickCaseOps
 ```
 ## Validation
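The Architecture Diagram added to the README lists the checks the deterministic grader performs and mentions dense reward shaping. The sketch below shows one way such checks could be combined into a score bounded to [0.0, 1.0], the range the `/tasks` route advertises as `score_range`. It is not the code in `supportdesk_env/graders.py`: the `ComponentScores` fields, the equal default weights, the phrase-matching `coverage` helper, and the score-delta `shaped_reward` are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ComponentScores:
    """Hypothetical per-component results in [0, 1]; exact-match checks use 0.0 or 1.0."""

    queue: float = 0.0
    priority: float = 0.0
    issue_type: float = 0.0
    requested_fields: float = 0.0
    reply_coverage: float = 0.0
    internal_note_coverage: float = 0.0
    status: float = 0.0


def coverage(text: str, required_phrases: list[str]) -> float:
    """Fraction of required phrases that appear (case-insensitively) in a reply or note."""
    if not required_phrases:
        return 1.0
    lowered = text.lower()
    hits = sum(1 for phrase in required_phrases if phrase.lower() in lowered)
    return hits / len(required_phrases)


def total_score(scores: ComponentScores, weights: dict[str, float] | None = None) -> float:
    """Weighted mean of the component scores, clamped to [0, 1]; equal weights by default."""
    names = [f.name for f in fields(scores)]
    weights = weights or {name: 1.0 for name in names}
    weight_sum = sum(weights.get(name, 0.0) for name in names)
    if weight_sum <= 0.0:
        raise ValueError("weights must sum to a positive value")
    raw = sum(weights.get(name, 0.0) * getattr(scores, name) for name in names) / weight_sum
    # Clamp so rounding or odd weights can never leave the advertised score range.
    return min(1.0, max(0.0, raw))


def shaped_reward(previous_total: float, current_total: float) -> float:
    """Dense per-step reward as the change in total score (an assumed shaping scheme)."""
    return current_total - previous_total
```

Rewarding the score delta means the per-step rewards over an episode sum to the final score minus the starting score, which keeps shaping aligned with the grader rather than with separate heuristics.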

main.py ADDED

@@ -0,0 +1,11 @@

+"""Root server entrypoint wrapper for validator-friendly packaging."""
+from __future__ import annotations
+from supportdesk_env.server.app import app, main as _run_server
+def main() -> None:
+    """Launch the local OpenEnv HTTP server."""
+    _run_server()

pyproject.toml CHANGED

@@ -19,7 +19,7 @@ dev = [
 ]
 [project.scripts]
-server = "supportdesk_env.server.app:main"
+server = "main:main"
 [build-system]
 requires = ["setuptools"]
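This change repoints the `server` console script from `supportdesk_env.server.app:main` to `main:main`. An entry-point value has the form `module:attr`, where the attribute part may be dotted; the launcher imports the module and then walks the attributes to reach the callable. A minimal sketch of that lookup with `importlib` follows; `resolve_entry_point` is a hypothetical helper, and it does not handle the optional `[extras]` suffix that the entry-point specification allows.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(value: str) -> Callable[..., Any]:
    """Resolve an entry-point value like 'pkg.module:obj.attr' to the callable it names."""
    module_name, sep, attr_path = value.partition(":")
    module_name, attr_path = module_name.strip(), attr_path.strip()
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {value!r}")
    # Import the module part, e.g. "main" or "supportdesk_env.server.app".
    target: Any = importlib.import_module(module_name)
    # Walk the (possibly dotted) attribute path, e.g. "main" or "Class.method".
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    if not callable(target):
        raise TypeError(f"{value!r} does not resolve to a callable")
    return target


if __name__ == "__main__":
    # A stdlib target stands in for "main:main", which needs this repository on sys.path.
    join = resolve_entry_point("os.path:join")
    print(join("supportdesk_env", "server", "app.py"))
```

With `main:main`, the lookup only succeeds if `main.py` is importable wherever the script runs, for example with the repository root on `sys.path` or with `main` shipped as a top-level module.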

supportdesk_env.egg-info/PKG-INFO ADDED

@@ -0,0 +1,14 @@

+Metadata-Version: 2.4
+Name: supportdesk-env
+Version: 0.1.0
+Summary: A real-world OpenEnv environment for customer support triage and escalation.
+Author: HyperBrick
+Requires-Python: >=3.10
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: openai>=1.54.0
+Requires-Dist: openenv-core>=0.2.0
+Requires-Dist: pydantic>=2.9.0
+Requires-Dist: requests>=2.32.0
+Requires-Dist: uvicorn>=0.30.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.3.0; extra == "dev"

supportdesk_env.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,19 @@

+README.md
+pyproject.toml
+supportdesk_env/__init__.py
+supportdesk_env/client.py
+supportdesk_env/graders.py
+supportdesk_env/models.py
+supportdesk_env/openenv_compat.py
+supportdesk_env/policies.py
+supportdesk_env/tasks.py
+supportdesk_env.egg-info/PKG-INFO
+supportdesk_env.egg-info/SOURCES.txt
+supportdesk_env.egg-info/dependency_links.txt
+supportdesk_env.egg-info/entry_points.txt
+supportdesk_env.egg-info/requires.txt
+supportdesk_env.egg-info/top_level.txt
+supportdesk_env/server/__init__.py
+supportdesk_env/server/app.py
+supportdesk_env/server/supportdesk_environment.py
+tests/test_supportdesk.py

supportdesk_env.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@

+

supportdesk_env.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@

+[console_scripts]
+server = supportdesk_env.server.app:main

supportdesk_env.egg-info/requires.txt ADDED

@@ -0,0 +1,9 @@

+fastapi>=0.115.0
+openai>=1.54.0
+openenv-core>=0.2.0
+pydantic>=2.9.0
+requests>=2.32.0
+uvicorn>=0.30.0
+[dev]
+pytest>=8.3.0

supportdesk_env.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@

+supportdesk_env

supportdesk_env/server/app.py CHANGED

@@ -3,6 +3,7 @@
 from __future__ import annotations
 import os
+from typing import Any
 import uvicorn
@@ -13,6 +14,7 @@ except ImportError:  # pragma: no cover - package name differs across releases
 from supportdesk_env.models import SupportDeskAction, SupportDeskObservation
 from supportdesk_env.server.supportdesk_environment import SupportDeskEnvironment
+from supportdesk_env.tasks import TASKS
 app = create_app(
     SupportDeskEnvironment,
@@ -22,6 +24,40 @@ app = create_app(
 )
+@app.get("/tasks")
+def list_tasks() -> dict[str, Any]:
+    """Expose a stable task catalog for UI, debugging, and pre-submit checks."""
+    return {
+        "environment": {
+            "name": "supportdesk_env",
+            "version": "0.1.0",
+            "grader_type": "deterministic",
+            "score_range": [0.0, 1.0],
+        },
+        "total_tasks": len(TASKS),
+        "tasks": [
+            {
+                "task_id": task.task_id,
+                "title": task.title,
+                "difficulty": task.difficulty,
+                "objective": task.objective,
+                "max_steps": task.max_steps,
+                "gold_issue_type": task.gold_issue_type,
+                "gold_queue": task.gold_queue,
+                "gold_priority": task.gold_priority,
+                "ticket_context": {
+                    "customer_tier": task.ticket.customer_tier,
+                    "region": task.ticket.region,
+                    "affected_users": task.ticket.affected_users,
+                    "sla_minutes_remaining": task.ticket.sla_minutes_remaining,
+                },
+            }
+            for task in TASKS.values()
+        ],
+    }
 def main() -> None:
     """Run the local HTTP server."""
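The new `/tasks` route returns an environment summary plus one entry per task. A minimal client-side consistency check for that payload could look like the sketch below. `fetch_task_catalog` and `validate_task_catalog` are hypothetical helpers; the default base URL assumes the server listens on port 8000 (the `app_port` in the README front matter), and the checks cover only the fields visible in `list_tasks()`.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Keys each task entry carries in the list_tasks() handler added in this commit.
REQUIRED_TASK_KEYS = frozenset({
    "task_id", "title", "difficulty", "objective", "max_steps",
    "gold_issue_type", "gold_queue", "gold_priority", "ticket_context",
})


def fetch_task_catalog(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict[str, Any]:
    """GET <base_url>/tasks and decode the JSON body; assumes the server is already running."""
    url = f"{base_url.rstrip('/')}/tasks"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def validate_task_catalog(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a /tasks payload; an empty list means none were found."""
    problems: list[str] = []
    tasks = payload.get("tasks", [])
    if payload.get("total_tasks") != len(tasks):
        problems.append("total_tasks does not match the number of task entries")
    score_range = payload.get("environment", {}).get("score_range")
    if score_range != [0.0, 1.0]:
        problems.append(f"unexpected score_range: {score_range!r}")
    seen_ids: set[Any] = set()
    for index, task in enumerate(tasks):
        # Iterating a dict yields its keys, so difference() finds absent fields.
        missing = REQUIRED_TASK_KEYS.difference(task)
        if missing:
            problems.append(f"task {index} is missing keys: {sorted(missing)}")
        task_id = task.get("task_id")
        if task_id in seen_ids:
            problems.append(f"duplicate task_id: {task_id!r}")
        seen_ids.add(task_id)
        max_steps = task.get("max_steps")
        if not isinstance(max_steps, int) or max_steps <= 0:
            problems.append(f"task {index} has a non-positive or non-integer max_steps")
    return problems
```

Against a running server, `validate_task_catalog(fetch_task_catalog())` should return an empty list when the catalog matches the handler's shape. The sketch uses stdlib `urllib` so the check does not depend on `requests` being installed.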

tests/test_supportdesk.py CHANGED

@@ -56,3 +56,20 @@ def test_perfect_solution_grades_full_score():
     breakdown = grade_case(task, env.state.case)
     assert breakdown.total_score == 1.0
+def test_max_steps_ends_episode():
+    env = SupportDeskEnvironment(task_id="billing_refund_easy")
+    observation = env.reset()
+    for _ in range(6):
+        observation = env.step(SupportDeskAction(operation="classify"))
+    assert observation.done is True
+    assert env.state.step_count == 6
+def test_grade_is_bounded_between_zero_and_one():
+    task = get_task("api_incident_hard")
+    env = SupportDeskEnvironment(task_id=task.task_id)
+    env.reset()
+    breakdown = grade_case(task, env.state.case)
+    assert 0.0 <= breakdown.total_score <= 1.0
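`test_max_steps_ends_episode` expects the `billing_refund_easy` episode to report `done` after six `classify` steps, with `step_count == 6`. The toy class below sketches that step-budget termination pattern in isolation. `BudgetedEpisode`, its `submit` handling, and the no-op behavior after termination are assumptions for illustration, not the logic in `supportdesk_environment.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepResult:
    done: bool
    reason: str = ""


@dataclass
class BudgetedEpisode:
    """Toy episode that ends on 'submit' or once max_steps actions have been taken."""

    max_steps: int
    step_count: int = 0
    done: bool = False

    def reset(self) -> StepResult:
        self.step_count = 0
        self.done = False
        return StepResult(done=False)

    def step(self, operation: str) -> StepResult:
        if self.done:
            # Assumed behavior: stepping a finished episode changes nothing.
            return StepResult(done=True, reason="already_done")
        self.step_count += 1
        if operation == "submit":
            self.done = True
            return StepResult(done=True, reason="submitted")
        if self.step_count >= self.max_steps:
            # The budget check runs after counting the step, so the last allowed step reports done.
            self.done = True
            return StepResult(done=True, reason="max_steps")
        return StepResult(done=False)


if __name__ == "__main__":
    episode = BudgetedEpisode(max_steps=6)
    result = episode.reset()
    for _ in range(6):
        result = episode.step("classify")
    print(result, episode.step_count)  # StepResult(done=True, reason='max_steps') 6
```

Counting the step before checking the budget is what makes the sixth action itself return `done=True`, which matches an assertion of `step_count == 6` rather than 7.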