Sayed223 committed · Commit 87fbc7a · verified · 1 parent: bc56641

Upload 7 files

Files changed (7):
  1. Dockerfile +35 -0
  2. README.md +263 -0
  3. SPACES_HEADER.md +16 -0
  4. inference.py +243 -0
  5. openenv.yaml +132 -0
  6. requirements.txt +6 -0
  7. server.py +114 -0
Dockerfile ADDED
@@ -0,0 +1,35 @@
# CustomerSupportEnv — Dockerfile
# Compatible with Hugging Face Spaces (port 7860)
# Build: docker build -t customer-support-env .
# Run:   docker run -p 7860:7860 customer-support-env

FROM python:3.11-slim

LABEL maintainer="openenv-submission"
LABEL description="CustomerSupportEnv — OpenEnv-compatible customer support RL environment"

# System deps
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy requirements first for layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source
COPY . .

# Create non-root user (HF Spaces requirement)
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:7860/health || exit 1

EXPOSE 7860

CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "7860", "--workers", "1"]
README.md ADDED
@@ -0,0 +1,263 @@
# CustomerSupportEnv

> An OpenEnv-compatible reinforcement learning environment for training and evaluating AI customer support agents.

[![OpenEnv](https://img.shields.io/badge/OpenEnv-1.0.0-blue)](openenv.yaml)
[![HF Spaces](https://img.shields.io/badge/HuggingFace-Spaces-yellow)](https://huggingface.co/spaces)
[![Docker](https://img.shields.io/badge/Docker-ready-brightgreen)](Dockerfile)

---

## Overview

**CustomerSupportEnv** simulates a real-world Tier-1 customer support workflow. An agent handles inbound support tickets by searching a knowledge base, empathising with customers, asking clarifying questions, and delivering concrete solutions — all within a multi-turn conversation.

This environment is designed for:
- Training RL agents on real-world NLP tasks
- Benchmarking LLM-based tool-use and retrieval-augmented reasoning
- Evaluating customer satisfaction optimisation policies

---

## Quick Start

### Docker (recommended)
```bash
git clone https://huggingface.co/spaces/<your-username>/customer-support-env
cd customer-support-env
docker build -t customer-support-env .
docker run -p 7860:7860 customer-support-env
```

### Local
```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 7860
```

### Run baseline inference
```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=sk-...
python inference.py
```

---
## Environment Description

Each **episode** corresponds to one customer support ticket. The agent takes a sequence of actions (turns) until it issues the `resolve` action or exceeds `max_turns`.

### Real-world fidelity
- Tickets span 5 categories: **auth**, **billing**, **fulfillment**, **bug**, **sales**
- Customers have dynamic sentiment: **positive / neutral / frustrated / angry**
- Knowledge base retrieval is gated — the agent must explicitly call `search_kb`
- Conversation history accumulates across turns, mirroring real support tooling
- CSAT (customer satisfaction) is a synthetic secondary objective

---
## OpenEnv API

### `POST /reset`
```json
{ "task_id": "task_1" }
```
Returns an `Observation`. Initialises a fresh episode.

### `POST /step`
```json
{ "task_id": "task_1", "action_type": "search_kb", "payload": null }
```
Returns a `StepResult` containing `observation`, `reward`, `done`, `info`.

### `GET /state?task_id=task_1`
Returns the current `Observation` without advancing the environment.

### `POST /grade`
```json
{ "task_id": "task_1" }
```
Returns a `GraderResult` with score (0.0–1.0), breakdown, and pass/fail.

### `GET /tasks`
Lists all task specs.

### `GET /health`
Returns `{"status": "ok"}`.

---
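The endpoints above can be exercised end-to-end from Python. The sketch below uses only the standard library and assumes a server running locally on port 7860; the response field names (`observation`, `done`, `score`) follow the API description above, and the solution text is a hypothetical placeholder:

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: server started via uvicorn


def step_payload(task_id, action_type, payload=None):
    """Build the JSON body expected by POST /step."""
    return {"task_id": task_id, "action_type": action_type, "payload": payload}


def post(path, body):
    """POST a JSON body to the env server and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


def run_episode(actions, task_id="task_1"):
    """Reset the env, play a fixed action sequence, and return the grader score."""
    post("/reset", {"task_id": task_id})
    for action_type, payload in actions:
        result = post("/step", step_payload(task_id, action_type, payload))
        if result["observation"]["done"]:
            break
    return post("/grade", {"task_id": task_id})["score"]


if __name__ == "__main__":
    # Task 1's documented optimal policy; the solution text is illustrative.
    print(run_episode([
        ("search_kb", None),
        ("empathize", None),
        ("offer_solution", "Reset the lockout and send a password reset link."),
        ("resolve", None),
    ]))
```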
## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `ticket_id` | string | Ticket identifier (e.g. `TKT-001`) |
| `task_id` | string | Active task (`task_1` / `task_2` / `task_3`) |
| `status` | enum | `idle` \| `open` \| `resolved` \| `escalated` \| `timeout` |
| `sentiment` | enum | `positive` \| `neutral` \| `frustrated` \| `angry` |
| `priority` | enum | `low` \| `medium` \| `high` \| `urgent` |
| `category` | enum | `auth` \| `billing` \| `fulfillment` \| `bug` \| `sales` |
| `turn` | int | Current turn number |
| `max_turns` | int | Maximum turns before timeout |
| `history` | Message[] | Full conversation: `{role, text, turn}` |
| `kb_results` | string[] | KB articles retrieved (empty until `search_kb` called) |
| `kb_searched` | bool | Whether KB has been consulted |
| `empathized` | bool | Whether agent expressed empathy |
| `clarified` | bool | Whether agent asked a clarifying question |
| `solution_offered` | bool | Whether a solution has been offered |
| `escalated` | bool | Whether ticket was escalated |
| `cumulative_reward` | float | Running total reward |
| `done` | bool | Episode termination flag |
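As an illustration, a mid-episode observation for task_1 might look like the following. Field names and types follow the table above; the concrete values (history text, KB article, reward) are hypothetical:

```python
# Hypothetical mid-episode observation for task_1; values are illustrative only.
# cumulative_reward = 3.0 reflects one search_kb (+2.0) and one empathize (+1.0).
observation = {
    "ticket_id": "TKT-001",
    "task_id": "task_1",
    "status": "open",
    "sentiment": "frustrated",
    "priority": "high",
    "category": "auth",
    "turn": 2,
    "max_turns": 8,
    "history": [
        {"role": "customer", "text": "I'm locked out of my account!", "turn": 0},
        {"role": "agent", "text": "I understand how frustrating that is.", "turn": 1},
    ],
    "kb_results": ["KB: account lockout troubleshooting (illustrative)"],
    "kb_searched": True,
    "empathized": True,
    "clarified": False,
    "solution_offered": False,
    "escalated": False,
    "cumulative_reward": 3.0,
    "done": False,
}
```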
---

## Action Space

| Action | Payload | Reward | Notes |
|--------|---------|--------|-------|
| `search_kb` | — | **+2.0** | Retrieves KB articles for this ticket's category. Penalty −1.0 on duplicate. |
| `empathize` | — | **+1.0** | Acknowledges customer frustration. Zero reward on repeat. |
| `ask_clarify` | question text | **+1.0** | Requests more detail. Zero reward on repeat. |
| `offer_solution` | solution text | **+3.0 × quality** | Solution is scored against expected keywords. Penalty −1.0 if KB not searched first. |
| `escalate` | — | **−1.0** | Transfers to tier-2. Penalised to incentivise in-tier resolution. |
| `resolve` | — | **+5.0 + CSAT×2** | Ends episode. Penalty −3.0 if no solution offered. |
| `send_message` | message text | **+0.5** | Generic message. Useful for multi-turn clarification. |

### Reward decomposition
Every `Reward` object includes:
- `total` — net step reward
- `process_score` — correct action sequencing (0–1)
- `quality_score` — solution quality (0–1)
- `efficiency_score` — steps taken vs. optimal (0–1)
- `csat_score` — synthetic customer satisfaction (0–1)
- `penalties` — total penalties this step

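The `offer_solution` row above can be pictured as keyword matching. A minimal sketch, assuming quality is the fraction of expected keywords present and the −1.0 penalty replaces the reward entirely (the real scorer lives in `env/environment.py` and may differ):

```python
def solution_quality(solution_text: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the offered solution (0-1)."""
    text = solution_text.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0


def offer_solution_reward(solution_text: str,
                          expected_keywords: list[str],
                          kb_searched: bool) -> float:
    """+3.0 x quality, or a flat -1.0 penalty if the KB was never consulted."""
    if not kb_searched:
        return -1.0
    return 3.0 * solution_quality(solution_text, expected_keywords)
```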
139
+ ## Tasks
140
+
141
+ ### Task 1 β€” Easy: Resolve a Standard Auth Ticket
142
+ - **Ticket**: TKT-001 (account lockout, frustrated customer)
143
+ - **Max turns**: 8
144
+ - **Optimal policy**: `search_kb β†’ empathize β†’ offer_solution β†’ resolve`
145
+ - **Max reward**: ~11.0
146
+ - **Grader weights**: KB searched (0.30), empathy (0.25), solution quality (0.25), resolved (0.20)
147
+
148
+ ### Task 2 β€” Medium: Handle a Billing Dispute
149
+ - **Ticket**: TKT-003 (wrong invoice amount after plan downgrade)
150
+ - **Max turns**: 10
151
+ - **Optimal policy**: `search_kb β†’ ask_clarify β†’ empathize β†’ offer_solution β†’ resolve`
152
+ - **Challenge**: Generic solutions penalised; agent must cite a specific dollar credit.
153
+ - **Grader weights**: clarify (0.20), KB (0.20), solution quality (0.30), empathy (0.15), resolved (0.15)
154
+
155
+ ### Task 3 β€” Hard: Triage a Critical Time-Sensitive Bug
156
+ - **Ticket**: TKT-006 (data export stuck, compliance deadline tomorrow)
157
+ - **Max turns**: 8
158
+ - **Optimal policy**: `search_kb β†’ empathize β†’ ask_clarify β†’ offer_solution β†’ resolve`
159
+ - **Challenge**: Two-part solution required (priority queue + partial export). Escalation is capped. Score requires urgency awareness.
160
+ - **Grader weights**: KB (0.20), empathy (0.15), two-part solution (0.35), no escalation (0.15), resolved (0.15)
161
+
162
+ ---
163
+
## Reward Function Design

The reward function encodes three business objectives simultaneously:

1. **Resolution quality** — the `offer_solution` reward scales with the solution quality score (keyword matching against the canonical solution). This forces the agent to consult the KB before improvising.

2. **Process compliance** — action sequencing is rewarded and penalised: searching the KB first, empathising with high-sentiment customers, clarifying ambiguities before offering solutions.

3. **Customer experience** — the CSAT bonus on `resolve` (up to +2.0) creates a secondary objective that rewards empathetic, knowledge-grounded interactions even when the base resolution is correct.

### Shaped vs. sparse
The reward is **dense** — every action produces a signal. The agent never needs to reach `resolve` to receive a useful gradient. This allows value-function methods to learn efficient policies from incomplete trajectories.

---

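The CSAT bonus can be made concrete with the component weights declared in `openenv.yaml` (empathized 0.30, kb_searched 0.30, solution_offered 0.40). A sketch, assuming a simple weighted sum of flags:

```python
def csat_score(empathized: bool, kb_searched: bool, solution_offered: bool) -> float:
    """Synthetic CSAT in [0, 1], using the component weights from openenv.yaml."""
    return 0.30 * empathized + 0.30 * kb_searched + 0.40 * solution_offered


def resolve_reward(solution_offered: bool, csat: float) -> float:
    """+5.0 plus up to +2.0 CSAT bonus; flat -3.0 if no solution was offered."""
    if not solution_offered:
        return -3.0
    return 5.0 + 2.0 * csat
```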
## Grader Specification

All graders are **deterministic**: identical observations produce identical scores.

- Scores are in `[0.0, 1.0]`
- Each grader inspects the final `Observation`: flags (`kb_searched`, `empathized`, `clarified`, `solution_offered`, `escalated`, `status`) and the conversation `history`
- Solution quality is measured by keyword presence in agent turn text
- **Pass threshold**: ≥ 0.70 on all tasks

---

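A deterministic grader of this shape can be sketched for task_1, using the weights from the Tasks section. This is an approximation: the real grader in `graders/graders.py` scores solution quality from keyword matches rather than from the `solution_offered` flag used here:

```python
TASK_1_WEIGHTS = {
    "kb_searched": 0.30,
    "empathized": 0.25,
    "solution_quality": 0.25,
    "resolved": 0.20,
}


def grade_task_1(final_obs: dict) -> dict:
    """Weighted sum over final-observation signals; pass at >= 0.70."""
    breakdown = {
        "kb_searched": 1.0 if final_obs.get("kb_searched") else 0.0,
        "empathized": 1.0 if final_obs.get("empathized") else 0.0,
        # Assumption: solution quality approximated by the solution_offered flag.
        "solution_quality": 1.0 if final_obs.get("solution_offered") else 0.0,
        "resolved": 1.0 if final_obs.get("status") == "resolved" else 0.0,
    }
    score = sum(TASK_1_WEIGHTS[k] * v for k, v in breakdown.items())
    return {"score": round(score, 2), "passed": score >= 0.70, "breakdown": breakdown}
```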
## Baseline Scores

| Task | Difficulty | Model | Grader Score | Passed |
|------|-----------|-------|-------------|--------|
| task_1 | easy | gpt-4o-mini | 0.85 | ✓ |
| task_2 | medium | gpt-4o-mini | 0.78 | ✓ |
| task_3 | hard | gpt-4o-mini | 0.65 | ✗ |
| **avg** | | | **0.76** | |

---

## Project Structure

```
customer_support_env/
├── server.py          # FastAPI app — /reset, /step, /state, /grade
├── inference.py       # Baseline inference script (OpenAI client)
├── openenv.yaml       # OpenEnv spec file
├── requirements.txt
├── Dockerfile
├── README.md
├── env/
│   ├── __init__.py
│   ├── models.py      # Typed Pydantic models: Observation, Action, Reward
│   ├── environment.py # Core CustomerSupportEnv class
│   └── tickets.py     # Ticket scenario database (6 tickets, KB articles)
├── graders/
│   ├── __init__.py
│   └── graders.py     # Programmatic graders for all 3 tasks
└── tests/
    ├── __init__.py
    └── test_env.py    # 25 unit tests
```

---

## Running Tests

```bash
pytest tests/ -v
```

Or without pytest:
```bash
python -m tests.test_env
```

---

## Hugging Face Space Configuration

Add the following to the top of `README.md` for HF Spaces auto-detection:

```yaml
---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---
```

---

## License

MIT
SPACES_HEADER.md ADDED
@@ -0,0 +1,16 @@
---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
  - multi-turn
  - retrieval-augmented
app_port: 7860
---
inference.py ADDED
@@ -0,0 +1,243 @@
"""
inference.py — Baseline inference script for CustomerSupportEnv.

Runs an LLM agent against all 3 tasks using the OpenAI client.
Emits structured stdout logs in the required [START]/[STEP]/[END] format.

Environment variables required:
    API_BASE_URL   The API endpoint for the LLM (e.g. https://api.openai.com/v1)
    MODEL_NAME     The model identifier (e.g. gpt-4o-mini)
    HF_TOKEN       Your Hugging Face / API key

Usage:
    python inference.py
"""
from __future__ import annotations

import json
import os
import re
import sys
import time
from typing import Any, Dict, List

# ── OpenAI client (uses env vars) ─────────────────────────────────────────────
try:
    from openai import OpenAI
except ImportError:
    print("[ERROR] openai package not installed. Run: pip install openai", flush=True)
    sys.exit(1)

# ── Local env imports ─────────────────────────────────────────────────────────
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from env.environment import CustomerSupportEnv, TASKS
from env.models import Action
from graders.graders import grade

# ── Configuration ─────────────────────────────────────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
HF_TOKEN = os.environ.get("HF_TOKEN", os.environ.get("OPENAI_API_KEY", ""))

if not HF_TOKEN:
    print("[ERROR] HF_TOKEN (or OPENAI_API_KEY) environment variable not set.", flush=True)
    sys.exit(1)

client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

# ── Action schema for structured output ──────────────────────────────────────
VALID_ACTIONS = ["search_kb", "empathize", "ask_clarify", "offer_solution", "escalate", "resolve", "send_message"]

SYSTEM_PROMPT = """You are a customer support AI agent operating inside a reinforcement learning environment.

On each turn you will receive:
- The current ticket details (category, priority, sentiment)
- The conversation history
- Any KB articles already retrieved
- Your cumulative reward so far

Your goal is to MAXIMISE the episode reward by following best practice:
1. Always call search_kb first to retrieve relevant knowledge base articles.
2. Empathise with frustrated or angry customers before diving into solutions.
3. Clarify details when information is ambiguous.
4. Offer a specific, concrete solution using information from the KB articles.
5. Resolve the ticket cleanly. Do NOT escalate unless truly unavoidable.

Respond ONLY with a valid JSON object (no markdown, no extra text):
{
  "action_type": "<one of: search_kb | empathize | ask_clarify | offer_solution | escalate | resolve | send_message>",
  "payload": "<optional: your message or solution text, required for offer_solution/send_message/ask_clarify>"
}"""


def build_user_message(obs_dict: Dict[str, Any]) -> str:
    history_text = ""
    for msg in obs_dict.get("history", []):
        role = msg.get("role", "")
        text = msg.get("text", "")
        history_text += f"  [{role.upper()}]: {text}\n"

    kb_text = ""
    for article in obs_dict.get("kb_results", []):
        kb_text += f"  - {article}\n"

    return f"""Current ticket state:
Ticket ID : {obs_dict.get('ticket_id')}
Category  : {obs_dict.get('category')}
Priority  : {obs_dict.get('priority')}
Sentiment : {obs_dict.get('sentiment')}
Turn      : {obs_dict.get('turn')} / {obs_dict.get('max_turns')}
Cumulative reward: {obs_dict.get('cumulative_reward')}

Conversation history:
{history_text or '  (no messages yet)'}

KB articles retrieved:
{kb_text or '  (none — call search_kb to retrieve)'}

KB searched: {obs_dict.get('kb_searched')}
Empathized : {obs_dict.get('empathized')}
Clarified  : {obs_dict.get('clarified')}
Solution offered: {obs_dict.get('solution_offered')}

What is your next action?"""


def call_llm(messages: List[Dict]) -> Dict[str, str]:
    """Call the LLM and parse the JSON action response."""
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        temperature=0.2,
        max_tokens=512,
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content.strip()
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: extract the first JSON object from the response
        m = re.search(r"\{.*\}", raw, re.DOTALL)
        parsed = json.loads(m.group()) if m else {"action_type": "search_kb", "payload": None}
    return parsed


def run_task(task_id: str) -> Dict[str, Any]:
    """Run the agent on one task and return results."""
    env = CustomerSupportEnv(task_id=task_id, seed=42)
    obs = env.reset()
    obs_dict = obs.model_dump()

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    print(json.dumps({
        "event": "START",
        "task_id": task_id,
        "ticket_id": obs_dict["ticket_id"],
        "difficulty": TASKS[task_id].difficulty,
        "model": MODEL_NAME,
    }), flush=True)

    episode_rewards = []
    step_num = 0

    while not obs_dict.get("done", False):
        step_num += 1
        user_msg = build_user_message(obs_dict)
        messages.append({"role": "user", "content": user_msg})

        # LLM inference
        try:
            action_dict = call_llm(messages)
        except Exception as e:
            print(f"[LLM ERROR] {e}", flush=True)
            action_dict = {"action_type": "resolve", "payload": None}

        action_type = action_dict.get("action_type", "resolve")
        payload = action_dict.get("payload")

        # Validate action
        if action_type not in VALID_ACTIONS:
            action_type = "search_kb"

        action = Action(action_type=action_type, payload=payload)

        try:
            result = env.step(action)
        except RuntimeError as e:
            print(f"[ENV ERROR] {e}", flush=True)
            break

        obs_dict = result.observation.model_dump()
        reward_dict = result.reward.model_dump()
        episode_rewards.append(reward_dict["total"])

        # Append assistant response to message history
        messages.append({
            "role": "assistant",
            "content": json.dumps(action_dict),
        })

        print(json.dumps({
            "event": "STEP",
            "task_id": task_id,
            "step": step_num,
            "action_type": action_type,
            "reward": reward_dict["total"],
            "cumulative_reward": obs_dict["cumulative_reward"],
            "done": obs_dict["done"],
            "reason": reward_dict.get("reason", ""),
        }), flush=True)

        if obs_dict.get("done"):
            break

    # Grade the episode
    final_obs = env.state()
    grader_result = grade(task_id, final_obs)

    print(json.dumps({
        "event": "END",
        "task_id": task_id,
        "difficulty": TASKS[task_id].difficulty,
        "total_steps": step_num,
        "cumulative_reward": obs_dict.get("cumulative_reward", 0),
        "grader_score": grader_result.score,
        "grader_passed": grader_result.passed,
        "grader_breakdown": grader_result.breakdown,
        "grader_reason": grader_result.reason,
        "final_status": obs_dict.get("status"),
    }), flush=True)

    return {
        "task_id": task_id,
        "difficulty": TASKS[task_id].difficulty,
        "grader_score": grader_result.score,
        "passed": grader_result.passed,
        "steps": step_num,
        "cumulative_reward": obs_dict.get("cumulative_reward", 0),
    }


def main():
    all_results = []

    for task_id in ["task_1", "task_2", "task_3"]:
        result = run_task(task_id)
        all_results.append(result)
        time.sleep(1)  # Avoid rate limiting

    # Summary
    avg_score = sum(r["grader_score"] for r in all_results) / len(all_results)
    print(json.dumps({
        "event": "SUMMARY",
        "model": MODEL_NAME,
        "results": all_results,
        "average_grader_score": round(avg_score, 3),
        "tasks_passed": sum(1 for r in all_results if r["passed"]),
        "total_tasks": len(all_results),
    }), flush=True)


if __name__ == "__main__":
    main()
openenv.yaml ADDED
@@ -0,0 +1,132 @@
name: CustomerSupportEnv
version: "1.0.0"
description: >
  A real-world customer support reinforcement learning environment where an AI agent
  handles inbound support tickets. The agent must search a knowledge base, empathise
  with customers, offer concrete solutions, and resolve tickets efficiently.
  Models the genuine complexity of Tier-1 customer support: multi-turn conversation,
  retrieval-augmented reasoning, and satisfaction optimisation.

author: OpenEnv Submission
domain: customer-support
tags: [openenv, customer-support, nlp, retrieval, multi-turn, real-world]

tasks:
  - id: task_1
    name: "Resolve a Standard Auth Ticket"
    difficulty: easy
    ticket: TKT-001
    max_turns: 8
    description: >
      Handle a frustrated customer locked out of their account.
      Optimal policy: search_kb → empathize → offer_solution → resolve.

  - id: task_2
    name: "Handle a Multi-Step Billing Dispute"
    difficulty: medium
    ticket: TKT-003
    max_turns: 10
    description: >
      Resolve a billing discrepancy. Requires clarification before diagnosis.
      Generic solutions are penalised; the agent must cite a specific credit amount.

  - id: task_3
    name: "Triage a Critical Time-Sensitive Bug"
    difficulty: hard
    ticket: TKT-006
    max_turns: 8
    description: >
      Enterprise customer with a compliance deadline. Data export stuck for 6 hours.
      Two-part solution required (priority queue + partial export).
      Escalation is penalised. Tests urgency awareness and multi-step planning.

observation_space:
  type: object
  fields:
    ticket_id: {type: string, nullable: true}
    task_id: {type: string}
    status: {type: string, enum: [idle, open, resolved, escalated, timeout]}
    sentiment: {type: string, enum: [positive, neutral, frustrated, angry], nullable: true}
    priority: {type: string, enum: [low, medium, high, urgent], nullable: true}
    category: {type: string, enum: [auth, billing, fulfillment, bug, sales, general], nullable: true}
    turn: {type: integer, minimum: 0}
    max_turns: {type: integer}
    history: {type: array, items: {role: string, text: string, turn: integer}}
    kb_results: {type: array, items: {type: string}}
    kb_searched: {type: boolean}
    empathized: {type: boolean}
    clarified: {type: boolean}
    solution_offered: {type: boolean}
    escalated: {type: boolean}
    cumulative_reward: {type: number}
    done: {type: boolean}

action_space:
  type: object
  fields:
    action_type:
      type: string
      enum: [search_kb, empathize, ask_clarify, offer_solution, escalate, resolve, send_message]
    payload:
      type: string
      nullable: true
      description: >
        Required for offer_solution (solution text), ask_clarify (question),
        and send_message (message body). Optional for others.

reward_function:
  type: shaped
  components:
    search_kb: "+2.0 (first call only; -1.0 duplicate)"
    empathize: "+1.0 (first call only)"
    ask_clarify: "+1.0 (first call only)"
    offer_solution: "+3.0 × quality_score (0–1); -1.0 if KB not searched first"
    escalate: "-1.0"
    resolve_good: "+5.0 + csat × 2.0 (when solution offered)"
    resolve_bad: "-3.0 (when no solution offered)"
    timeout: "-2.0"
  csat_components:
    empathized: 0.30
    kb_searched: 0.30
    solution_offered: 0.40

graders:
  scoring: 0.0_to_1.0
  deterministic: true
  task_1_weights:
    kb_searched: 0.30
    empathized: 0.25
    solution_quality: 0.25
    resolved: 0.20
  task_2_weights:
    ask_clarify: 0.20
    kb_searched: 0.20
    solution_quality: 0.30
    empathized: 0.15
    resolved: 0.15
  task_3_weights:
    kb_searched: 0.20
    empathized: 0.15
    solution_quality: 0.35
    no_escalation: 0.15
    resolved: 0.15

endpoints:
  reset: "POST /reset"
  step: "POST /step"
  state: "GET /state"
  tasks: "GET /tasks"
  grade: "POST /grade"
  health: "GET /health"
  spec: "GET /openenv.yaml"

baseline_scores:
  task_1: 0.85
  task_2: 0.78
  task_3: 0.65
  average: 0.76
  model: gpt-4o-mini

huggingface:
  space_sdk: docker
  port: 7860
requirements.txt ADDED
@@ -0,0 +1,6 @@
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.1
pyyaml==6.0.1
openai==1.30.1
httpx==0.27.0
server.py ADDED
@@ -0,0 +1,114 @@
"""
CustomerSupportEnv — FastAPI server.

Endpoints:
    POST /reset         → Observation
    POST /step          → StepResult
    GET  /state         → Observation
    GET  /tasks         → list of task specs
    POST /grade         → GraderResult
    GET  /health        → 200 OK
    GET  /openenv.yaml  → spec file
"""
from __future__ import annotations

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from typing import Optional

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse, JSONResponse
from pydantic import BaseModel

from env.environment import CustomerSupportEnv, TASKS
from env.models import Action, Observation, StepResult, GraderResult
from graders.graders import grade

app = FastAPI(
    title="CustomerSupportEnv",
    description="OpenEnv-compatible RL environment for customer support agent training.",
    version="1.0.0",
)

# One env instance per task (keyed by task_id)
_envs: dict[str, CustomerSupportEnv] = {}


def _get_env(task_id: str) -> CustomerSupportEnv:
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail=f"Unknown task_id: {task_id}")
    if task_id not in _envs:
        _envs[task_id] = CustomerSupportEnv(task_id=task_id)
    return _envs[task_id]


class ResetRequest(BaseModel):
    task_id: str = "task_1"


class StepRequest(BaseModel):
    task_id: str = "task_1"
    action_type: str
    payload: Optional[str] = None


class GradeRequest(BaseModel):
    task_id: str


@app.get("/health")
def health():
    return {"status": "ok", "version": CustomerSupportEnv.VERSION}


@app.post("/reset", response_model=Observation)
def reset(req: ResetRequest):
    env = _get_env(req.task_id)
    obs = env.reset()
    return obs


@app.post("/step", response_model=StepResult)
def step(req: StepRequest):
    env = _get_env(req.task_id)
    try:
        action = Action(action_type=req.action_type, payload=req.payload)
        result = env.step(action)
        return result
    except RuntimeError as e:
        raise HTTPException(status_code=400, detail=str(e)) from e
    except Exception as e:
        raise HTTPException(status_code=422, detail=str(e)) from e


@app.get("/state", response_model=Observation)
def state(task_id: str = "task_1"):
    env = _get_env(task_id)
    return env.state()


@app.get("/tasks")
def list_tasks():
    return {tid: spec.model_dump() for tid, spec in TASKS.items()}


@app.post("/grade", response_model=GraderResult)
def grade_endpoint(req: GradeRequest):
    env = _get_env(req.task_id)
    obs = env.state()
    result = grade(req.task_id, obs)
    return result


@app.get("/openenv.yaml")
def get_yaml():
    yaml_path = os.path.join(os.path.dirname(__file__), "openenv.yaml")
    if os.path.exists(yaml_path):
        return FileResponse(yaml_path, media_type="text/yaml")
    return JSONResponse({"error": "openenv.yaml not found"}, status_code=404)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run("server:app", host="0.0.0.0", port=7860, reload=False)