# CustomerSupportEnv

> An OpenEnv-compatible reinforcement learning environment for training and evaluating AI customer support agents.

[](openenv.yaml)
[](https://huggingface.co/spaces)
[](Dockerfile)

---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: server.py
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---

**CustomerSupportEnv** simulates a real-world Tier-1 customer support workflow. An agent handles inbound support tickets by searching a knowledge base, empathising with customers, asking clarifying questions, and delivering concrete solutions, all within a multi-turn conversation.

This environment is designed for:

- Training RL agents on real-world NLP tasks
- Benchmarking LLM-based tool-use and retrieval-augmented reasoning
- Evaluating customer satisfaction optimisation policies

---

## Quick Start

### Docker (recommended)

```bash
git clone https://huggingface.co/spaces/<your-username>/customer-support-env
cd customer-support-env
docker build -t customer-support-env .
docker run -p 7860:7860 customer-support-env
```

### Local

```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 7860
```

### Run baseline inference

```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=sk-...
python inference.py
```

---

## Environment Description

Each **episode** = one customer support ticket. The agent takes a sequence of actions (turns) until it calls `resolve()` or exceeds `max_turns`.

### Real-world fidelity

- Tickets span 5 categories: **auth**, **billing**, **fulfillment**, **bug**, **sales**
- Customers have dynamic sentiment: **positive / neutral / frustrated / angry**
- Knowledge base retrieval is gated: the agent must explicitly call `search_kb`
- Conversation history accumulates across turns, mirroring real support tooling
- CSAT (customer satisfaction) is a synthetic secondary objective

---

## OpenEnv API

### `POST /reset`

```json
{ "task_id": "task_1" }
```

Initialises a fresh episode and returns an `Observation`.

### `POST /step`

```json
{ "task_id": "task_1", "action_type": "search_kb", "payload": null }
```

Returns a `StepResult` containing `observation`, `reward`, `done`, `info`.

### `GET /state?task_id=task_1`

Returns the current `Observation` without advancing the environment.

### `POST /grade`

```json
{ "task_id": "task_1" }
```

Returns a `GraderResult` with score (0.0–1.0), breakdown, and pass/fail.

### `GET /tasks`

Lists all task specs.

### `GET /health`

Returns `{"status": "ok"}`.

---

## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `ticket_id` | string | Ticket identifier (e.g. `TKT-001`) |
| `task_id` | string | Active task (`task_1` / `task_2` / `task_3`) |
| `status` | enum | `idle` \| `open` \| `resolved` \| `escalated` \| `timeout` |
| `sentiment` | enum | `positive` \| `neutral` \| `frustrated` \| `angry` |
| `priority` | enum | `low` \| `medium` \| `high` \| `urgent` |
| `category` | enum | `auth` \| `billing` \| `fulfillment` \| `bug` \| `sales` |
| `turn` | int | Current turn number |
| `max_turns` | int | Maximum turns before timeout |
| `history` | Message[] | Full conversation: `{role, text, turn}` |
| `kb_results` | string[] | KB articles retrieved (empty until `search_kb` called) |
| `kb_searched` | bool | Whether KB has been consulted |
| `empathized` | bool | Whether agent expressed empathy |
| `clarified` | bool | Whether agent asked a clarifying question |
| `solution_offered` | bool | Whether a solution has been offered |
| `escalated` | bool | Whether ticket was escalated |
| `cumulative_reward` | float | Running total reward |
| `done` | bool | Episode termination flag |
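
For concreteness, a freshly reset Task 1 observation might look like the following. The field values are illustrative, made up to match the schema above rather than captured from the real server:

```python
# Illustrative observation for a freshly reset Task 1 episode; ticket, category,
# sentiment, and max_turns follow the Task 1 description; other values are guesses.
observation = {
    "ticket_id": "TKT-001",
    "task_id": "task_1",
    "status": "open",
    "sentiment": "frustrated",
    "priority": "high",
    "category": "auth",
    "turn": 0,
    "max_turns": 8,
    "history": [{"role": "customer", "text": "I'm locked out of my account!", "turn": 0}],
    "kb_results": [],
    "kb_searched": False,
    "empathized": False,
    "clarified": False,
    "solution_offered": False,
    "escalated": False,
    "cumulative_reward": 0.0,
    "done": False,
}
```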

---

## Action Space

| Action | Payload | Reward | Notes |
|--------|---------|--------|-------|
| `search_kb` | — | **+2.0** | Retrieves KB articles for this ticket's category. Penalty −1.0 on duplicate. |
| `empathize` | — | **+1.0** | Acknowledges customer frustration. Zero reward on repeat. |
| `ask_clarify` | question text | **+1.0** | Requests more detail. Zero reward on repeat. |
| `offer_solution` | solution text | **+3.0 × quality** | Solution is scored against expected keywords. Penalty −1.0 if KB not searched first. |
| `escalate` | — | **−1.0** | Transfers to tier-2. Penalised to incentivise in-tier resolution. |
| `resolve` | — | **+5.0 + CSAT×2** | Ends episode. Penalty −3.0 if no solution offered. |
| `send_message` | message text | **+0.5** | Generic message. Useful for multi-turn clarification. |

### Reward decomposition

Every `Reward` object includes:

- `total` — net step reward
- `process_score` — correct action sequencing (0–1)
- `quality_score` — solution quality (0–1)
- `efficiency_score` — steps taken vs. optimal (0–1)
- `csat_score` — synthetic customer satisfaction (0–1)
- `penalties` — total penalties this step
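
The `offer_solution` mechanic above (reward scaled by keyword-matched quality, with a penalty for skipping the KB) can be sketched as follows. The scoring rule and keyword lists are assumptions for illustration, not the environment's actual implementation:

```python
def solution_quality(solution_text: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the offered solution (0-1)."""
    text = solution_text.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0


def offer_solution_reward(solution_text, expected_keywords, kb_searched):
    """Reward for offer_solution: +3.0 x quality, -1.0 if the KB was skipped."""
    reward = 3.0 * solution_quality(solution_text, expected_keywords)
    if not kb_searched:
        reward -= 1.0
    return reward
```

With keywords `["reset", "password", "portal"]`, a solution text containing all three scores quality 1.0 and earns the full +3.0; the same text with `kb_searched=False` nets +2.0.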

---

## Tasks

### Task 1 — Easy: Resolve a Standard Auth Ticket

- **Ticket**: TKT-001 (account lockout, frustrated customer)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → offer_solution → resolve`
- **Max reward**: ~11.0
- **Grader weights**: KB searched (0.30), empathy (0.25), solution quality (0.25), resolved (0.20)

### Task 2 — Medium: Handle a Billing Dispute

- **Ticket**: TKT-003 (wrong invoice amount after plan downgrade)
- **Max turns**: 10
- **Optimal policy**: `search_kb → ask_clarify → empathize → offer_solution → resolve`
- **Challenge**: Generic solutions are penalised; the agent must cite a specific dollar credit.
- **Grader weights**: clarify (0.20), KB (0.20), solution quality (0.30), empathy (0.15), resolved (0.15)

### Task 3 — Hard: Triage a Critical Time-Sensitive Bug

- **Ticket**: TKT-006 (data export stuck, compliance deadline tomorrow)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → ask_clarify → offer_solution → resolve`
- **Challenge**: A two-part solution is required (priority queue + partial export). Escalation is capped. Scoring well requires urgency awareness.
- **Grader weights**: KB (0.20), empathy (0.15), two-part solution (0.35), no escalation (0.15), resolved (0.15)

---

## Reward Function Design

The reward function encodes three business objectives simultaneously:

1. **Resolution quality** — `offer_solution` reward scales with the solution quality score (keyword matching against the canonical solution). This forces the agent to consult the KB before improvising.

2. **Process compliance** — action sequencing is rewarded and penalised: searching the KB first, empathising with high-sentiment customers, clarifying ambiguities before offering solutions.

3. **Customer experience** — the CSAT bonus on `resolve` (up to +2.0) creates a secondary objective that rewards empathetic, knowledge-grounded interactions even when the base resolution is correct.

### Shaped vs. sparse

Reward is **dense**: every action produces a signal, so the agent never needs to reach `resolve` to receive useful gradient. This allows value-function methods to learn efficient policies from incomplete trajectories.
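
As a sanity check, the base step rewards from the Action Space table reproduce Task 1's stated maximum of ~11.0 for its optimal policy, assuming a perfect quality score and excluding the variable CSAT bonus on `resolve`:

```python
# Base rewards from the Action Space table, assuming quality = 1.0 and
# ignoring the variable CSAT bonus on resolve.
BASE_REWARD = {
    "search_kb": 2.0,
    "empathize": 1.0,
    "ask_clarify": 1.0,
    "offer_solution": 3.0,  # +3.0 x quality with quality = 1.0
    "resolve": 5.0,         # +5.0 base, CSAT bonus excluded
    "send_message": 0.5,
}

task_1_optimal = ["search_kb", "empathize", "offer_solution", "resolve"]
total = sum(BASE_REWARD[a] for a in task_1_optimal)
print(total)  # 11.0
```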

---

## Grader Specification

All graders are **deterministic**: identical observations produce identical scores.

- Scores are in `[0.0, 1.0]`
- Each grader inspects the final `Observation`: flags (`kb_searched`, `empathized`, `clarified`, `solution_offered`, `escalated`, `status`) and conversation `history`
- Solution quality is measured by keyword presence in agent turn text
- **Pass threshold**: ≥ 0.70 on all tasks
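
A grader of this shape, a weighted sum of per-check scores over the final observation, can be sketched as follows. The weights mirror Task 1's stated grader weights; the function and check names are an illustration, not the code in `graders/graders.py`:

```python
PASS_THRESHOLD = 0.70

# Task 1 grader weights as stated above; check names are illustrative.
TASK_1_WEIGHTS = {
    "kb_searched": 0.30,
    "empathized": 0.25,
    "solution_quality": 0.25,
    "resolved": 0.20,
}


def grade(checks: dict[str, float], weights: dict[str, float]) -> tuple[float, bool]:
    """Weighted sum of per-check scores (each in [0, 1]); deterministic."""
    score = sum(weights[name] * checks.get(name, 0.0) for name in weights)
    return round(score, 4), score >= PASS_THRESHOLD


score, passed = grade(
    {"kb_searched": 1.0, "empathized": 1.0, "solution_quality": 0.8, "resolved": 1.0},
    TASK_1_WEIGHTS,
)
# score = 0.30 + 0.25 + 0.20 + 0.20 = 0.95, above the 0.70 pass threshold
```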

---

## Baseline Scores

| Task | Difficulty | Model | Grader Score | Passed |
|------|------------|-------|--------------|--------|
| task_1 | easy | gpt-4o-mini | 0.85 | ✓ |
| task_2 | medium | gpt-4o-mini | 0.78 | ✓ |
| task_3 | hard | gpt-4o-mini | 0.65 | — |
| **avg** | | | **0.76** | |

---

## Project Structure

```
customer_support_env/
├── server.py            # FastAPI app — /reset, /step, /state, /grade
├── inference.py         # Baseline inference script (OpenAI client)
├── openenv.yaml         # OpenEnv spec file
├── requirements.txt
├── Dockerfile
├── README.md
├── env/
│   ├── __init__.py
│   ├── models.py        # Typed Pydantic models: Observation, Action, Reward
│   ├── environment.py   # Core CustomerSupportEnv class
│   └── tickets.py       # Ticket scenario database (6 tickets, KB articles)
├── graders/
│   ├── __init__.py
│   └── graders.py       # Programmatic graders for all 3 tasks
└── tests/
    ├── __init__.py
    └── test_env.py      # 25 unit tests
```

---

## Running Tests

```bash
pytest tests/ -v
```

Or without pytest:

```bash
python -m tests.test_env
```

---

## Hugging Face Space Configuration

Add the following to the top of `README.md` for HF Spaces auto-detection:

```yaml
---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: server.py
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---
```

---

## License

MIT