Sayed223 committed on
Commit 5bf4915 · verified · 1 Parent(s): 81a90be

Update README.md

Files changed (1): README.md (+1, −270)

README.md CHANGED
# CustomerSupportEnv

> An OpenEnv-compatible reinforcement learning environment for training and evaluating AI customer support agents.

[![OpenEnv](https://img.shields.io/badge/OpenEnv-1.0.0-blue)](openenv.yaml)
[![HF Spaces](https://img.shields.io/badge/HuggingFace-Spaces-yellow)](https://huggingface.co/spaces)
[![Docker](https://img.shields.io/badge/Docker-ready-brightgreen)](Dockerfile)

---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: server.py
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---
## Overview

**CustomerSupportEnv** simulates a real-world Tier-1 customer support workflow. An agent handles inbound support tickets by searching a knowledge base, empathising with customers, asking clarifying questions, and delivering concrete solutions — all within a multi-turn conversation.

This environment is designed for:
- Training RL agents on real-world NLP tasks
- Benchmarking LLM-based tool use and retrieval-augmented reasoning
- Evaluating customer-satisfaction optimisation policies

---
## Quick Start

### Docker (recommended)
```bash
git clone https://huggingface.co/spaces/<your-username>/customer-support-env
cd customer-support-env
docker build -t customer-support-env .
docker run -p 7860:7860 customer-support-env
```

### Local
```bash
pip install -r requirements.txt
uvicorn server:app --host 0.0.0.0 --port 7860
```

### Run baseline inference
```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=sk-...
python inference.py
```

---
## Environment Description

Each **episode** is one customer support ticket. The agent takes a sequence of actions (turns) until it calls `resolve()` or exceeds `max_turns`.

### Real-world fidelity
- Tickets span 5 categories: **auth**, **billing**, **fulfillment**, **bug**, **sales**
- Customers have dynamic sentiment: **positive / neutral / frustrated / angry**
- Knowledge-base retrieval is gated — the agent must explicitly call `search_kb`
- Conversation history accumulates across turns, mirroring real support tooling
- CSAT (customer satisfaction) is a synthetic secondary objective

---
## OpenEnv API

### `POST /reset`
```json
{ "task_id": "task_1" }
```
Initialises a fresh episode and returns an `Observation`.

### `POST /step`
```json
{ "task_id": "task_1", "action_type": "search_kb", "payload": null }
```
Returns a `StepResult` containing `observation`, `reward`, `done`, `info`.

### `GET /state?task_id=task_1`
Returns the current `Observation` without advancing the environment.

### `POST /grade`
```json
{ "task_id": "task_1" }
```
Returns a `GraderResult` with score (0.0–1.0), breakdown, and pass/fail.

### `GET /tasks`
Lists all task specs.

### `GET /health`
Returns `{"status": "ok"}`.

---
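Tying the endpoints together: one episode is a `POST /reset` followed by repeated `POST /step` calls until `done` is true. Below is a minimal stdlib-only client sketch; the base URL and the `reward["total"]` / `done` field access are assumptions based on the schemas in this section, so adjust to the actual server responses.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:7860"  # assumes the server from Quick Start is running


def step_body(task_id, action_type, payload=None):
    # JSON body for POST /step, matching the schema shown above
    return {"task_id": task_id, "action_type": action_type, "payload": payload}


def post(path, body):
    # POST a JSON body and decode the JSON response (stdlib only, no extra deps)
    req = Request(BASE_URL + path, data=json.dumps(body).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())


def run_episode(actions, task_id="task_1"):
    # Reset, then step through a scripted action list until the episode ends
    post("/reset", {"task_id": task_id})
    total = 0.0
    for action_type, payload in actions:
        result = post("/step", step_body(task_id, action_type, payload))
        total += result["reward"]["total"]
        if result["done"]:
            break
    return total
```

For example, `run_episode([("search_kb", None), ("empathize", None), ("offer_solution", "reset your password via ..."), ("resolve", None)])` plays Task 1's optimal policy end-to-end.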
## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `ticket_id` | string | Ticket identifier (e.g. `TKT-001`) |
| `task_id` | string | Active task (`task_1` / `task_2` / `task_3`) |
| `status` | enum | `idle` \| `open` \| `resolved` \| `escalated` \| `timeout` |
| `sentiment` | enum | `positive` \| `neutral` \| `frustrated` \| `angry` |
| `priority` | enum | `low` \| `medium` \| `high` \| `urgent` |
| `category` | enum | `auth` \| `billing` \| `fulfillment` \| `bug` \| `sales` |
| `turn` | int | Current turn number |
| `max_turns` | int | Maximum turns before timeout |
| `history` | Message[] | Full conversation: `{role, text, turn}` |
| `kb_results` | string[] | KB articles retrieved (empty until `search_kb` called) |
| `kb_searched` | bool | Whether KB has been consulted |
| `empathized` | bool | Whether agent expressed empathy |
| `clarified` | bool | Whether agent asked a clarifying question |
| `solution_offered` | bool | Whether a solution has been offered |
| `escalated` | bool | Whether ticket was escalated |
| `cumulative_reward` | float | Running total reward |
| `done` | bool | Episode termination flag |

---
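For illustration, the table maps onto a typed record like the sketch below. The project's real models live in `env/models.py` as Pydantic classes; this plain-dataclass version only mirrors the field names and types from the table, and the default values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Observation:
    # Identity and task context (defaults here are illustrative, not canonical)
    ticket_id: str = "TKT-001"
    task_id: str = "task_1"
    status: str = "idle"        # idle | open | resolved | escalated | timeout
    sentiment: str = "neutral"  # positive | neutral | frustrated | angry
    priority: str = "medium"    # low | medium | high | urgent
    category: str = "auth"      # auth | billing | fulfillment | bug | sales
    # Turn tracking
    turn: int = 0
    max_turns: int = 8
    history: List[Dict[str, Any]] = field(default_factory=list)  # {role, text, turn}
    kb_results: List[str] = field(default_factory=list)
    # Behaviour flags consulted by the graders
    kb_searched: bool = False
    empathized: bool = False
    clarified: bool = False
    solution_offered: bool = False
    escalated: bool = False
    cumulative_reward: float = 0.0
    done: bool = False
```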
## Action Space

| Action | Payload | Reward | Notes |
|--------|---------|--------|-------|
| `search_kb` | — | **+2.0** | Retrieves KB articles for this ticket's category. Penalty −1.0 on duplicate. |
| `empathize` | — | **+1.0** | Acknowledges customer frustration. Zero reward on repeat. |
| `ask_clarify` | question text | **+1.0** | Requests more detail. Zero reward on repeat. |
| `offer_solution` | solution text | **+3.0 × quality** | Solution is scored against expected keywords. Penalty −1.0 if KB not searched first. |
| `escalate` | — | **−1.0** | Transfers to tier-2. Penalised to incentivise in-tier resolution. |
| `resolve` | — | **+5.0 + CSAT×2** | Ends episode. Penalty −3.0 if no solution offered. |
| `send_message` | message text | **+0.5** | Generic message. Useful for multi-turn clarification. |

### Reward decomposition
Every `Reward` object includes:
- `total` — net step reward
- `process_score` — correct action sequencing (0–1)
- `quality_score` — solution quality (0–1)
- `efficiency_score` — steps taken vs. optimal (0–1)
- `csat_score` — synthetic customer satisfaction (0–1)
- `penalties` — total penalties this step

---
## Tasks

### Task 1 — Easy: Resolve a Standard Auth Ticket
- **Ticket**: TKT-001 (account lockout, frustrated customer)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → offer_solution → resolve`
- **Max reward**: ~11.0
- **Grader weights**: KB searched (0.30), empathy (0.25), solution quality (0.25), resolved (0.20)

### Task 2 — Medium: Handle a Billing Dispute
- **Ticket**: TKT-003 (wrong invoice amount after plan downgrade)
- **Max turns**: 10
- **Optimal policy**: `search_kb → ask_clarify → empathize → offer_solution → resolve`
- **Challenge**: Generic solutions are penalised; the agent must cite a specific dollar credit.
- **Grader weights**: clarify (0.20), KB (0.20), solution quality (0.30), empathy (0.15), resolved (0.15)

### Task 3 — Hard: Triage a Critical Time-Sensitive Bug
- **Ticket**: TKT-006 (data export stuck, compliance deadline tomorrow)
- **Max turns**: 8
- **Optimal policy**: `search_kb → empathize → ask_clarify → offer_solution → resolve`
- **Challenge**: Two-part solution required (priority queue + partial export). Escalation is capped. Scoring requires urgency awareness.
- **Grader weights**: KB (0.20), empathy (0.15), two-part solution (0.35), no escalation (0.15), resolved (0.15)

---
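As a sanity check on the numbers above, Task 1's optimal policy can be priced directly from the Action Space table: 2.0 (`search_kb`) + 1.0 (`empathize`) + 3.0 × quality (`offer_solution`) + 5.0 (`resolve`) gives 11.0 at perfect quality, before the CSAT bonus of up to +2.0. A quick sketch of that arithmetic:

```python
def optimal_task1_reward(quality=1.0, csat=0.0):
    # Per-action rewards taken from the Action Space table;
    # `quality` and `csat` are the 0-1 scores from the reward decomposition
    search_kb = 2.0
    empathize = 1.0
    offer_solution = 3.0 * quality   # scaled by solution quality
    resolve = 5.0 + 2.0 * csat       # base reward plus CSAT bonus
    return search_kb + empathize + offer_solution + resolve


print(optimal_task1_reward())          # 11.0 -- matches Task 1's stated max reward
print(optimal_task1_reward(csat=1.0))  # 13.0 with a perfect CSAT bonus
```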
## Reward Function Design

The reward function encodes three business objectives simultaneously:

1. **Resolution quality** — the `offer_solution` reward scales with the solution quality score (keyword matching against the canonical solution). This forces the agent to consult the KB before improvising.

2. **Process compliance** — action sequencing is rewarded and penalised: searching the KB first, empathising with high-sentiment customers, and clarifying ambiguities before offering solutions.

3. **Customer experience** — the CSAT bonus on `resolve` (up to +2.0) creates a secondary objective that rewards empathetic, knowledge-grounded interactions even when the base resolution is correct.

### Shaped vs. sparse
Reward is **dense** — every action produces a signal, so the agent never needs to reach `resolve` to receive a useful gradient. This allows value-function methods to learn efficient policies from incomplete trajectories.

---
## Grader Specification

All graders are **deterministic**: identical observations produce identical scores.

- Scores are in `[0.0, 1.0]`
- Each grader inspects the final `Observation`: flags (`kb_searched`, `empathized`, `clarified`, `solution_offered`, `escalated`, `status`) and the conversation `history`
- Solution quality is measured by keyword presence in agent turn text
- **Pass threshold**: ≥ 0.70 on all tasks

---
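To make the scheme concrete, here is a sketch of a Task 1-style grader using that task's weights from the Tasks section. The shipped graders live in `graders/graders.py` and derive solution quality from keyword matches in `history`; the `quality_score` input below simply stands in for that measure, so treat this as an illustration rather than the actual implementation.

```python
def grade_task1(obs: dict) -> dict:
    # Weighted checklist over final-observation flags (Task 1 weights:
    # KB 0.30, empathy 0.25, solution quality 0.25, resolved 0.20)
    score = (
        0.30 * float(obs.get("kb_searched", False))
        + 0.25 * float(obs.get("empathized", False))
        + 0.25 * obs.get("quality_score", 0.0)  # stand-in for keyword-match quality
        + 0.20 * float(obs.get("status") == "resolved")
    )
    # Deterministic: same observation, same score; pass threshold is 0.70
    return {"score": round(score, 2), "passed": score >= 0.70}


final = {"kb_searched": True, "empathized": True,
         "quality_score": 0.8, "status": "resolved"}
print(grade_task1(final))  # {'score': 0.95, 'passed': True}
```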
## Baseline Scores

| Task | Difficulty | Model | Grader Score | Passed |
|------|-----------|-------|-------------|--------|
| task_1 | easy | gpt-4o-mini | 0.85 | ✓ |
| task_2 | medium | gpt-4o-mini | 0.78 | ✓ |
| task_3 | hard | gpt-4o-mini | 0.65 | — |
| **avg** | | | **0.76** | |

---
## Project Structure

```
customer_support_env/
├── server.py          # FastAPI app — /reset, /step, /state, /grade
├── inference.py       # Baseline inference script (OpenAI client)
├── openenv.yaml       # OpenEnv spec file
├── requirements.txt
├── Dockerfile
├── README.md
├── env/
│   ├── __init__.py
│   ├── models.py      # Typed Pydantic models: Observation, Action, Reward
│   ├── environment.py # Core CustomerSupportEnv class
│   └── tickets.py     # Ticket scenario database (6 tickets, KB articles)
├── graders/
│   ├── __init__.py
│   └── graders.py     # Programmatic graders for all 3 tasks
└── tests/
    ├── __init__.py
    └── test_env.py    # 25 unit tests
```

---
## Running Tests

```bash
pytest tests/ -v
```

Or without pytest:
```bash
python -m tests.test_env
```

---
## Hugging Face Space Configuration

Add the following to the top of `README.md` for HF Spaces auto-detection:

```yaml
---
title: CustomerSupportEnv
emoji: 🎧
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
tags:
  - openenv
  - reinforcement-learning
  - customer-support
  - nlp
---
```

---
## License

MIT
 
 
 
 
 
 
 
 
 
 