Commit 9619237 by prashantmatlani · 1 Parent(s): 0894e25

updated README.md

  1. README.md +308 -78
README.md CHANGED
@@ -1,125 +1,355 @@
1
  ---
2
- title: Customer Support Agent
 
3
  emoji: 🤖
4
  colorFrom: blue
5
  colorTo: green
6
  sdk: docker
7
  tags:
8
- - openenv
 
 
 
 
 
9
  ---
10
 
11
- # Customer Support RL + LLM Agent - Overview
 
 
12
 
13
- ## Overview
14
- This project implements a hybrid agent for customer support automation.
15
 
16
- The agent:
17
- 1. Classifies customer queries
18
- 2. Collects required information
19
- 3. Resolves efficiently
 
 
 
 
20
 
21
  ---
22
 
23
- ## Environment
24
 
25
- The environment simulates customer support tickets with:
26
- - Customer message
27
- - Required information fields
28
- - Ground truth classification
29
 
30
- The agent uses a hybrid approach:
31
- - LLM for classification
32
- - deterministic policy for information gathering
33
- - reward-shaped environment for optimization
34
 
35
- 🎯 Objective
36
 
37
- Build an intelligent agent that:
38
 
39
- - Classifies customer issues
40
- - Collects required information
41
- - Resolves efficiently
42
 
43
- 🏗 Architecture
44
 
45
- 1. Environment (env.py)
46
 
47
- Simulates customer support workflow.
 
 
 
 
 
48
 
49
- State:
50
 
51
- customer_message
52
- known_info
53
- required fields
54
- progress
55
 
56
- Actions:
57
 
58
- classify
59
- ask_info
60
- resolve
61
 
62
- 2. Reward Design
63
 
64
- Action Reward
65
- Correct classify +0.5
66
- Ask required info +0.3
67
- Repeat ask -0.3
68
- Step penalty -0.05
69
- Successful resolve +1.0
70
 
71
- 3. Observation Design
72
 
 
73
  {
74
- "customer_message": str,
75
- "known_info": dict,
76
- "required": list # full schema
77
  }
 
 
 
78
 
79
- 4. Agent Types
80
 
81
- Rule Agent (agent.py)
82
- . Deterministic
83
- . Uses required fields
84
- . Computes missing info
85
 
86
- LLM Agent (agent_llm.py)
87
- . Uses prompt reasoning
88
- . Strict JSON output
89
- . Retry + fallback
 
90
 
91
- 5. Core Logic
92
 
93
- if not classified:
94
- classify
95
- elif missing fields:
96
- ask_info
97
- else:
98
- resolve
99
 
100
- 6. Key Improvements Made
 
101
 
102
- - Removed ground-truth leakage
103
- - Added reward shaping
104
- - Added efficiency scoring
105
- - Added schema-based reasoning
106
- - Added fallback policy
107
- - Added metrics tracking
108
 
109
- 7. Metrics
 
 
110
 
 
 
 
111
  {
112
- success_rate,
113
- avg_steps,
114
- avg_reward,
115
- info_efficiency
 
 
 
 
 
116
  }
117
 
118
- 8. Inference
119
 
120
  python inference.py
121
 
122
- 9. Deployment
123
 
124
- docker build -t support-agent .
125
- docker run support-agent
1
  ---
2
+
3
+ title: Customer Support OpenEnv Environment
4
  emoji: 🤖
5
  colorFrom: blue
6
  colorTo: green
7
  sdk: docker
8
  tags:
9
+
10
+ * openenv
11
+ * reinforcement-learning
12
+ * llm
13
+ * customer-support
14
+
15
  ---
16
 
17
+ # 🤖 Customer Support Agent - OpenEnv Environment
18
+
19
+ ## 🧠 Overview
20
 
21
+ This project implements a **real-world customer support simulation environment** built using the OpenEnv specification.
 
22
 
23
+ It is designed to evaluate and train intelligent agents capable of:
24
+
25
+ * Understanding noisy and ambiguous user queries
26
+ * Classifying issues correctly
27
+ * Gathering missing information efficiently
28
+ * Resolving tickets under uncertainty
29
+
30
+ Unlike toy environments, it models the noisy phrasing, ambiguous intent, and missing information found in production customer support workflows.
31
 
32
  ---
33
 
34
+ ## 🎯 Objective
35
 
36
+ Build and evaluate an agent that can:
 
 
 
37
 
38
+ 1. **Classify** customer issues (billing / technical / delivery)
39
+ 2. **Collect required information** dynamically
40
+ 3. **Resolve efficiently** under constraints
41
+ 4. **Adapt behavior mid-episode** (self-correction)
42
 
43
+ ---
44
 
45
+ ## 🏗️ System Architecture
46
 
47
+ ### 1. Environment (`env.py`)
 
 
48
 
49
+ A **stateful, stochastic simulation** of customer support operations.
50
 
51
+ #### Key Features
52
 
53
+ * Multi-step interaction loop (`step`, `reset`, `state`)
54
+ * Partial observability (missing information)
55
+ * Stochastic noise injection
56
+ * Difficulty-aware configuration
57
+ * Multi-intent ticket handling
58
+ * Reward shaping with penalties for poor decisions
59
 
60
+ ---
61
 
62
+ ### 2. Observation Space
 
 
 
63
 
64
+ ```json
65
+ {
66
+ "ticket_id": "string",
67
+ "customer_message": "string",
68
+ "known_info": {},
69
+ "required": ["fields"],
70
+ "missing_required": ["fields"],
71
+ "info_progress": 0.0,
72
+ "status": "open | resolved",
73
+ "step_count": 0,
74
+ "remaining_steps": 10,
75
+ "difficulty": "easy | medium | hard"
76
+ }
77
+ ```
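
The observation above can be mirrored as a typed model. The following is a minimal sketch, not the code in `env.py`: the class name `Observation` and the definition of `info_progress` as the fraction of required fields already known are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Observation:
    """Hypothetical typed mirror of the observation JSON."""

    ticket_id: str
    customer_message: str
    known_info: Dict[str, Any] = field(default_factory=dict)
    required: List[str] = field(default_factory=list)
    status: str = "open"
    step_count: int = 0
    remaining_steps: int = 10
    difficulty: str = "easy"

    @property
    def missing_required(self) -> List[str]:
        # Required fields not collected yet, in schema order.
        return [name for name in self.required if name not in self.known_info]

    @property
    def info_progress(self) -> float:
        # Assumed definition: share of required fields already known.
        if not self.required:
            return 1.0
        return 1.0 - len(self.missing_required) / len(self.required)


obs = Observation(
    ticket_id="T-1",
    customer_message="My package never arrived.",
    known_info={"order_id": "A123"},
    required=["order_id", "address"],
)
print(obs.missing_required, obs.info_progress)
```

Deriving `missing_required` and `info_progress` from `known_info` and `required` keeps the three fields from drifting out of sync.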
78
 
79
+ ---
 
 
80
 
81
+ ### 3. Action Space
82
 
83
+ | Action | Description |
84
+ | -------- | -------------------------- |
85
+ | classify | Assign category + priority |
86
+ | ask_info | Request missing field |
87
+ | resolve | Attempt to close ticket |
 
88
 
89
+ Example:
90
 
91
+ ```json
92
  {
93
+ "type": "ask_info",
94
+ "field": "order_id"
 
95
  }
96
+ ```
97
+
98
+ ---
99
 
100
+ ## 🎲 Difficulty & Stochastic Control
101
 
102
+ The environment dynamically adjusts complexity:
 
 
 
103
 
104
+ | Difficulty | Max Steps | Noise | Missing Info |
105
+ | ---------- | --------- | -------- | ------------ |
106
+ | Easy | Low | None | Minimal |
107
+ | Medium | Medium | Moderate | Partial |
108
+ | Hard | High | High | Significant |
109
 
110
+ ### Stochastic Elements
111
 
112
+ * **Noise Injection**
113
+ Adds irrelevant or emotional phrases
 
 
 
 
114
 
115
+ * **Information Masking**
116
+ Required fields may be hidden
117
 
118
+ * **Ambiguity**
119
+ Messages may not clearly indicate category
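
Noise injection and information masking could be sketched as below. The probabilities in `NOISE_PROB` and `MASK_PROB` and the function name `perturb_ticket` are assumptions; the README only gives qualitative levels (None/Moderate/High, Minimal/Partial/Significant), so the numbers are placeholders.

```python
import random
from typing import Dict, List, Tuple

# Assumed per-difficulty probabilities; the real values are not documented.
NOISE_PROB = {"easy": 0.0, "medium": 0.5, "hard": 1.0}
MASK_PROB = {"easy": 0.1, "medium": 0.4, "hard": 0.8}


def perturb_ticket(
    message: str,
    noise: List[str],
    info: Dict[str, str],
    difficulty: str,
    rng: random.Random,
) -> Tuple[str, Dict[str, str]]:
    """Return (possibly noisy message, fields still visible to the agent)."""
    # Noise injection: prepend an irrelevant or emotional phrase.
    if noise and rng.random() < NOISE_PROB[difficulty]:
        message = f"{rng.choice(noise)} {message}"
    # Information masking: hide each field independently.
    visible = {
        key: value
        for key, value in info.items()
        if rng.random() >= MASK_PROB[difficulty]
    }
    return message, visible


rng = random.Random(42)  # a seeded generator keeps episodes reproducible
print(perturb_ticket(
    "Where is my order?",
    ["This is so frustrating!"],
    {"order_id": "A123"},
    "hard",
    rng,
))
```

Passing the generator in explicitly, rather than using the module-level `random`, lets a grader replay the same episode from a seed.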
 
 
 
 
120
 
121
+ ---
122
+
123
+ ## 🧾 Dataset (Production-Style Tickets)
124
 
125
+ Each ticket includes:
126
+
127
+ ```python
128
  {
129
+ "ticket_id": "...",
130
+ "variants": [...], # multiple phrasings
131
+ "noise": [...], # real-world clutter
132
+ "ground_truth": {
133
+ "category": "...",
134
+ "priority": "...",
135
+ "required_info": [...],
136
+ "intents": [...] # multi-intent support
137
+ }
138
  }
139
+ ```
140
+
141
+ ### Key Properties
142
+
143
+ * Multiple linguistic variations
144
+ * Realistic phrasing (not templated)
145
+ * Multi-intent issues (e.g., billing + technical)
146
+ * No explicit hints (agent must infer)
147
+
148
+ ---
149
+
150
+ ## 🔁 Self-Correction Mechanism
151
+
152
+ The agent is designed to **adapt within an episode**.
153
+
154
+ ### What this means:
155
+
156
+ * Can **re-classify after new information**
157
+ * Can **delay resolution under uncertainty**
158
+ * Can **recover from suboptimal actions**
159
+
160
+ ### Example behavior:
161
+
162
+ ```
163
+ classify → ask_info → re-classify → resolve
164
+ ```
165
+
166
+ This mimics real-world agent reasoning rather than fixed pipelines.
167
+
168
+ ---
169
+
170
+ ## 🧠 Agent Design (`agent_llm.py`)
171
+
172
+ ### Hybrid Intelligence
173
+
174
+ | Component | Role |
175
+ | --------- | ---------------------- |
176
+ | LLM | High-level reasoning |
177
+ | Rules | Safety + constraints |
178
+ | Fallback | Deterministic recovery |
179
+
180
+ ---
181
+
182
+ ### Key Capabilities
183
+
184
+ * Structured JSON output
185
+ * Retry + validation loop
186
+ * Fallback policy (guarantees progress)
187
+ * Partial autonomy (not over-constrained)
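
One way the validation loop and fallback policy might fit together is sketched below. This is an illustration under assumptions, not the contents of `agent_llm.py`: `parse_action`, `fallback_action`, `choose_action`, and the `llm` callable are hypothetical names, and the `classify` payload is left empty because its fields are not specified here.

```python
import json
from typing import Any, Callable, Dict, Optional

VALID_TYPES = {"classify", "ask_info", "resolve"}


def parse_action(raw: str) -> Optional[Dict[str, Any]]:
    """Return the action if `raw` is JSON with a known type, else None."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or action.get("type") not in VALID_TYPES:
        return None
    if action["type"] == "ask_info" and not action.get("field"):
        return None  # ask_info must name the field it requests
    return action


def fallback_action(obs: Dict[str, Any], classified: bool) -> Dict[str, Any]:
    """Deterministic policy: classify, then ask for missing fields, then resolve."""
    if not classified:
        return {"type": "classify"}
    if obs.get("missing_required"):
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    return {"type": "resolve"}


def choose_action(
    llm: Callable[[Dict[str, Any]], str],
    obs: Dict[str, Any],
    classified: bool,
    max_retries: int = 2,
) -> Dict[str, Any]:
    """Try the LLM up to max_retries + 1 times, then fall back."""
    for _ in range(max_retries + 1):
        action = parse_action(llm(obs))
        if action is not None:
            return action
    return fallback_action(obs, classified)
```

Because the fallback always returns a valid action, an episode cannot stall on malformed model output.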
188
+
189
+ ---
190
+
191
+ ## 🧮 Reward Design
192
+
193
+ Reward is **dense and shaped**, not binary.
194
+
195
+ | Behavior | Reward |
196
+ | ------------------------ | ------------ |
197
+ | Step penalty | -0.05 |
198
+ | Correct classification | +0.2 |
199
+ | Useful info collection | +0.3 |
200
+ | Redundant action | -0.3 |
201
+ | Premature resolve (hard) | -1.0 |
202
+ | Successful resolve | +0.2 to +1.0 |
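
The table can be read as a per-step reward function. The sketch below makes three assumptions that the README does not confirm: the step penalty is added on every step, the successful-resolve bonus scales linearly from +0.2 to +1.0 with the share of steps remaining, and the "(hard)" qualifier on premature resolve is not modeled. The event names are hypothetical.

```python
STEP_PENALTY = -0.05

# Fixed bonuses and penalties taken from the table above.
EVENT_REWARD = {
    "correct_classification": 0.2,
    "useful_info": 0.3,
    "redundant_action": -0.3,
    "premature_resolve": -1.0,
}


def step_reward(event: str, remaining_steps: int = 0, max_steps: int = 10) -> float:
    """Reward for one step; unknown events earn only the step penalty."""
    if event == "successful_resolve":
        # Assumed scaling: +0.2 with no steps left, +1.0 with all steps left.
        bonus = 0.2 + 0.8 * (remaining_steps / max_steps)
    else:
        bonus = EVENT_REWARD.get(event, 0.0)
    return round(STEP_PENALTY + bonus, 2)


print(step_reward("useful_info"))
print(step_reward("successful_resolve", remaining_steps=10, max_steps=10))
```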
203
+
204
+ ---
205
+
206
+ ## 📊 Metrics
207
+
208
+ Aggregated across evaluation episodes:
209
+
210
+ ```json
211
+ {
212
+ "success_rate": 0.0,
213
+ "avg_steps": 0.0,
214
+ "avg_reward": 0.0,
215
+ "info_efficiency": 0.0
216
+ }
217
+ ```
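
A minimal sketch of how these four values might be computed from per-episode records follows. The record keys (`success`, `steps`, `reward`, `asks`, `useful_asks`) and the definition of `info_efficiency` as useful `ask_info` actions divided by all `ask_info` actions are assumptions for illustration.

```python
from typing import Any, Dict, List


def summarize(episodes: List[Dict[str, Any]]) -> Dict[str, float]:
    """Aggregate per-episode records into the four reported metrics."""
    count = len(episodes)
    if count == 0:
        return {
            "success_rate": 0.0,
            "avg_steps": 0.0,
            "avg_reward": 0.0,
            "info_efficiency": 0.0,
        }
    asks = sum(ep["asks"] for ep in episodes)
    useful = sum(ep["useful_asks"] for ep in episodes)
    return {
        "success_rate": sum(1 for ep in episodes if ep["success"]) / count,
        "avg_steps": sum(ep["steps"] for ep in episodes) / count,
        "avg_reward": sum(ep["reward"] for ep in episodes) / count,
        # Assumed definition: share of ask_info actions that filled a gap.
        "info_efficiency": useful / asks if asks else 0.0,
    }


episodes = [
    {"success": True, "steps": 4, "reward": 1.0, "asks": 2, "useful_asks": 2},
    {"success": False, "steps": 6, "reward": 0.0, "asks": 2, "useful_asks": 1},
]
print(summarize(episodes))
```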
218
+
219
+ ### Additional Behavioral Signals
220
+
221
+ * Self-correction frequency (re-classification)
222
+ * Resolution efficiency
223
+ * Failure modes under uncertainty
224
+
225
+ ---
226
+
227
+ ## 🧪 Tasks & Graders
228
+
229
+ Three evaluation tasks:
230
+
231
+ | Task | Difficulty | Objective |
232
+ | ------------------------- | ---------- | -------------------------------------- |
233
+ | easy-info-collection | Easy | Basic info gathering |
234
+ | medium-complete-info | Medium | Complete + accurate handling |
235
+ | hard-efficient-resolution | Hard | Efficient resolution under uncertainty |
236
 
237
+ ### Grader Properties
238
 
239
+ * Deterministic
240
+ * Score range: **0.0 – 1.0**
241
+ * Multi-factor scoring:
242
+
243
+ * success
244
+ * efficiency
245
+ * completeness
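
A deterministic multi-factor grader with these properties could look like the sketch below. The weights and the formulas for efficiency and completeness are assumptions; the README names the three factors but not how they are combined.

```python
from typing import Tuple


def grade(
    success: bool,
    steps: int,
    max_steps: int,
    collected: int,
    required: int,
    weights: Tuple[float, float, float] = (0.5, 0.25, 0.25),
) -> float:
    """Combine success, efficiency, and completeness into a score in [0, 1]."""
    # Assumed: efficiency is the share of the step budget left unused.
    efficiency = max(0.0, 1.0 - steps / max_steps) if max_steps > 0 else 0.0
    # Assumed: completeness is the share of required fields collected.
    completeness = min(1.0, collected / required) if required else 1.0
    w_success, w_efficiency, w_completeness = weights
    score = (
        w_success * float(success)
        + w_efficiency * efficiency
        + w_completeness * completeness
    )
    return max(0.0, min(1.0, score))  # clamp in case custom weights exceed 1


print(grade(success=True, steps=5, max_steps=10, collected=2, required=2))
```

The function uses no randomness or external state, so the same episode summary always yields the same score.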
246
+
247
+ ---
248
+
249
+ ## ▶️ Inference
250
+
251
+ Run the baseline agent:
252
+
253
+ ```bash
254
  python inference.py
255
+ ```
256
+
257
+ Outputs:
258
+
259
+ ```
260
+ [START] task=easy-info-collection ...
261
+ [STEP] ...
262
+ [END] ...
263
+ {"task_id": "...", "score": 0.7}
264
+ ```
265
+
266
+ ---
267
+
268
+ ## 🐳 Deployment (Hugging Face Spaces)
269
+
270
+ ### Build the Docker image
271
+
272
+ ```bash
273
+ docker build -t openenv-customer-support-agent .
274
+ ```
275
+
276
+ ### Run
277
+
278
+ ```bash
279
+ docker run -p 7860:7860 openenv-customer-support-agent
280
+ ```
281
+
282
+ ---
283
+
284
+ ## 🌐 API Endpoints
285
+
286
+ | Endpoint | Description |
287
+ | -------- | ---------------------- |
288
+ | `/reset` | Initialize environment |
289
+ | `/step` | Execute action |
290
+
291
+ ---
292
+
293
+ ## ⚙️ Environment Variables
294
 
295
+ Required:
296
 
297
+ ```
298
+ API_BASE_URL
299
+ MODEL_NAME
300
+ HF_TOKEN
301
+ ```
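
A small loader can fail fast when one of these variables is missing. This is a sketch under the assumption that all three are plain strings; the helper name `load_config` is hypothetical and the values in the usage example are dummies.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise with every missing name listed."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


# Usage with an explicit mapping; call load_config() to read os.environ.
config = load_config({
    "API_BASE_URL": "http://localhost:8000/v1",  # dummy value
    "MODEL_NAME": "demo-model",                  # dummy value
    "HF_TOKEN": "hf_dummy",                      # dummy value
})
print(sorted(config))
```

Reporting all missing names at once avoids a fix-one-rerun loop when the container starts without its secrets.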
302
+
303
+ ---
304
+
305
+ ## ✅ OpenEnv Compliance
306
+
307
+ * Typed observation/action models
308
+ * step/reset/state implemented
309
+ * 3+ tasks with graders
310
+ * Deterministic scoring
311
+ * Dockerized deployment
312
+ * HF Space compatible
313
+
314
+ ---
315
+
316
+ ## 🚀 Key Innovations
317
+
318
+ * Real-world task simulation (not toy)
319
+ * Stochastic difficulty scaling
320
+ * Multi-intent ticket modeling
321
+ * Self-correcting agent behavior
322
+ * Hybrid LLM + rule-based architecture
323
+ * Dense reward shaping
324
+
325
+ ---
326
+
327
+ ## 🔮 Future Improvements
328
+
329
+ * Multi-stage resolution pipelines
330
+ * Conversation memory (history utilization)
331
+ * Active uncertainty estimation
332
+ * Adaptive task generation
333
+ * Multi-agent coordination
334
+
335
+ ---
336
+
337
+ ## 🧠 Big Picture
338
+
339
+ This environment models:
340
+
341
+ > **Decision-making under uncertainty with partial information**
342
+
343
+ It is suitable for:
344
+
345
+ * RL agent training
346
+ * LLM agent evaluation
347
+ * Benchmarking reasoning systems
348
+
349
+ ---
350
+
351
+ ## 👤 Author
352
+
353
+ Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.
354
+
355
+ ---