---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
- openenv
- reinforcement-learning
- llm
- customer-support
---
# 🤖 Customer Support Agent – OpenEnv Environment
## 🧠 Overview
This project implements a **real-world customer support simulation environment** built using the OpenEnv specification.
It is designed to evaluate and train intelligent agents capable of:
* Understanding noisy and ambiguous user queries
* Classifying issues correctly
* Gathering missing information efficiently
* Resolving tickets under uncertainty
Unlike toy environments, this system models **real operational complexity** found in production customer support workflows.
---
## 🎯 Objective
Build and evaluate an agent that can:
1. **Classify** customer issues (billing / technical / delivery)
2. **Collect required information** dynamically
3. **Resolve efficiently** under constraints
4. **Adapt behavior mid-episode** (self-correction)
---
## ๐Ÿ—๏ธ System Architecture
+----------------------+
| Customer Ticket |
| (noisy, ambiguous) |
+----------+-----------+
|
v
+----------------------+
| Environment (env.py)|
|----------------------|
| - State |
| - Reward |
| - Stochasticity |
+----------+-----------+
|
v
+----------------------+
| Observation Space |
|----------------------|
| message |
| known_info |
| required |
+----------+-----------+
|
v
+----------------------+
| Agent (LLM + Rule) |
|----------------------|
| - Reasoning (LLM) |
| - Constraints |
| - Fallback |
+----------+-----------+
|
v
+----------------------+
| Action |
|----------------------|
| classify |
| ask_info |
| resolve |
+----------+-----------+
|
v
+----------------------+
| Environment Step |
|----------------------|
| reward |
| next_state |
+----------------------+
## Interaction Loop
```
RESET → OBSERVE → ACT → STEP → REPEAT
```
Detailed flow:
```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] ── No ──> Loop
   │
  Yes
   ↓
[Episode End]
```
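This loop can be driven by a small harness like the following: a minimal sketch assuming an environment object with `reset()`/`step()` and an agent with `act()`. The names and return shapes are illustrative, not the confirmed API of `env.py`.
```python
# Minimal episode driver sketch. `env` and `agent` stand in for the
# objects built in env.py / agent_llm.py; exact names and return
# shapes are assumptions, not the confirmed API.

def run_episode(env, agent, max_steps: int = 10) -> float:
    obs = env.reset()                          # RESET -> first observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                # OBSERVE -> ACT
        obs, reward, done = env.step(action)   # STEP -> reward + next state
        total_reward += reward
        if done:                               # resolved or out of steps
            break
    return total_reward
```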
## Self-Correction Loop
Initial flow:
```
classify → ask_info → resolve
```
With self-correction:
```
classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```
## Agent Decision Logic
```
IF not classified:
    → classify
ELIF missing required fields:
    → ask_info
ELIF uncertain:
    → re-classify
ELSE:
    → resolve
```
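The same priority order as a hedged Python sketch; the observation fields (`category`, `confidence`) and the 0.5 threshold are illustrative placeholders, not the exact names used in `agent_llm.py`:
```python
# Rule-based decision sketch mirroring the pseudocode above.
# Field names and the confidence threshold are illustrative.

def fallback_policy(obs: dict) -> dict:
    if obs.get("category") is None:                  # not yet classified
        return {"type": "classify"}
    if obs.get("missing_required"):                  # required fields missing
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    if obs.get("confidence", 1.0) < 0.5:             # uncertain -> re-classify
        return {"type": "classify"}
    return {"type": "resolve"}                       # otherwise resolve
```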
## Stochastic Behavior
```
Customer Message =
    base_variant
    + noise injection
    + ambiguity

Required Info =
    full_schema
    - randomly masked fields
```
Difficulty controls:
```
EASY   → low noise, clear signals
MEDIUM → moderate noise
HARD   → high ambiguity + missing info
```
## Reward Flow
```
Action → Immediate Reward → Final Outcome
```
Examples:
```
ask_info (useful)   → +0.3
repeat ask          → -0.3
step penalty        → -0.05
correct classify    → +0.2
premature resolve   → -1.0 (hard)
successful resolve  → +0.2 to +1.0
```
## Example Episode
```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```
Outcome:
* ✔ success
* ✔ self-correction observed
* ✔ efficient resolution
## Core Components
### 1. Environment (`env.py`)
A **stateful, stochastic simulation** of customer support operations.
#### Key Features
* Multi-step interaction loop (`step`, `reset`, `state`), sketched below
* Partial observability (missing information)
* Stochastic noise injection
* Difficulty-aware configuration
* Multi-intent ticket handling
* Reward shaping with penalties for poor decisions
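A minimal sketch of that interface, with ticket sampling and reward shaping reduced to placeholders; the names mirror the list above but are not copied from `env.py`:
```python
import random

# Skeleton of the environment loop. Ticket content and reward numbers
# here are placeholders; names are illustrative, not copied from env.py.

class CustomerSupportEnv:
    def __init__(self, difficulty: str = "easy", max_steps: int = 10):
        self.difficulty = difficulty
        self.max_steps = max_steps

    def reset(self) -> dict:
        self.step_count = 0
        # Stochastic draw: which required fields start out hidden.
        self.missing = random.sample(["order_id", "email"], k=1)
        return self.state()

    def step(self, action: dict):
        self.step_count += 1
        reward = -0.05                                   # per-step penalty
        if action["type"] == "ask_info" and action.get("field") in self.missing:
            self.missing.remove(action["field"])         # useful info gathered
            reward += 0.3
        done = action["type"] == "resolve" or self.step_count >= self.max_steps
        return self.state(), reward, done

    def state(self) -> dict:
        return {
            "missing_required": list(self.missing),
            "step_count": self.step_count,
            "remaining_steps": self.max_steps - self.step_count,
            "difficulty": self.difficulty,
        }
```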
---
### 2. Observation Space
```json
{
"ticket_id": "string",
"customer_message": "string",
"known_info": {},
"required": ["fields"],
"missing_required": ["fields"],
"info_progress": 0.0,
"status": "open | resolved",
"step_count": 0,
"remaining_steps": 10,
"difficulty": "easy | medium | hard"
}
```
---
### 3. Action Space
| Action | Description |
| -------- | -------------------------- |
| classify | Assign category + priority |
| ask_info | Request missing field |
| resolve | Attempt to close ticket |
Example:
```json
{
"type": "ask_info",
"field": "order_id"
}
```
---
## 🎲 Difficulty & Stochastic Control
The environment dynamically adjusts complexity:
| Difficulty | Step Budget | Noise    | Missing Info |
| ---------- | ----------- | -------- | ------------ |
| Easy       | Low         | None     | Minimal      |
| Medium     | Medium      | Moderate | Partial      |
| Hard       | High        | High     | Significant  |
### Stochastic Elements
Three perturbations shape each ticket (a sketch follows this list):
* **Noise injection**: adds irrelevant or emotional phrases
* **Information masking**: required fields may be hidden
* **Ambiguity**: messages may not clearly indicate category
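For illustration, a sketch of how these perturbations might be applied to a ticket shaped like the dataset schema in the next section; the phrase list and masking rates are invented placeholders:
```python
import random

# Illustrative perturbation pass. NOISE_PHRASES and MASK_RATE are
# invented placeholders, not values from the actual environment.
NOISE_PHRASES = ["btw this is so frustrating!!", "also, love your product"]
MASK_RATE = {"easy": 0.1, "medium": 0.4, "hard": 0.7}

def perturb(ticket: dict, difficulty: str) -> dict:
    message = random.choice(ticket["variants"])          # base variant
    if difficulty != "easy":
        message += " " + random.choice(NOISE_PHRASES)    # noise injection
    # Information masking: hide each required field with some probability.
    hidden = [f for f in ticket["ground_truth"]["required_info"]
              if random.random() < MASK_RATE[difficulty]]
    return {"customer_message": message, "missing_required": hidden}
```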
---
## 🧾 Dataset (Production-Style Tickets)
Each ticket includes:
```python
{
"ticket_id": "...",
"variants": [...], # multiple phrasings
"noise": [...], # real-world clutter
"ground_truth": {
"category": "...",
"priority": "...",
"required_info": [...],
"intents": [...] # multi-intent support
}
}
```
### Key Properties
* Multiple linguistic variations
* Realistic phrasing (not templated)
* Multi-intent issues (e.g., billing + technical)
* No explicit hints (agent must infer)
---
## 🔁 Self-Correction Mechanism
The agent is designed to **adapt within an episode**.
### What this means:
* Can **re-classify after new information**
* Can **delay resolution under uncertainty**
* Can **recover from suboptimal actions**
### Example behavior:
```
classify → ask_info → re-classify → resolve
```
This mimics real-world agent reasoning rather than fixed pipelines.
---
## 🧠 Agent Design (`agent_llm.py`)
### Hybrid Intelligence
| Component | Role |
| --------- | ---------------------- |
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |
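A hedged sketch of that division of labor, reusing the `fallback_policy` sketch from the decision-logic section; `llm_propose` is a hypothetical helper returning the model's raw text, and the retry count is an assumption:
```python
import json

VALID_TYPES = {"classify", "ask_info", "resolve"}

# Hybrid control sketch: LLM proposes, rules validate, fallback recovers.
# `llm_propose` and the retry count are illustrative assumptions.

def decide(obs: dict, llm_propose, retries: int = 2) -> dict:
    for _ in range(retries):
        try:
            action = json.loads(llm_propose(obs))    # structured JSON output
        except json.JSONDecodeError:
            continue                                 # retry on malformed output
        if action.get("type") in VALID_TYPES:        # constraint/safety check
            return action
    return fallback_policy(obs)                      # deterministic recovery
```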
---
### Key Capabilities
* Structured JSON output
* Retry + validation loop
* Fallback policy (guarantees progress)
* Partial autonomy (not over-constrained)
---
## 🧮 Reward Design
Reward is **dense and shaped**, not binary.
| Behavior | Reward |
| ------------------------ | ------------ |
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
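Composed as a function, the table might look like the sketch below; how `env.py` actually combines the terms, and the efficiency scaling of the success bonus, are assumptions:
```python
# Shaped-reward sketch using the constants from the table above.
# How the terms combine (and the efficiency scaling) is an assumption.

def shaped_reward(action: dict, env_state: dict) -> float:
    reward = -0.05                                       # per-step penalty
    if action["type"] == "classify" and env_state["classify_correct"]:
        reward += 0.2                                    # correct classification
    elif action["type"] == "ask_info":
        reward += 0.3 if env_state["field_was_missing"] else -0.3
    elif action["type"] == "resolve":
        if env_state["all_info_collected"]:
            # Success bonus scaled by step efficiency: +0.2 .. +1.0.
            reward += 0.2 + 0.8 * env_state["remaining_steps"] / env_state["max_steps"]
        elif env_state["difficulty"] == "hard":
            reward += -1.0                               # premature resolve (hard)
    return reward
```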
---
## 📊 Metrics
Tracked per episode:
```json
{
"success_rate": 0.0,
"avg_steps": 0.0,
"avg_reward": 0.0,
"info_efficiency": 0.0
}
```
### Additional Behavioral Signals
* Self-correction frequency (re-classification)
* Resolution efficiency
* Failure modes under uncertainty
---
## 🧪 Tasks & Graders
Three evaluation tasks:
| Task | Difficulty | Objective |
| ------------------------- | ---------- | -------------------------------------- |
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |
### Grader Properties
* Deterministic
* Score range: **0.0 – 1.0**
* Multi-factor scoring:
* success
* efficiency
* completeness
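A deterministic multi-factor grader could combine these along the following lines; the 0.5/0.25/0.25 weights are invented for illustration, as only determinism and the 0.0–1.0 range are specified above:
```python
# Multi-factor grader sketch. The weights are illustrative assumptions;
# only determinism and the 0.0-1.0 score range come from the spec above.

def grade(episode: dict) -> float:
    success = 1.0 if episode["resolved_correctly"] else 0.0
    efficiency = episode["remaining_steps"] / episode["max_steps"]
    completeness = episode["fields_collected"] / episode["fields_required"]
    score = 0.5 * success + 0.25 * efficiency + 0.25 * completeness
    return round(min(max(score, 0.0), 1.0), 2)       # clamp to [0.0, 1.0]
```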
---
## ▶️ Inference
Run baseline agent:
```bash
python inference.py
```
Outputs:
```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
---
## ๐Ÿณ Deployment (Hugging Face Spaces)
### Build Docker
```bash
docker build -t openenv-customer-support-agent .
```
### Run
```bash
docker run -p 7860:7860 openenv-customer-support-agent
```
---
## 🌐 API Endpoints
| Endpoint | Description |
| -------- | ---------------------- |
| `/reset` | Initialize environment |
| `/step` | Execute action |
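Once the container is running, the endpoints can be exercised with a small client; a sketch assuming POST endpoints with JSON bodies matching the observation/action schemas above (the exact payload contract is not confirmed here):
```python
import requests

BASE = "http://localhost:7860"  # local Docker run from the previous section

# Assumed contract: /reset returns an observation; /step takes an action
# and returns the reward plus next state. Shapes are illustrative.
obs = requests.post(f"{BASE}/reset").json()
print(obs.get("customer_message"))

result = requests.post(
    f"{BASE}/step",
    json={"type": "ask_info", "field": "order_id"},
).json()
print(result.get("reward"), result.get("done"))
```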
---
## ⚙️ Environment Variables
Required:
```
API_BASE_URL
MODEL_NAME
HF_TOKEN
```
---
## ✅ OpenEnv Compliance
* Typed observation/action models
* step/reset/state implemented
* 3+ tasks with graders
* Deterministic scoring
* Dockerized deployment
* HF Space compatible
---
## 🚀 Key Innovations
* Real-world task simulation (not toy)
* Stochastic difficulty scaling
* Multi-intent ticket modeling
* Self-correcting agent behavior
* Hybrid LLM + rule-based architecture
* Dense reward shaping
---
## 🔮 Future Improvements
* Multi-stage resolution pipelines
* Conversation memory (history utilization)
* Active uncertainty estimation
* Adaptive task generation
* Multi-agent coordination
---
## 🧠 Big Picture
This environment models:
> **Decision-making under uncertainty with partial information**
It is suitable for:
* RL agent training
* LLM agent evaluation
* Benchmarking reasoning systems
---
## 👤 Author
Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.
---