---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---
# 🤖 Customer Support Agent – OpenEnv Environment

## 🧠 Overview
This project implements a real-world customer support simulation environment built using the OpenEnv specification.
It is designed to evaluate and train intelligent agents capable of:
- Understanding noisy and ambiguous user queries
- Classifying issues correctly
- Gathering missing information efficiently
- Resolving tickets under uncertainty
Unlike toy environments, this system models real operational complexity found in production customer support workflows.
## 🎯 Objective
Build and evaluate an agent that can:
- Classify customer issues (billing / technical / delivery)
- Collect required information dynamically
- Resolve efficiently under constraints
- Adapt behavior mid-episode (self-correction)
## 🏗️ System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|  - State             |
|  - Reward            |
|  - Stochasticity     |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|  - message           |
|  - known_info        |
|  - required          |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|  - Reasoning (LLM)   |
|  - Constraints       |
|  - Fallback          |
+----------+-----------+
           |
           v
+----------------------+
|       Action         |
|  - classify          |
|  - ask_info          |
|  - resolve           |
+----------+-----------+
           |
           v
+----------------------+
|  Environment Step    |
|  - reward            |
|  - next_state        |
+----------------------+
```
### Interaction Loop

```
RESET → OBSERVE → ACT → STEP → REPEAT
```

Detailed flow:

```
[RESET] → [Observation] → [Agent Decision] → [Action] → [Environment Step]
        → [Reward + Next State] → [Done?] ── No ──> loop
                                           └─ Yes ─> [Episode End]
```
### Self-Correction Loop

Initial flow:

```
classify → ask_info → resolve
```

With self-correction:

```
classify → ask_info → [new information arrives] → re-evaluate decision
         → re-classify (if needed) → ask remaining info → resolve
```
### Agent Decision Logic

```
IF not classified:            → classify
ELIF missing required fields: → ask_info
ELIF uncertain:               → re-classify
ELSE:                         → resolve
```
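The decision chain above can be sketched in Python. The observation keys (`classified`, `missing_required`) and the confidence threshold are illustrative assumptions, not the environment's exact schema:

```python
def choose_action(obs: dict, confidence: float, threshold: float = 0.6) -> dict:
    """Map an observation to the next action, mirroring the IF/ELIF chain.
    Field names and the threshold value are illustrative."""
    if not obs.get("classified"):
        return {"type": "classify"}
    if obs.get("missing_required"):
        # Ask for the first missing required field.
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    if confidence < threshold:
        # Uncertain: re-classify before attempting to resolve.
        return {"type": "classify"}
    return {"type": "resolve"}
```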
### Stochastic Behavior

```
Customer Message = base_variant + noise injection + ambiguity
Required Info    = full_schema  - randomly masked fields
```

Difficulty controls:

- EASY → low noise, clear signals
- MEDIUM → moderate noise
- HARD → high ambiguity + missing info
### Reward Flow

```
Action → Immediate Reward → Final Outcome
```

Examples:

- ask_info (useful) → +0.3
- repeat ask → -0.3
- step penalty → -0.05
- correct classify → +0.2
- premature resolve (hard) → -1.0
- successful resolve → +0.2 to +1.0
### Example Episode

```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```

Outcome:
- ✓ success
- ✓ self-correction observed
- ✓ efficient resolution
## 1. Environment (env.py)

A stateful, stochastic simulation of customer support operations.

### Key Features

- Multi-step interaction loop (`step`, `reset`, `state`)
- Partial observability (missing information)
- Stochastic noise injection
- Difficulty-aware configuration
- Multi-intent ticket handling
- Reward shaping with penalties for poor decisions
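As a rough illustration of the `step`/`reset` contract and reward shaping described above, a toy version of the loop might look like this. It is a sketch only; the real `env.py` is richer and these names are assumptions:

```python
class MiniSupportEnv:
    """Toy sketch of the step/reset contract; not the real env.py."""

    def __init__(self, required, max_steps=10):
        self.required = list(required)
        self.max_steps = max_steps

    def reset(self):
        self.known = {}
        self.steps = 0
        self.done = False
        return self._obs()

    def _obs(self):
        missing = [f for f in self.required if f not in self.known]
        return {"missing_required": missing,
                "step_count": self.steps,
                "remaining_steps": self.max_steps - self.steps}

    def step(self, action):
        self.steps += 1
        reward = -0.05  # per-step penalty
        if action["type"] == "ask_info":
            field = action.get("field")
            if field in self.required and field not in self.known:
                self.known[field] = "provided"
                reward += 0.3   # useful info collection
            else:
                reward -= 0.3   # redundant ask
        elif action["type"] == "resolve":
            resolved = not self._obs()["missing_required"]
            reward += 1.0 if resolved else -1.0  # premature resolve penalty
            self.done = resolved
        if self.steps >= self.max_steps:
            self.done = True
        return self._obs(), reward, self.done
```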
## 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```
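For type safety, the observation could be mirrored as a dataclass. This is a sketch; the real project may use Pydantic or OpenEnv's own model classes instead:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Observation:
    """Typed mirror of the observation JSON above (illustrative)."""
    ticket_id: str
    customer_message: str
    known_info: Dict[str, str] = field(default_factory=dict)
    required: List[str] = field(default_factory=list)
    missing_required: List[str] = field(default_factory=list)
    info_progress: float = 0.0
    status: str = "open"          # "open" | "resolved"
    step_count: int = 0
    remaining_steps: int = 10
    difficulty: str = "easy"      # "easy" | "medium" | "hard"
```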
## 3. Action Space
| Action | Description |
|---|---|
| classify | Assign category + priority |
| ask_info | Request missing field |
| resolve | Attempt to close ticket |
Example:
```json
{
  "type": "ask_info",
  "field": "order_id"
}
```
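A small validator for this action schema might look like the following (a sketch; the real environment may enforce stricter rules):

```python
VALID_ACTIONS = {"classify", "ask_info", "resolve"}

def validate_action(action: dict) -> bool:
    """Check an action dict against the action space above."""
    if action.get("type") not in VALID_ACTIONS:
        return False
    if action["type"] == "ask_info" and not action.get("field"):
        return False  # ask_info must name the field it requests
    return True
```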
## 🎲 Difficulty & Stochastic Control
The environment dynamically adjusts complexity:
| Difficulty | Max Steps | Noise | Missing Info |
|---|---|---|---|
| Easy | Low | None | Minimal |
| Medium | Medium | Moderate | Partial |
| Hard | High | High | Significant |
### Stochastic Elements

- **Noise injection**: adds irrelevant or emotional phrases
- **Information masking**: required fields may be hidden
- **Ambiguity**: messages may not clearly indicate a category
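One way to sketch these stochastic elements together; the function and parameter names here are assumptions for illustration, not the environment's API:

```python
import random

def make_ticket(variant, noise_phrases, schema, rng, mask_prob=0.5):
    """Sketch of stochastic ticket construction: noise injection plus
    information masking. All names are illustrative."""
    message = variant
    # Noise injection: sometimes append irrelevant/emotional clutter.
    if noise_phrases and rng.random() < 0.7:
        message += " " + rng.choice(noise_phrases)
    # Information masking: each required field may start hidden.
    known = {f: v for f, v in schema.items() if rng.random() > mask_prob}
    missing = [f for f in schema if f not in known]
    return {"customer_message": message,
            "known_info": known,
            "missing_required": missing}
```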
## 🧾 Dataset (Production-Style Tickets)
Each ticket includes:
```
{
  "ticket_id": "...",
  "variants": [...],        # multiple phrasings
  "noise": [...],           # real-world clutter
  "ground_truth": {
    "category": "...",
    "priority": "...",
    "required_info": [...],
    "intents": [...]        # multi-intent support
  }
}
```
### Key Properties
- Multiple linguistic variations
- Realistic phrasing (not templated)
- Multi-intent issues (e.g., billing + technical)
- No explicit hints (agent must infer)
## 🔁 Self-Correction Mechanism
The agent is designed to adapt within an episode.
What this means:
- Can re-classify after new information
- Can delay resolution under uncertainty
- Can recover from suboptimal actions
Example behavior:
```
classify → ask_info → re-classify → resolve
```
This mimics real-world agent reasoning rather than fixed pipelines.
## 🧠 Agent Design (agent_llm.py)

### Hybrid Intelligence
| Component | Role |
|---|---|
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |
### Key Capabilities
- Structured JSON output
- Retry + validation loop
- Fallback policy (guarantees progress)
- Partial autonomy (not over-constrained)
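The retry + validation + fallback pattern could be sketched as below. `llm_call` stands in for the real model client, and the field names are assumptions:

```python
import json

ACTIONS = {"classify", "ask_info", "resolve"}

def safe_decide(llm_call, obs, max_retries=2):
    """Parse the LLM's JSON output, retry on failure, then fall back to
    a deterministic rule so the episode always makes progress."""
    for _ in range(max_retries):
        try:
            action = json.loads(llm_call(obs))
            if isinstance(action, dict) and action.get("type") in ACTIONS:
                return action  # structured output passed validation
        except (json.JSONDecodeError, TypeError):
            continue  # malformed output: retry
    # Deterministic fallback guarantees progress.
    missing = obs.get("missing_required", [])
    if missing:
        return {"type": "ask_info", "field": missing[0]}
    return {"type": "resolve"}
```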
## 🧮 Reward Design
Reward is dense and shaped, not binary.
| Behavior | Reward |
|---|---|
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
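The table composes into per-action rewards roughly as follows. This is an illustrative sketch; in particular, the linear scaling of the resolve bonus is an assumption:

```python
def shaped_reward(action_type, *, correct=False, useful=False,
                  redundant=False, premature=False, quality=1.0):
    """Compose the reward table above into a single per-action value."""
    r = -0.05  # step penalty applies to every action
    if action_type == "classify" and correct:
        r += 0.2
    elif action_type == "ask_info":
        if useful:
            r += 0.3
        elif redundant:
            r -= 0.3
    elif action_type == "resolve":
        # quality in [0, 1] scales the resolve bonus from +0.2 to +1.0
        r += -1.0 if premature else 0.2 + 0.8 * quality
    return round(r, 2)
```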
## 📊 Metrics
Tracked per episode:
```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```
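Aggregating these metrics over a batch of episodes could look like this (the per-episode record fields are illustrative):

```python
def aggregate(episodes):
    """Average per-episode records into the run-level metrics above."""
    n = len(episodes)
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "avg_steps": sum(e["steps"] for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
        # Fraction of ask_info actions that actually uncovered a field.
        "info_efficiency": sum(e["useful_asks"] / max(e["asks"], 1)
                               for e in episodes) / n,
    }
```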
### Additional Behavioral Signals
- Self-correction frequency (re-classification)
- Resolution efficiency
- Failure modes under uncertainty
## 🧪 Tasks & Graders
Three evaluation tasks:
| Task | Difficulty | Objective |
|---|---|---|
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |
### Grader Properties

- Deterministic
- Score range: 0.0 to 1.0
- Multi-factor scoring: success, efficiency, completeness
## ▶️ Inference

Run the baseline agent:

```bash
python inference.py
```
Example output:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
## 🐳 Deployment (Hugging Face Spaces)

Build the Docker image:

```bash
docker build -t openenv-customer-support-agent .
```

Run the container:

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```
## 🌐 API Endpoints

| Endpoint | Description |
|---|---|
| `/reset` | Initialize the environment |
| `/step` | Execute an action |
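A minimal client for these endpoints might build requests like this. The port comes from the Docker command above, and the request schema is an assumption, not the documented API:

```python
import json

BASE_URL = "http://localhost:7860"  # assumed port from the Docker run command

def build_request(endpoint: str, payload: dict):
    """Assemble the URL and JSON body for a call to /reset or /step.
    The exact payload schema is an assumption."""
    return f"{BASE_URL}{endpoint}", json.dumps(payload)

# A typical episode over HTTP (requires the running container), using any
# HTTP client such as urllib.request:
#   url, body = build_request("/reset", {})
#   url, body = build_request("/step", {"action": {"type": "ask_info",
#                                                  "field": "order_id"}})
```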
## ⚙️ Environment Variables

Required:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
## ✅ OpenEnv Compliance
- Typed observation/action models
- `step`/`reset`/`state` implemented
- 3+ tasks with graders
- Deterministic scoring
- Dockerized deployment
- HF Space compatible
## 🚀 Key Innovations
- Real-world task simulation (not toy)
- Stochastic difficulty scaling
- Multi-intent ticket modeling
- Self-correcting agent behavior
- Hybrid LLM + rule-based architecture
- Dense reward shaping
## 🔮 Future Improvements
- Multi-stage resolution pipelines
- Conversation memory (history utilization)
- Active uncertainty estimation
- Adaptive task generation
- Multi-agent coordination
## 🧠 Big Picture

This environment models:

> Decision-making under uncertainty with partial information

It is suitable for:
- RL agent training
- LLM agent evaluation
- Benchmarking reasoning systems
## 👤 Author
Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.