---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---
# Customer Support Agent - OpenEnv Environment

## Overview

This project implements a **real-world customer support simulation environment** built using the OpenEnv specification.

It is designed to evaluate and train intelligent agents capable of:

* Understanding noisy and ambiguous user queries
* Classifying issues correctly
* Gathering missing information efficiently
* Resolving tickets under uncertainty

Unlike toy environments, this system models the **real operational complexity** found in production customer support workflows.

---
## Objective

Build and evaluate an agent that can:

1. **Classify** customer issues (billing / technical / delivery)
2. **Collect required information** dynamically
3. **Resolve efficiently** under constraints
4. **Adapt behavior mid-episode** (self-correction)

---
## System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|----------------------|
| - State              |
| - Reward             |
| - Stochasticity      |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|----------------------|
| message              |
| known_info           |
| required             |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|----------------------|
| - Reasoning (LLM)    |
| - Constraints        |
| - Fallback           |
+----------+-----------+
           |
           v
+----------------------+
|        Action        |
|----------------------|
| classify             |
| ask_info             |
| resolve              |
+----------+-----------+
           |
           v
+----------------------+
|   Environment Step   |
|----------------------|
| reward               |
| next_state           |
+----------------------+
```
## Interaction Loop

RESET → OBSERVE → ACT → STEP → REPEAT

Detailed flow:

```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] -- No --> Loop
   ↓
  Yes
   ↓
[Episode End]
```
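A minimal sketch of driving this loop in-process is shown below. The class name, constructor arguments, and the Gym-style `(obs, reward, done, info)` return signature are assumptions for illustration, not the exact API exposed by `env.py`:

```python
# Illustrative interaction-loop driver (class/method names and signatures are assumptions).
from env import CustomerSupportEnv  # hypothetical import path


def run_episode(agent) -> float:
    env = CustomerSupportEnv(difficulty="medium")
    obs = env.reset()                                 # RESET -> first observation
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(obs)                       # agent decides: classify / ask_info / resolve
        obs, reward, done, info = env.step(action)    # environment transitions and emits reward
        total_reward += reward
    return total_reward
```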
## Self-Correction Loop

Initial flow:

```
classify → ask_info → resolve
```

With self-correction:

```
classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```
## Agent Decision Logic

```
IF not classified:
    → classify
ELIF missing required fields:
    → ask_info
ELIF uncertain:
    → re-classify
ELSE:
    → resolve
```
## Stochastic Behavior

```
Customer Message =
      base_variant
    + noise injection
    + ambiguity

Required Info =
      full_schema
    - randomly masked fields
```

Difficulty controls:

```
EASY   → low noise, clear signals
MEDIUM → moderate noise
HARD   → high ambiguity + missing info
```
## Reward Flow

Action → Immediate Reward → Final Outcome

Examples:

```
ask_info (useful)  → +0.3
repeat ask         → -0.3
step penalty       → -0.05
correct classify   → +0.2
premature resolve  → -1.0 (hard)
successful resolve → +0.2 to +1.0
```
## Example Episode

```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```

Outcome:

* Success
* Self-correction observed
* Efficient resolution
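Total episode reward: -0.05 + 0.20 - 0.05 + 0.45 = +0.55 over four steps.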
### 1. Environment (`env.py`)

A **stateful, stochastic simulation** of customer support operations.

#### Key Features

* Multi-step interaction loop (`step`, `reset`, `state`)
* Partial observability (missing information)
* Stochastic noise injection
* Difficulty-aware configuration
* Multi-intent ticket handling
* Reward shaping with penalties for poor decisions

---
### 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```

---
### 3. Action Space

| Action   | Description                |
| -------- | -------------------------- |
| classify | Assign category + priority |
| ask_info | Request missing field      |
| resolve  | Attempt to close ticket    |

Example:

```json
{
  "type": "ask_info",
  "field": "order_id"
}
```
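Since OpenEnv compliance (see the checklist below) calls for typed observation/action models, a minimal dataclass sketch mirroring the two schemas above is given here; the repo's actual model classes may differ:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative typed models (field names follow the schemas above; classes are assumptions).
@dataclass
class Action:
    type: str                        # "classify" | "ask_info" | "resolve"
    field: Optional[str] = None      # required field name when type == "ask_info"
    category: Optional[str] = None   # e.g. "billing", set when type == "classify"
    priority: Optional[str] = None   # set when type == "classify"

@dataclass
class Observation:
    ticket_id: str
    customer_message: str
    known_info: dict
    required: list
    missing_required: list
    info_progress: float
    status: str                      # "open" | "resolved"
    step_count: int
    remaining_steps: int
    difficulty: str                  # "easy" | "medium" | "hard"
```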
---
## Difficulty & Stochastic Control

The environment dynamically adjusts complexity:

| Difficulty | Max Steps | Noise    | Missing Info |
| ---------- | --------- | -------- | ------------ |
| Easy       | Low       | None     | Minimal      |
| Medium     | Medium    | Moderate | Partial      |
| Hard       | High      | High     | Significant  |

### Stochastic Elements

* **Noise Injection**
  Adds irrelevant or emotional phrases
* **Information Masking**
  Required fields may be hidden
* **Ambiguity**
  Messages may not clearly indicate category
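The sketch below shows how such a stochastic ticket could be sampled from the dataset schema described in the next section. The probability values, helper name, and config table are assumptions for illustration, not the environment's actual parameters:

```python
import random

# Illustrative noise/masking rates per difficulty (values are assumptions).
DIFFICULTY = {
    "easy":   {"noise_prob": 0.1, "mask_prob": 0.1},
    "medium": {"noise_prob": 0.4, "mask_prob": 0.3},
    "hard":   {"noise_prob": 0.8, "mask_prob": 0.6},
}

def sample_ticket(ticket: dict, difficulty: str) -> dict:
    cfg = DIFFICULTY[difficulty]
    message = random.choice(ticket["variants"])            # pick a base variant
    if random.random() < cfg["noise_prob"]:                # noise injection
        message += " " + random.choice(ticket["noise"])
    required = ticket["ground_truth"]["required_info"]
    # Randomly mask required fields so the agent has to ask for them.
    known = {f: "provided" for f in required if random.random() > cfg["mask_prob"]}
    return {"customer_message": message, "known_info": known, "required": required}
```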
---

## Dataset (Production-Style Tickets)

Each ticket includes:

```python
{
    "ticket_id": "...",
    "variants": [...],       # multiple phrasings
    "noise": [...],          # real-world clutter
    "ground_truth": {
        "category": "...",
        "priority": "...",
        "required_info": [...],
        "intents": [...]     # multi-intent support
    }
}
```

### Key Properties

* Multiple linguistic variations
* Realistic phrasing (not templated)
* Multi-intent issues (e.g., billing + technical)
* No explicit hints (agent must infer)
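For concreteness, here is a purely hypothetical ticket in this schema; every value is invented for illustration and does not come from the actual dataset:

```python
# Hypothetical ticket instance (all values invented for illustration only).
example_ticket = {
    "ticket_id": "TCK-1042",
    "variants": [
        "I was charged twice this month and now the app won't let me log in.",
        "Double billing on my account, and login keeps failing!",
    ],
    "noise": [
        "Honestly this has been the worst week.",
        "Please hurry, I'm travelling tomorrow.",
    ],
    "ground_truth": {
        "category": "billing",
        "priority": "high",
        "required_info": ["account_id", "invoice_id"],
        "intents": ["billing", "technical"],   # multi-intent: billing + technical
    },
}
```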
---

## Self-Correction Mechanism

The agent is designed to **adapt within an episode**.

### What this means:

* Can **re-classify after new information**
* Can **delay resolution under uncertainty**
* Can **recover from suboptimal actions**

### Example behavior:

```
classify → ask_info → re-classify → resolve
```

This mimics real-world agent reasoning rather than fixed pipelines.

---
## Agent Design (`agent_llm.py`)

### Hybrid Intelligence

| Component | Role                   |
| --------- | ---------------------- |
| LLM       | High-level reasoning   |
| Rules     | Safety + constraints   |
| Fallback  | Deterministic recovery |

---

### Key Capabilities

* Structured JSON output
* Retry + validation loop
* Fallback policy (guarantees progress)
* Partial autonomy (not over-constrained)
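A sketch of this retry-and-validate pattern follows. It assumes an `llm_complete()` callable that returns raw text and a `fallback` callable for deterministic recovery; the prompt, retry count, and helper names are illustrative, not the exact code in `agent_llm.py`:

```python
import json

VALID_TYPES = {"classify", "ask_info", "resolve"}

def decide(obs: dict, llm_complete, fallback, max_retries: int = 3) -> dict:
    """Ask the LLM for a structured action; fall back to rules if validation keeps failing."""
    prompt = (
        f"Ticket: {obs['customer_message']}\n"
        f"Missing fields: {obs['missing_required']}\n"
        "Reply with a JSON action."
    )
    for _ in range(max_retries):
        raw = llm_complete(prompt)              # assumed helper returning a string
        try:
            action = json.loads(raw)            # structured JSON output
        except json.JSONDecodeError:
            continue                            # retry on malformed JSON
        if action.get("type") in VALID_TYPES:
            return action                       # validated action
    return fallback(obs)                        # deterministic recovery guarantees progress
```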
---

## Reward Design

Reward is **dense and shaped**, not binary.

| Behavior                 | Reward       |
| ------------------------ | ------------ |
| Step penalty             | -0.05        |
| Correct classification   | +0.2         |
| Useful info collection   | +0.3         |
| Redundant action         | -0.3         |
| Premature resolve (hard) | -1.0         |
| Successful resolve       | +0.2 to +1.0 |
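A minimal sketch of how these shaping terms could combine into a per-step reward is shown below; the event flags are assumptions about the environment's internals, while the constants mirror the table above:

```python
# Illustrative per-step reward shaping using the constants from the table above.
def step_reward(correct_classify=False, useful_info=False, redundant=False,
                premature_resolve=False, resolve_quality=None) -> float:
    reward = -0.05                              # step penalty, applied every turn
    if correct_classify:
        reward += 0.2
    if useful_info:
        reward += 0.3
    if redundant:
        reward -= 0.3
    if premature_resolve:
        reward -= 1.0
    if resolve_quality is not None:             # 0.0-1.0 quality of a successful resolve
        reward += 0.2 + 0.8 * resolve_quality   # maps to the +0.2 .. +1.0 range
    return reward
```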
---

## Metrics

Tracked per episode:

```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```

### Additional Behavioral Signals

* Self-correction frequency (re-classification)
* Resolution efficiency
* Failure modes under uncertainty
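A sketch of aggregating these metrics over a batch of episode records is given below; the per-episode record fields are assumptions chosen to match the JSON above:

```python
# Aggregate per-episode records into the metrics above (record fields are assumed).
def aggregate(episodes: list[dict]) -> dict:
    n = len(episodes)
    return {
        "success_rate":    sum(e["success"] for e in episodes) / n,
        "avg_steps":       sum(e["steps"] for e in episodes) / n,
        "avg_reward":      sum(e["reward"] for e in episodes) / n,
        "info_efficiency": sum(e["fields_collected"] / max(e["fields_required"], 1)
                               for e in episodes) / n,
    }
```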
---

## Tasks & Graders

Three evaluation tasks:

| Task                      | Difficulty | Objective                              |
| ------------------------- | ---------- | -------------------------------------- |
| easy-info-collection      | Easy       | Basic info gathering                   |
| medium-complete-info      | Medium     | Complete + accurate handling           |
| hard-efficient-resolution | Hard       | Efficient resolution under uncertainty |

### Grader Properties

* Deterministic
* Score range: **0.0 to 1.0**
* Multi-factor scoring:
  * success
  * efficiency
  * completeness
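A deterministic multi-factor grader could look like the following sketch; only the 0.0 to 1.0 range and the three factors come from the description above, while the weights and input fields are assumptions:

```python
# Deterministic multi-factor grader sketch (weights and inputs are assumptions).
def grade(success: bool, steps_used: int, max_steps: int,
          fields_collected: int, fields_required: int) -> float:
    success_score      = 1.0 if success else 0.0
    efficiency_score   = 1.0 - (steps_used / max_steps)           # fewer steps is better
    completeness_score = fields_collected / max(fields_required, 1)
    score = 0.5 * success_score + 0.25 * efficiency_score + 0.25 * completeness_score
    return round(min(max(score, 0.0), 1.0), 2)                    # clamp to 0.0 - 1.0
```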
---

## Inference

Run the baseline agent:

```bash
python inference.py
```

Output:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
---

## Deployment (Hugging Face Spaces)

### Build the Docker image

```bash
docker build -t openenv-customer-support-agent .
```

### Run the container

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```
---

## API Endpoints

| Endpoint | Description            |
| -------- | ---------------------- |
| `/reset` | Initialize environment |
| `/step`  | Execute action         |
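A hedged example of calling these endpoints over HTTP with `requests` is shown below; the request/response payload shapes are assumptions based on the action and observation schemas above, not a documented contract:

```python
import requests

BASE_URL = "http://localhost:7860"   # local Docker run; adjust for the deployed Space

# Initialize the environment and get the first observation.
obs = requests.post(f"{BASE_URL}/reset").json()

# Execute one action (payload shape assumed from the action schema above).
result = requests.post(
    f"{BASE_URL}/step",
    json={"type": "ask_info", "field": "order_id"},
).json()
print(result)
```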
---

## Environment Variables

Required:

```
API_BASE_URL
MODEL_NAME
HF_TOKEN
```
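For example, the agent code might read these at startup; the variable names come from the list above, while the access pattern is only a sketch:

```python
import os

# Read the required configuration at startup; raises KeyError early if anything is missing.
API_BASE_URL = os.environ["API_BASE_URL"]
MODEL_NAME = os.environ["MODEL_NAME"]
HF_TOKEN = os.environ["HF_TOKEN"]
```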
---

## OpenEnv Compliance

* Typed observation/action models
* `step` / `reset` / `state` implemented
* 3+ tasks with graders
* Deterministic scoring
* Dockerized deployment
* HF Space compatible

---
## Key Innovations

* Real-world task simulation (not a toy)
* Stochastic difficulty scaling
* Multi-intent ticket modeling
* Self-correcting agent behavior
* Hybrid LLM + rule-based architecture
* Dense reward shaping

---
## Future Improvements

* Multi-stage resolution pipelines
* Conversation memory (history utilization)
* Active uncertainty estimation
* Adaptive task generation
* Multi-agent coordination

---
## Big Picture

This environment models:

> **Decision-making under uncertainty with partial information**

It is suitable for:

* RL agent training
* LLM agent evaluation
* Benchmarking reasoning systems

---

## Author

Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.