---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---

# 🤖 Customer Support Agent — OpenEnv Environment

## 🧠 Overview

This project implements a **real-world customer support simulation environment** built using the OpenEnv specification. It is designed to evaluate and train intelligent agents capable of:

* Understanding noisy and ambiguous user queries
* Classifying issues correctly
* Gathering missing information efficiently
* Resolving tickets under uncertainty

Unlike toy environments, this system models **real operational complexity** found in production customer support workflows.

---

## 🎯 Objective

Build and evaluate an agent that can:

1. **Classify** customer issues (billing / technical / delivery)
2. **Collect required information** dynamically
3. **Resolve efficiently** under constraints
4. **Adapt behavior mid-episode** (self-correction)

---

## 🏗️ System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|----------------------|
| - State              |
| - Reward             |
| - Stochasticity      |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|----------------------|
| message              |
| known_info           |
| required             |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|----------------------|
| - Reasoning (LLM)    |
| - Constraints        |
| - Fallback           |
+----------+-----------+
           |
           v
+----------------------+
|        Action        |
|----------------------|
| classify             |
| ask_info             |
| resolve              |
+----------+-----------+
           |
           v
+----------------------+
|  Environment Step    |
|----------------------|
| reward               |
| next_state           |
+----------------------+
```

## Interaction Loop

```
RESET → OBSERVE → ACT → STEP → REPEAT
```

Detailed Flow:

```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] ── No ──> Loop
   │ Yes
   ↓
[Episode End]
```

## Self-Correction Loop

Initial Flow:

```
classify → ask_info → resolve
```

With Self-Correction:

```
classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```

## Agent Decision Logic

```
IF not classified:
    → classify
ELIF missing required fields:
    → ask_info
ELIF uncertain:
    → re-classify
ELSE:
    → resolve
```

## Stochastic Behavior

```
Customer Message = base_variant + noise injection + ambiguity
Required Info    = full_schema - randomly masked fields
```

Difficulty Controls:

```
EASY   → low noise, clear signals
MEDIUM → moderate noise
HARD   → high ambiguity + missing info
```

## Reward Flow

```
Action → Immediate Reward → Final Outcome
```

Examples:

```
ask_info (useful)  → +0.3
repeat ask         → -0.3
step penalty       → -0.05
correct classify   → +0.2
premature resolve  → -1.0 (hard)
successful resolve → +0.2 to +1.0
```

## Example Episode

```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45

Outcome:
✔ success
✔ self-correction observed
✔ efficient resolution
```

### 1. Environment (`env.py`)

A **stateful, stochastic simulation** of customer support operations.

#### Key Features

* Multi-step interaction loop (`step`, `reset`, `state`)
* Partial observability (missing information)
* Stochastic noise injection
* Difficulty-aware configuration
* Multi-intent ticket handling
* Reward shaping with penalties for poor decisions
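To make the `reset`/`step` loop concrete, here is a minimal client-side sketch. It is written under stated assumptions rather than from the actual `env.py` source: the class name `CustomerSupportEnv`, the `difficulty` keyword argument, and a `(obs, reward, done, info)` return from `step()` are illustrative guesses consistent with the description above, not the exact API.

```python
# Minimal sketch of the RESET → OBSERVE → ACT → STEP loop described above.
# Assumptions: env.py exposes a CustomerSupportEnv class, reset() returns an
# observation dict shaped like the Observation Space section below, and
# step() returns (observation, reward, done, info). Adjust to the real API.
from env import CustomerSupportEnv  # hypothetical import path

env = CustomerSupportEnv(difficulty="medium")
obs = env.reset()                          # fresh, noisy ticket
total_reward, done, classified = 0.0, False, False

while not done:
    if not classified:
        # First pass: commit to a category/priority guess.
        action = {"type": "classify", "category": "billing", "priority": "high"}
        classified = True
    elif obs["missing_required"]:
        # Gather one missing required field at a time.
        action = {"type": "ask_info", "field": obs["missing_required"][0]}
    else:
        # Everything collected: attempt to close the ticket.
        action = {"type": "resolve"}

    obs, reward, done, info = env.step(action)
    total_reward += reward

print(f"status={obs['status']} total_reward={total_reward:.2f}")
```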
---

### 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```

---

### 3. Action Space

| Action   | Description                |
| -------- | -------------------------- |
| classify | Assign category + priority |
| ask_info | Request missing field      |
| resolve  | Attempt to close ticket    |

Example:

```json
{
  "type": "ask_info",
  "field": "order_id"
}
```

---

## 🎲 Difficulty & Stochastic Control

The environment dynamically adjusts complexity:

| Difficulty | Max Steps | Noise    | Missing Info |
| ---------- | --------- | -------- | ------------ |
| Easy       | Low       | None     | Minimal      |
| Medium     | Medium    | Moderate | Partial      |
| Hard       | High      | High     | Significant  |

### Stochastic Elements

* **Noise Injection**
  Adds irrelevant or emotional phrases
* **Information Masking**
  Required fields may be hidden
* **Ambiguity**
  Messages may not clearly indicate category

---

## 🧾 Dataset (Production-Style Tickets)

Each ticket includes:

```python
{
    "ticket_id": "...",
    "variants": [...],        # multiple phrasings
    "noise": [...],           # real-world clutter
    "ground_truth": {
        "category": "...",
        "priority": "...",
        "required_info": [...],
        "intents": [...]      # multi-intent support
    }
}
```

### Key Properties

* Multiple linguistic variations
* Realistic phrasing (not templated)
* Multi-intent issues (e.g., billing + technical)
* No explicit hints (agent must infer)

---

## 🔁 Self-Correction Mechanism

The agent is designed to **adapt within an episode**.

### What this means:

* Can **re-classify after new information**
* Can **delay resolution under uncertainty**
* Can **recover from suboptimal actions**

### Example behavior:

```
classify → ask_info → re-classify → resolve
```

This mimics real-world agent reasoning rather than fixed pipelines.

---

## 🧠 Agent Design (`agent_llm.py`)

### Hybrid Intelligence

| Component | Role                   |
| --------- | ---------------------- |
| LLM       | High-level reasoning   |
| Rules     | Safety + constraints   |
| Fallback  | Deterministic recovery |

---

### Key Capabilities

* Structured JSON output
* Retry + validation loop
* Fallback policy (guarantees progress)
* Partial autonomy (not over-constrained)
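As a rough illustration of this hybrid design, the sketch below has an LLM propose a structured JSON action, validates it with simple rules, and falls back to a deterministic policy when validation fails. The names `choose_action` and `llm_call`, the prompt wording, and the retry count are assumptions for illustration only, not the actual `agent_llm.py` implementation.

```python
import json

VALID_TYPES = {"classify", "ask_info", "resolve"}

def choose_action(obs, llm_call, max_retries=2):
    """Hybrid policy sketch: the LLM proposes a JSON action, rules validate it,
    and a deterministic fallback guarantees progress.
    `llm_call` is assumed to map a prompt string to raw model text."""
    prompt = (
        "You are a customer support agent. Respond with a single JSON action, "
        'e.g. {"type": "ask_info", "field": "order_id"}, for this observation:\n'
        + json.dumps(obs)
    )
    for _ in range(max_retries):
        try:
            action = json.loads(llm_call(prompt))
            if isinstance(action, dict) and action.get("type") in VALID_TYPES:
                return action            # structured output passed validation
        except (json.JSONDecodeError, TypeError):
            pass                         # malformed output: retry, then fall back

    # Deterministic fallback mirroring the decision logic above:
    # request a missing required field if any remain, otherwise resolve.
    if obs.get("missing_required"):
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    return {"type": "resolve"}
```

A caller would supply its own `llm_call` wrapper around whichever model `MODEL_NAME` points to, keeping the policy itself model-agnostic.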
---

## 🧮 Reward Design

Reward is **dense and shaped**, not binary.

| Behavior                 | Reward       |
| ------------------------ | ------------ |
| Step penalty             | -0.05        |
| Correct classification   | +0.2         |
| Useful info collection   | +0.3         |
| Redundant action         | -0.3         |
| Premature resolve (hard) | -1.0         |
| Successful resolve       | +0.2 to +1.0 |

---

## 📊 Metrics

Tracked per episode:

```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```

### Additional Behavioral Signals

* Self-correction frequency (re-classification)
* Resolution efficiency
* Failure modes under uncertainty

---

## 🧪 Tasks & Graders

Three evaluation tasks:

| Task                      | Difficulty | Objective                              |
| ------------------------- | ---------- | -------------------------------------- |
| easy-info-collection      | Easy       | Basic info gathering                   |
| medium-complete-info      | Medium     | Complete + accurate handling           |
| hard-efficient-resolution | Hard       | Efficient resolution under uncertainty |

### Grader Properties

* Deterministic
* Score range: **0.0 – 1.0**
* Multi-factor scoring:
  * success
  * efficiency
  * completeness

---

## ▶️ Inference

Run the baseline agent:

```bash
python inference.py
```

Outputs:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```

---

## 🐳 Deployment (Hugging Face Spaces)

### Build Docker

```bash
docker build -t openenv-customer-support-agent .
```

### Run

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```

---

## 🌐 API Endpoints

| Endpoint | Description            |
| -------- | ---------------------- |
| `/reset` | Initialize environment |
| `/step`  | Execute action         |

---

## ⚙️ Environment Variables

Required:

```
API_BASE_URL
MODEL_NAME
HF_TOKEN
```

---

## ✅ OpenEnv Compliance

* Typed observation/action models
* `step`/`reset`/`state` implemented
* 3+ tasks with graders
* Deterministic scoring
* Dockerized deployment
* HF Space compatible

---

## 🚀 Key Innovations

* Real-world task simulation (not a toy)
* Stochastic difficulty scaling
* Multi-intent ticket modeling
* Self-correcting agent behavior
* Hybrid LLM + rule-based architecture
* Dense reward shaping

---

## 🔮 Future Improvements

* Multi-stage resolution pipelines
* Conversation memory (history utilization)
* Active uncertainty estimation
* Adaptive task generation
* Multi-agent coordination

---

## 🧠 Big Picture

This environment models:

> **Decision-making under uncertainty with partial information**

It is suitable for:

* RL agent training
* LLM agent evaluation
* Benchmarking reasoning systems

---

## 👤 Author

Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.

---