---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---

# 🤖 Customer Support Agent – OpenEnv Environment

## 🧠 Overview

This project implements a real-world customer support simulation environment built using the OpenEnv specification.

It is designed to evaluate and train intelligent agents capable of:

  • Understanding noisy and ambiguous user queries
  • Classifying issues correctly
  • Gathering missing information efficiently
  • Resolving tickets under uncertainty

Unlike toy environments, this system models real operational complexity found in production customer support workflows.


## 🎯 Objective

Build and evaluate an agent that can:

  1. Classify customer issues (billing / technical / delivery)
  2. Collect required information dynamically
  3. Resolve efficiently under constraints
  4. Adapt behavior mid-episode (self-correction)

๐Ÿ—๏ธ System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|  - state             |
|  - reward            |
|  - stochasticity     |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|  - message           |
|  - known_info        |
|  - required          |
+----------+-----------+
           |
           v
+----------------------+
| Agent (LLM + rules)  |
|  - reasoning (LLM)   |
|  - constraints       |
|  - fallback          |
+----------+-----------+
           |
           v
+----------------------+
|        Action        |
|  classify, ask_info, |
|  or resolve          |
+----------+-----------+
           |
           v
+----------------------+
|   Environment Step   |
|  reward, next_state  |
+----------------------+
```

### Interaction Loop

RESET → OBSERVE → ACT → STEP → REPEAT
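The loop above can be sketched in Python against a stub environment. `StubEnv` is illustrative only; the real `env.py` interface and field names may differ slightly:

```python
class StubEnv:
    """Toy stand-in for the real environment (env.py), for illustration."""

    def reset(self):
        # A fresh ticket with one required field still unknown.
        return {"status": "open", "missing_required": ["order_id"]}

    def step(self, action):
        done = action["type"] == "resolve"
        reward = 0.5 if done else -0.05  # step penalty vs. resolution bonus
        obs = {"status": "resolved" if done else "open", "missing_required": []}
        return obs, reward, done


def run_episode(env, agent, max_steps=10):
    """Run one episode: reset, then observe / act / step until done."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent(obs)  # agent maps observation -> action dict
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

A trivial agent that asks for missing fields and then resolves would earn -0.05 + 0.5 = 0.45 on this stub.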

### Detailed Flow

```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] ── No ──> loop back to [Observation]
   │ Yes
   ↓
[Episode End]
```

### Self-Correction Loop

Initial flow: classify → ask_info → resolve

With self-correction:

```
classify
   ↓
ask_info
   ↓
[new information arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```

### Agent Decision Logic

```
IF   not classified:           → classify
ELIF missing required fields:  → ask_info
ELIF uncertain:                → re-classify
ELSE:                          → resolve
```
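A minimal sketch of this decision logic, assuming the agent tracks its own `classified` flag and a `confidence` estimate (hypothetical agent-side state, not part of the observation):

```python
def choose_action(obs, classified, confidence, threshold=0.6):
    """Map an observation plus agent-side state to an action dict.

    `classified` and `confidence` are assumed to be maintained by the
    agent itself; `threshold` is an illustrative uncertainty cutoff.
    """
    if not classified:
        return {"type": "classify"}
    if obs["missing_required"]:
        # Ask for the first still-missing required field.
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    if confidence < threshold:
        return {"type": "classify"}  # re-classify under uncertainty
    return {"type": "resolve"}
```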

### Stochastic Behavior

Customer message = base variant, plus:

  • noise injection
  • ambiguity

Required info = full schema, with:

  • randomly masked fields

Difficulty controls:

  • EASY → low noise, clear signals
  • MEDIUM → moderate noise
  • HARD → high ambiguity + missing info

### Reward Flow

Action → Immediate Reward → Final Outcome

Examples:

  • ask_info (useful) → +0.3
  • repeated ask → -0.3
  • step penalty → -0.05
  • correct classify → +0.2
  • premature resolve (hard) → -1.0
  • successful resolve → +0.2 to +1.0

### Example Episode

  • Step 1: classify → reward -0.05
  • Step 2: ask_info → reward +0.20
  • Step 3: re-classify → reward -0.05
  • Step 4: resolve → reward +0.45

Outcome:

  • ✔ success
  • ✔ self-correction observed
  • ✔ efficient resolution

### 1. Environment (`env.py`)

A stateful, stochastic simulation of customer support operations.

### Key Features

  • Multi-step interaction loop (step, reset, state)
  • Partial observability (missing information)
  • Stochastic noise injection
  • Difficulty-aware configuration
  • Multi-intent ticket handling
  • Reward shaping with penalties for poor decisions

### 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```

### 3. Action Space

| Action | Description |
|--------|-------------|
| classify | Assign category + priority |
| ask_info | Request a missing field |
| resolve | Attempt to close the ticket |

Example:

```json
{
  "type": "ask_info",
  "field": "order_id"
}
```
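A lightweight validator for action dicts of this shape might look like the sketch below (the real environment uses typed action models, so this is illustrative only):

```python
VALID_TYPES = {"classify", "ask_info", "resolve"}


def validate_action(action):
    """Minimal schema check for an action dict before sending it to the env."""
    if action.get("type") not in VALID_TYPES:
        raise ValueError(f"unknown action type: {action.get('type')!r}")
    if action["type"] == "ask_info" and not action.get("field"):
        raise ValueError("ask_info requires a 'field'")
    return action
```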

## 🎲 Difficulty & Stochastic Control

The environment dynamically adjusts complexity:

| Difficulty | Max Steps | Noise | Missing Info |
|------------|-----------|-------|--------------|
| Easy | Low | None | Minimal |
| Medium | Medium | Moderate | Partial |
| Hard | High | High | Significant |

### Stochastic Elements

  • Noise injection: adds irrelevant or emotional phrases
  • Information masking: required fields may be hidden
  • Ambiguity: messages may not clearly indicate a category
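Noise injection and information masking could be sketched as follows; the noise phrases and per-difficulty probabilities here are assumptions, not the environment's actual values:

```python
import random

NOISE = ["btw my cat is sick", "THIS IS URGENT!!!", "sorry for the typos"]


def make_message(base_variant, difficulty, rng):
    """Optionally append a noise phrase; harder settings inject noise more often."""
    p_noise = {"easy": 0.1, "medium": 0.4, "hard": 0.8}[difficulty]
    if rng.random() < p_noise:
        return base_variant + " " + rng.choice(NOISE)
    return base_variant


def mask_required(full_schema, difficulty, rng):
    """Randomly hide required fields from the observation; harder = more masked."""
    max_masked = {"easy": 0, "medium": 1, "hard": 2}[difficulty]
    k = rng.randint(0, min(max_masked, len(full_schema)))
    hidden = set(rng.sample(full_schema, k))
    return [f for f in full_schema if f not in hidden]
```

Seeding the `random.Random` instance keeps episodes reproducible while remaining stochastic across seeds.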


## 🧾 Dataset (Production-Style Tickets)

Each ticket includes:

```
{
  "ticket_id": "...",
  "variants": [...],        # multiple phrasings
  "noise": [...],           # real-world clutter
  "ground_truth": {
      "category": "...",
      "priority": "...",
      "required_info": [...],
      "intents": [...]      # multi-intent support
  }
}
```

### Key Properties

  • Multiple linguistic variations
  • Realistic phrasing (not templated)
  • Multi-intent issues (e.g., billing + technical)
  • No explicit hints (agent must infer)

## 🔁 Self-Correction Mechanism

The agent is designed to adapt within an episode.

What this means:

  • Can re-classify after new information
  • Can delay resolution under uncertainty
  • Can recover from suboptimal actions

Example behavior:

classify → ask_info → re-classify → resolve

This mimics real-world agent reasoning rather than fixed pipelines.


## 🧠 Agent Design (`agent_llm.py`)

### Hybrid Intelligence

| Component | Role |
|-----------|------|
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |

### Key Capabilities

  • Structured JSON output
  • Retry + validation loop
  • Fallback policy (guarantees progress)
  • Partial autonomy (not over-constrained)
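The retry + validation loop can be sketched as below, where `generate` is a hypothetical callable wrapping the LLM call; malformed or invalid outputs are retried, and a deterministic fallback guarantees progress:

```python
import json


def parse_with_retry(generate, max_retries=3):
    """Ask the LLM for a JSON action, retry on bad output, fall back if needed."""
    for _ in range(max_retries):
        raw = generate()
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON: retry
        if isinstance(action, dict) and action.get("type") in {"classify", "ask_info", "resolve"}:
            return action
        # valid JSON but not a valid action: retry
    return {"type": "classify"}  # deterministic fallback action
```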

## 🧮 Reward Design

Reward is dense and shaped, not binary.

| Behavior | Reward |
|----------|--------|
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
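A shaped reward of this kind might be computed as in the sketch below. The magnitudes follow the table above, while the `state` fields (`classification_correct`, `already_asked`, and so on) are assumed names, not the environment's real internals:

```python
def shaped_reward(action, state):
    """Dense, shaped reward for a single action (illustrative sketch)."""
    r = -0.05  # step penalty applies to every action
    kind = action["type"]
    if kind == "classify" and state.get("classification_correct"):
        r += 0.2
    elif kind == "ask_info":
        field = action.get("field")
        if field in state.get("already_asked", set()):
            r -= 0.3  # redundant action
        elif field in state.get("missing_required", []):
            r += 0.3  # useful info collection
    elif kind == "resolve":
        if state.get("missing_required"):
            r -= 1.0  # premature resolve
        else:
            r += 0.5  # successful resolve (+0.2 to +1.0 in the real env)
    return round(r, 2)
```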

## 📊 Metrics

Tracked per episode:

```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```

### Additional Behavioral Signals

  • Self-correction frequency (re-classification)
  • Resolution efficiency
  • Failure modes under uncertainty
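Aggregating these metrics across episodes could look like the following sketch; the per-episode record fields are assumptions, not the environment's real schema:

```python
def aggregate_metrics(episodes):
    """Fold per-episode records into the metrics dict above."""
    n = len(episodes)
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "avg_steps": sum(e["steps"] for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
        # Fraction of ask_info actions that targeted a genuinely missing field.
        "info_efficiency": sum(
            e["useful_asks"] / max(e["total_asks"], 1) for e in episodes
        ) / n,
    }
```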

## 🧪 Tasks & Graders

Three evaluation tasks:

| Task | Difficulty | Objective |
|------|------------|-----------|
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |

### Grader Properties

  • Deterministic
  • Score range: 0.0 – 1.0
  • Multi-factor scoring:
    • success
    • efficiency
    • completeness
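A deterministic multi-factor grader in this spirit might combine the three factors with fixed weights; the weights below are illustrative assumptions, not the graders' actual values:

```python
def grade(success, steps, max_steps, info_complete, weights=(0.5, 0.25, 0.25)):
    """Combine success, efficiency, and completeness into a score in [0.0, 1.0]."""
    efficiency = 1.0 - min(steps, max_steps) / max_steps  # fewer steps = better
    w_success, w_eff, w_complete = weights
    score = (
        w_success * float(success)
        + w_eff * efficiency
        + w_complete * float(info_complete)
    )
    return round(min(max(score, 0.0), 1.0), 2)
```

Because the inputs fully determine the output, the same episode always grades identically.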

## ▶️ Inference

Run the baseline agent:

```bash
python inference.py
```

Outputs:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```

๐Ÿณ Deployment (Hugging Face Spaces)

### Build the Docker image

```bash
docker build -t openenv-customer-support-agent .
```

### Run

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```

## 🌐 API Endpoints

| Endpoint | Description |
|----------|-------------|
| /reset | Initialize the environment |
| /step | Execute an action |

## ⚙️ Environment Variables

Required:

  • `API_BASE_URL`
  • `MODEL_NAME`
  • `HF_TOKEN`

## ✅ OpenEnv Compliance

  • Typed observation/action models
  • `step`/`reset`/`state` implemented
  • 3+ tasks with graders
  • Deterministic scoring
  • Dockerized deployment
  • HF Space compatible

## 🚀 Key Innovations

  • Real-world task simulation (not toy)
  • Stochastic difficulty scaling
  • Multi-intent ticket modeling
  • Self-correcting agent behavior
  • Hybrid LLM + rule-based architecture
  • Dense reward shaping

## 🔮 Future Improvements

  • Multi-stage resolution pipelines
  • Conversation memory (history utilization)
  • Active uncertainty estimation
  • Adaptive task generation
  • Multi-agent coordination

## 🧠 Big Picture

This environment models:

> Decision-making under uncertainty with partial information

It is suitable for:

  • RL agent training
  • LLM agent evaluation
  • benchmarking reasoning systems

## 👤 Author

Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.