---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---
# 🤖 Customer Support Agent – OpenEnv Environment

## 🧠 Overview
This project implements a real-world customer support simulation environment built using the OpenEnv specification.
It is designed to evaluate and train intelligent agents capable of:
- Understanding noisy and ambiguous user queries
- Classifying issues correctly
- Gathering missing information efficiently
- Resolving tickets under uncertainty
Unlike toy environments, this system models real operational complexity found in production customer support workflows.
## 🎯 Objective
Build and evaluate an agent that can:
- Classify customer issues (billing / technical / delivery)
- Collect required information dynamically
- Resolve efficiently under constraints
- Adapt behavior mid-episode (self-correction)
## 🏗️ System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|  - State             |
|  - Reward            |
|  - Stochasticity     |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|  - message           |
|  - known_info        |
|  - required          |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|  - Reasoning (LLM)   |
|  - Constraints       |
|  - Fallback          |
+----------+-----------+
           |
           v
+----------------------+
|       Action         |
|  - classify          |
|  - ask_info          |
|  - resolve           |
+----------+-----------+
           |
           v
+----------------------+
|  Environment Step    |
|  - reward            |
|  - next_state        |
+----------------------+
```
### Interaction Loop

```
RESET → OBSERVE → ACT → STEP → REPEAT
```

Detailed flow:

```
[RESET] → [Observation] → [Agent Decision] → [Action] → [Environment Step]
        → [Reward + Next State] → [Done?] ── No ──> loop
                                           └─ Yes ─> [Episode End]
```
### Self-Correction Loop

Initial flow:

```
classify → ask_info → resolve
```

With self-correction:

```
classify → ask_info → [new information arrives] → re-evaluate decision
         → re-classify (if needed) → ask remaining info → resolve
```
### Agent Decision Logic

```
IF not classified:            → classify
ELIF missing required fields: → ask_info
ELIF uncertain:               → re-classify
ELSE:                         → resolve
```
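The decision chain above can be sketched in Python. The observation keys (`classified`, `missing_required`) and the confidence threshold are illustrative assumptions, not the environment's exact schema:

```python
def choose_action(obs: dict, confidence: float, threshold: float = 0.6) -> dict:
    """Map an observation to the next action, mirroring the IF/ELIF chain.
    Field names and the threshold value are illustrative."""
    if not obs.get("classified"):
        return {"type": "classify"}
    if obs.get("missing_required"):
        # Ask for the first missing required field.
        return {"type": "ask_info", "field": obs["missing_required"][0]}
    if confidence < threshold:
        # Uncertain: re-classify before attempting to resolve.
        return {"type": "classify"}
    return {"type": "resolve"}
```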
### Stochastic Behavior

```
Customer Message = base_variant + noise injection + ambiguity
Required Info    = full_schema  - randomly masked fields
```

Difficulty controls:

- EASY → low noise, clear signals
- MEDIUM → moderate noise
- HARD → high ambiguity + missing info
### Reward Flow

```
Action → Immediate Reward → Final Outcome
```

Examples:

- ask_info (useful) → +0.3
- repeat ask → -0.3
- step penalty → -0.05
- correct classify → +0.2
- premature resolve (hard) → -1.0
- successful resolve → +0.2 to +1.0
### Example Episode

```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```

Outcome:
- ✓ success
- ✓ self-correction observed
- ✓ efficient resolution
## 1. Environment (env.py)

A stateful, stochastic simulation of customer support operations.

### Key Features

- Multi-step interaction loop (`step`, `reset`, `state`)
- Partial observability (missing information)
- Stochastic noise injection
- Difficulty-aware configuration
- Multi-intent ticket handling
- Reward shaping with penalties for poor decisions
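As a rough illustration of the `step`/`reset` contract and reward shaping described above, a toy version of the loop might look like this. It is a sketch only; the real `env.py` is richer and these names are assumptions:

```python
class MiniSupportEnv:
    """Toy sketch of the step/reset contract; not the real env.py."""

    def __init__(self, required, max_steps=10):
        self.required = list(required)
        self.max_steps = max_steps

    def reset(self):
        self.known = {}
        self.steps = 0
        self.done = False
        return self._obs()

    def _obs(self):
        missing = [f for f in self.required if f not in self.known]
        return {"missing_required": missing,
                "step_count": self.steps,
                "remaining_steps": self.max_steps - self.steps}

    def step(self, action):
        self.steps += 1
        reward = -0.05  # per-step penalty
        if action["type"] == "ask_info":
            field = action.get("field")
            if field in self.required and field not in self.known:
                self.known[field] = "provided"
                reward += 0.3   # useful info collection
            else:
                reward -= 0.3   # redundant ask
        elif action["type"] == "resolve":
            resolved = not self._obs()["missing_required"]
            reward += 1.0 if resolved else -1.0  # premature resolve penalty
            self.done = resolved
        if self.steps >= self.max_steps:
            self.done = True
        return self._obs(), reward, self.done
```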
## 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```
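For type safety, the observation could be mirrored as a dataclass. This is a sketch; the real project may use Pydantic or OpenEnv's own model classes instead:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Observation:
    """Typed mirror of the observation JSON above (illustrative)."""
    ticket_id: str
    customer_message: str
    known_info: Dict[str, str] = field(default_factory=dict)
    required: List[str] = field(default_factory=list)
    missing_required: List[str] = field(default_factory=list)
    info_progress: float = 0.0
    status: str = "open"          # "open" | "resolved"
    step_count: int = 0
    remaining_steps: int = 10
    difficulty: str = "easy"      # "easy" | "medium" | "hard"
```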
## 3. Action Space
| Action | Description |
|---|---|
| classify | Assign category + priority |
| ask_info | Request missing field |
| resolve | Attempt to close ticket |
Example:
```json
{
  "type": "ask_info",
  "field": "order_id"
}
```
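A small validator for this action schema might look like the following (a sketch; the real environment may enforce stricter rules):

```python
VALID_ACTIONS = {"classify", "ask_info", "resolve"}

def validate_action(action: dict) -> bool:
    """Check an action dict against the action space above."""
    if action.get("type") not in VALID_ACTIONS:
        return False
    if action["type"] == "ask_info" and not action.get("field"):
        return False  # ask_info must name the field it requests
    return True
```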
## 🎲 Difficulty & Stochastic Control
The environment dynamically adjusts complexity:
| Difficulty | Max Steps | Noise | Missing Info |
|---|---|---|---|
| Easy | Low | None | Minimal |
| Medium | Medium | Moderate | Partial |
| Hard | High | High | Significant |
### Stochastic Elements

- **Noise injection**: adds irrelevant or emotional phrases
- **Information masking**: required fields may be hidden
- **Ambiguity**: messages may not clearly indicate a category
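One way to sketch these stochastic elements together; the function and parameter names here are assumptions for illustration, not the environment's API:

```python
import random

def make_ticket(variant, noise_phrases, schema, rng, mask_prob=0.5):
    """Sketch of stochastic ticket construction: noise injection plus
    information masking. All names are illustrative."""
    message = variant
    # Noise injection: sometimes append irrelevant/emotional clutter.
    if noise_phrases and rng.random() < 0.7:
        message += " " + rng.choice(noise_phrases)
    # Information masking: each required field may start hidden.
    known = {f: v for f, v in schema.items() if rng.random() > mask_prob}
    missing = [f for f in schema if f not in known]
    return {"customer_message": message,
            "known_info": known,
            "missing_required": missing}
```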
## 🧾 Dataset (Production-Style Tickets)
Each ticket includes:
```
{
  "ticket_id": "...",
  "variants": [...],        # multiple phrasings
  "noise": [...],           # real-world clutter
  "ground_truth": {
    "category": "...",
    "priority": "...",
    "required_info": [...],
    "intents": [...]        # multi-intent support
  }
}
```
### Key Properties
- Multiple linguistic variations
- Realistic phrasing (not templated)
- Multi-intent issues (e.g., billing + technical)
- No explicit hints (agent must infer)
## 🔁 Self-Correction Mechanism
The agent is designed to adapt within an episode.
What this means:
- Can re-classify after new information
- Can delay resolution under uncertainty
- Can recover from suboptimal actions
Example behavior:
```
classify → ask_info → re-classify → resolve
```
This mimics real-world agent reasoning rather than fixed pipelines.
## 🧠 Agent Design (agent_llm.py)

### Hybrid Intelligence
| Component | Role |
|---|---|
| LLM | High-level reasoning |
| Rules | Safety + constraints |
| Fallback | Deterministic recovery |
### Key Capabilities
- Structured JSON output
- Retry + validation loop
- Fallback policy (guarantees progress)
- Partial autonomy (not over-constrained)
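The retry + validation + fallback pattern could be sketched as below. `llm_call` stands in for the real model client, and the field names are assumptions:

```python
import json

ACTIONS = {"classify", "ask_info", "resolve"}

def safe_decide(llm_call, obs, max_retries=2):
    """Parse the LLM's JSON output, retry on failure, then fall back to
    a deterministic rule so the episode always makes progress."""
    for _ in range(max_retries):
        try:
            action = json.loads(llm_call(obs))
            if isinstance(action, dict) and action.get("type") in ACTIONS:
                return action  # structured output passed validation
        except (json.JSONDecodeError, TypeError):
            continue  # malformed output: retry
    # Deterministic fallback guarantees progress.
    missing = obs.get("missing_required", [])
    if missing:
        return {"type": "ask_info", "field": missing[0]}
    return {"type": "resolve"}
```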
## 🧮 Reward Design
Reward is dense and shaped, not binary.
| Behavior | Reward |
|---|---|
| Step penalty | -0.05 |
| Correct classification | +0.2 |
| Useful info collection | +0.3 |
| Redundant action | -0.3 |
| Premature resolve (hard) | -1.0 |
| Successful resolve | +0.2 to +1.0 |
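The table composes into per-action rewards roughly as follows. This is an illustrative sketch; in particular, the linear scaling of the resolve bonus is an assumption:

```python
def shaped_reward(action_type, *, correct=False, useful=False,
                  redundant=False, premature=False, quality=1.0):
    """Compose the reward table above into a single per-action value."""
    r = -0.05  # step penalty applies to every action
    if action_type == "classify" and correct:
        r += 0.2
    elif action_type == "ask_info":
        if useful:
            r += 0.3
        elif redundant:
            r -= 0.3
    elif action_type == "resolve":
        # quality in [0, 1] scales the resolve bonus from +0.2 to +1.0
        r += -1.0 if premature else 0.2 + 0.8 * quality
    return round(r, 2)
```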
## 📊 Metrics
Tracked per episode:
```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```
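Aggregating these metrics over a batch of episodes could look like this (the per-episode record fields are illustrative):

```python
def aggregate(episodes):
    """Average per-episode records into the run-level metrics above."""
    n = len(episodes)
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "avg_steps": sum(e["steps"] for e in episodes) / n,
        "avg_reward": sum(e["reward"] for e in episodes) / n,
        # Fraction of ask_info actions that actually uncovered a field.
        "info_efficiency": sum(e["useful_asks"] / max(e["asks"], 1)
                               for e in episodes) / n,
    }
```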
### Additional Behavioral Signals
- Self-correction frequency (re-classification)
- Resolution efficiency
- Failure modes under uncertainty
## 🧪 Tasks & Graders
Three evaluation tasks:
| Task | Difficulty | Objective |
|---|---|---|
| easy-info-collection | Easy | Basic info gathering |
| medium-complete-info | Medium | Complete + accurate handling |
| hard-efficient-resolution | Hard | Efficient resolution under uncertainty |
### Grader Properties

- Deterministic
- Score range: 0.0 to 1.0
- Multi-factor scoring: success, efficiency, completeness
## ▶️ Inference

Run the baseline agent:

```bash
python inference.py
```
Example output:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
## 🐳 Deployment (Hugging Face Spaces)

Build the Docker image:

```bash
docker build -t openenv-customer-support-agent .
```

Run the container:

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```
## 🌐 API Endpoints

| Endpoint | Description |
|---|---|
| `/reset` | Initialize the environment |
| `/step` | Execute an action |
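A minimal client for these endpoints might build requests like this. The port comes from the Docker command above, and the request schema is an assumption, not the documented API:

```python
import json

BASE_URL = "http://localhost:7860"  # assumed port from the Docker run command

def build_request(endpoint: str, payload: dict):
    """Assemble the URL and JSON body for a call to /reset or /step.
    The exact payload schema is an assumption."""
    return f"{BASE_URL}{endpoint}", json.dumps(payload)

# A typical episode over HTTP (requires the running container), using any
# HTTP client such as urllib.request:
#   url, body = build_request("/reset", {})
#   url, body = build_request("/step", {"action": {"type": "ask_info",
#                                                  "field": "order_id"}})
```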
## ⚙️ Environment Variables

Required:

- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`
## ✅ OpenEnv Compliance
- Typed observation/action models
- `step`/`reset`/`state` implemented
- 3+ tasks with graders
- Deterministic scoring
- Dockerized deployment
- HF Space compatible
## 🚀 Key Innovations
- Real-world task simulation (not toy)
- Stochastic difficulty scaling
- Multi-intent ticket modeling
- Self-correcting agent behavior
- Hybrid LLM + rule-based architecture
- Dense reward shaping
## 🔮 Future Improvements
- Multi-stage resolution pipelines
- Conversation memory (history utilization)
- Active uncertainty estimation
- Adaptive task generation
- Multi-agent coordination
## 🧠 Big Picture

This environment models:

> Decision-making under uncertainty with partial information

It is suitable for:
- RL agent training
- LLM agent evaluation
- Benchmarking reasoning systems
## 👤 Author
Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.