---
title: Customer Support OpenEnv Environment
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
tags:
  - openenv
  - reinforcement-learning
  - llm
  - customer-support
---
# Customer Support Agent - OpenEnv Environment

## Overview

This project implements a **real-world customer support simulation environment** built using the OpenEnv specification.

It is designed to evaluate and train intelligent agents capable of:

* Understanding noisy and ambiguous user queries
* Classifying issues correctly
* Gathering missing information efficiently
* Resolving tickets under uncertainty

Unlike toy environments, this system models the **real operational complexity** found in production customer support workflows.

---
## Objective

Build and evaluate an agent that can:

1. **Classify** customer issues (billing / technical / delivery)
2. **Collect required information** dynamically
3. **Resolve efficiently** under constraints
4. **Adapt behavior mid-episode** (self-correction)

---
## System Architecture

```
+----------------------+
|   Customer Ticket    |
|  (noisy, ambiguous)  |
+----------+-----------+
           |
           v
+----------------------+
| Environment (env.py) |
|----------------------|
| - State              |
| - Reward             |
| - Stochasticity      |
+----------+-----------+
           |
           v
+----------------------+
|  Observation Space   |
|----------------------|
| message              |
| known_info           |
| required             |
+----------+-----------+
           |
           v
+----------------------+
|  Agent (LLM + Rule)  |
|----------------------|
| - Reasoning (LLM)    |
| - Constraints        |
| - Fallback           |
+----------+-----------+
           |
           v
+----------------------+
|        Action        |
|----------------------|
| classify             |
| ask_info             |
| resolve              |
+----------+-----------+
           |
           v
+----------------------+
|   Environment Step   |
|----------------------|
| reward               |
| next_state           |
+----------------------+
```
## Interaction Loop

RESET → OBSERVE → ACT → STEP → REPEAT

Detailed flow:

```
[RESET]
   ↓
[Observation]
   ↓
[Agent Decision]
   ↓
[Action]
   ↓
[Environment Step]
   ↓
[Reward + Next State]
   ↓
[Done?] -- No --> Loop
   ↓
  Yes
   ↓
[Episode End]
```
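A minimal sketch of driving this loop in-process is shown below. The class name, constructor arguments, and the Gym-style `(obs, reward, done, info)` return signature are assumptions for illustration, not the exact API exposed by `env.py`:

```python
# Illustrative interaction-loop driver (class/method names and signatures are assumptions).
from env import CustomerSupportEnv  # hypothetical import path


def run_episode(agent) -> float:
    env = CustomerSupportEnv(difficulty="medium")
    obs = env.reset()                                 # RESET -> first observation
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(obs)                       # agent decides: classify / ask_info / resolve
        obs, reward, done, info = env.step(action)    # environment transitions and emits reward
        total_reward += reward
    return total_reward
```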
## Self-Correction Loop

Initial flow:

```
classify → ask_info → resolve
```

With self-correction:

```
classify
   ↓
ask_info
   ↓
[New Information Arrives]
   ↓
re-evaluate decision
   ↓
re-classify (if needed)
   ↓
ask remaining info
   ↓
resolve
```
## Agent Decision Logic

```
IF not classified:
    → classify
ELIF missing required fields:
    → ask_info
ELIF uncertain:
    → re-classify
ELSE:
    → resolve
```
## Stochastic Behavior

```
Customer Message =
      base_variant
    + noise injection
    + ambiguity

Required Info =
      full_schema
    - randomly masked fields
```

Difficulty controls:

```
EASY   → low noise, clear signals
MEDIUM → moderate noise
HARD   → high ambiguity + missing info
```
## Reward Flow

Action → Immediate Reward → Final Outcome

Examples:

```
ask_info (useful)  → +0.3
repeat ask         → -0.3
step penalty       → -0.05
correct classify   → +0.2
premature resolve  → -1.0 (hard)
successful resolve → +0.2 to +1.0
```
## Example Episode

```
Step 1: classify    → reward -0.05
Step 2: ask_info    → reward +0.20
Step 3: re-classify → reward -0.05
Step 4: resolve     → reward +0.45
```

Outcome:

* Success
* Self-correction observed
* Efficient resolution
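Total episode reward: -0.05 + 0.20 - 0.05 + 0.45 = +0.55 over four steps.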
### 1. Environment (`env.py`)

A **stateful, stochastic simulation** of customer support operations.

#### Key Features

* Multi-step interaction loop (`step`, `reset`, `state`)
* Partial observability (missing information)
* Stochastic noise injection
* Difficulty-aware configuration
* Multi-intent ticket handling
* Reward shaping with penalties for poor decisions

---
### 2. Observation Space

```json
{
  "ticket_id": "string",
  "customer_message": "string",
  "known_info": {},
  "required": ["fields"],
  "missing_required": ["fields"],
  "info_progress": 0.0,
  "status": "open | resolved",
  "step_count": 0,
  "remaining_steps": 10,
  "difficulty": "easy | medium | hard"
}
```

---
### 3. Action Space

| Action   | Description                |
| -------- | -------------------------- |
| classify | Assign category + priority |
| ask_info | Request missing field      |
| resolve  | Attempt to close ticket    |

Example:

```json
{
  "type": "ask_info",
  "field": "order_id"
}
```
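Since OpenEnv compliance (see the checklist below) calls for typed observation/action models, a minimal dataclass sketch mirroring the two schemas above is given here; the repo's actual model classes may differ:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative typed models (field names follow the schemas above; classes are assumptions).
@dataclass
class Action:
    type: str                        # "classify" | "ask_info" | "resolve"
    field: Optional[str] = None      # required field name when type == "ask_info"
    category: Optional[str] = None   # e.g. "billing", set when type == "classify"
    priority: Optional[str] = None   # set when type == "classify"

@dataclass
class Observation:
    ticket_id: str
    customer_message: str
    known_info: dict
    required: list
    missing_required: list
    info_progress: float
    status: str                      # "open" | "resolved"
    step_count: int
    remaining_steps: int
    difficulty: str                  # "easy" | "medium" | "hard"
```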
---
## Difficulty & Stochastic Control

The environment dynamically adjusts complexity:

| Difficulty | Max Steps | Noise    | Missing Info |
| ---------- | --------- | -------- | ------------ |
| Easy       | Low       | None     | Minimal      |
| Medium     | Medium    | Moderate | Partial      |
| Hard       | High      | High     | Significant  |

### Stochastic Elements

* **Noise Injection**
  Adds irrelevant or emotional phrases
* **Information Masking**
  Required fields may be hidden
* **Ambiguity**
  Messages may not clearly indicate category
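The sketch below shows how such a stochastic ticket could be sampled from the dataset schema described in the next section. The probability values, helper name, and config table are assumptions for illustration, not the environment's actual parameters:

```python
import random

# Illustrative noise/masking rates per difficulty (values are assumptions).
DIFFICULTY = {
    "easy":   {"noise_prob": 0.1, "mask_prob": 0.1},
    "medium": {"noise_prob": 0.4, "mask_prob": 0.3},
    "hard":   {"noise_prob": 0.8, "mask_prob": 0.6},
}

def sample_ticket(ticket: dict, difficulty: str) -> dict:
    cfg = DIFFICULTY[difficulty]
    message = random.choice(ticket["variants"])            # pick a base variant
    if random.random() < cfg["noise_prob"]:                # noise injection
        message += " " + random.choice(ticket["noise"])
    required = ticket["ground_truth"]["required_info"]
    # Randomly mask required fields so the agent has to ask for them.
    known = {f: "provided" for f in required if random.random() > cfg["mask_prob"]}
    return {"customer_message": message, "known_info": known, "required": required}
```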
---

## Dataset (Production-Style Tickets)

Each ticket includes:

```python
{
    "ticket_id": "...",
    "variants": [...],       # multiple phrasings
    "noise": [...],          # real-world clutter
    "ground_truth": {
        "category": "...",
        "priority": "...",
        "required_info": [...],
        "intents": [...]     # multi-intent support
    }
}
```

### Key Properties

* Multiple linguistic variations
* Realistic phrasing (not templated)
* Multi-intent issues (e.g., billing + technical)
* No explicit hints (agent must infer)
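For concreteness, here is a purely hypothetical ticket in this schema; every value is invented for illustration and does not come from the actual dataset:

```python
# Hypothetical ticket instance (all values invented for illustration only).
example_ticket = {
    "ticket_id": "TCK-1042",
    "variants": [
        "I was charged twice this month and now the app won't let me log in.",
        "Double billing on my account, and login keeps failing!",
    ],
    "noise": [
        "Honestly this has been the worst week.",
        "Please hurry, I'm travelling tomorrow.",
    ],
    "ground_truth": {
        "category": "billing",
        "priority": "high",
        "required_info": ["account_id", "invoice_id"],
        "intents": ["billing", "technical"],   # multi-intent: billing + technical
    },
}
```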
---

## Self-Correction Mechanism

The agent is designed to **adapt within an episode**.

### What this means:

* Can **re-classify after new information**
* Can **delay resolution under uncertainty**
* Can **recover from suboptimal actions**

### Example behavior:

```
classify → ask_info → re-classify → resolve
```

This mimics real-world agent reasoning rather than fixed pipelines.

---
## Agent Design (`agent_llm.py`)

### Hybrid Intelligence

| Component | Role                   |
| --------- | ---------------------- |
| LLM       | High-level reasoning   |
| Rules     | Safety + constraints   |
| Fallback  | Deterministic recovery |

---

### Key Capabilities

* Structured JSON output
* Retry + validation loop
* Fallback policy (guarantees progress)
* Partial autonomy (not over-constrained)
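A sketch of this retry-and-validate pattern follows. It assumes an `llm_complete()` callable that returns raw text and a `fallback` callable for deterministic recovery; the prompt, retry count, and helper names are illustrative, not the exact code in `agent_llm.py`:

```python
import json

VALID_TYPES = {"classify", "ask_info", "resolve"}

def decide(obs: dict, llm_complete, fallback, max_retries: int = 3) -> dict:
    """Ask the LLM for a structured action; fall back to rules if validation keeps failing."""
    prompt = (
        f"Ticket: {obs['customer_message']}\n"
        f"Missing fields: {obs['missing_required']}\n"
        "Reply with a JSON action."
    )
    for _ in range(max_retries):
        raw = llm_complete(prompt)              # assumed helper returning a string
        try:
            action = json.loads(raw)            # structured JSON output
        except json.JSONDecodeError:
            continue                            # retry on malformed JSON
        if action.get("type") in VALID_TYPES:
            return action                       # validated action
    return fallback(obs)                        # deterministic recovery guarantees progress
```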
---

## Reward Design

Reward is **dense and shaped**, not binary.

| Behavior                 | Reward       |
| ------------------------ | ------------ |
| Step penalty             | -0.05        |
| Correct classification   | +0.2         |
| Useful info collection   | +0.3         |
| Redundant action         | -0.3         |
| Premature resolve (hard) | -1.0         |
| Successful resolve       | +0.2 to +1.0 |
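A minimal sketch of how these shaping terms could combine into a per-step reward is shown below; the event flags are assumptions about the environment's internals, while the constants mirror the table above:

```python
# Illustrative per-step reward shaping using the constants from the table above.
def step_reward(correct_classify=False, useful_info=False, redundant=False,
                premature_resolve=False, resolve_quality=None) -> float:
    reward = -0.05                              # step penalty, applied every turn
    if correct_classify:
        reward += 0.2
    if useful_info:
        reward += 0.3
    if redundant:
        reward -= 0.3
    if premature_resolve:
        reward -= 1.0
    if resolve_quality is not None:             # 0.0-1.0 quality of a successful resolve
        reward += 0.2 + 0.8 * resolve_quality   # maps to the +0.2 .. +1.0 range
    return reward
```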
---

## Metrics

Tracked per episode:

```json
{
  "success_rate": 0.0,
  "avg_steps": 0.0,
  "avg_reward": 0.0,
  "info_efficiency": 0.0
}
```

### Additional Behavioral Signals

* Self-correction frequency (re-classification)
* Resolution efficiency
* Failure modes under uncertainty
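A sketch of aggregating these metrics over a batch of episode records is given below; the per-episode record fields are assumptions chosen to match the JSON above:

```python
# Aggregate per-episode records into the metrics above (record fields are assumed).
def aggregate(episodes: list[dict]) -> dict:
    n = len(episodes)
    return {
        "success_rate":    sum(e["success"] for e in episodes) / n,
        "avg_steps":       sum(e["steps"] for e in episodes) / n,
        "avg_reward":      sum(e["reward"] for e in episodes) / n,
        "info_efficiency": sum(e["fields_collected"] / max(e["fields_required"], 1)
                               for e in episodes) / n,
    }
```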
---

## Tasks & Graders

Three evaluation tasks:

| Task                      | Difficulty | Objective                              |
| ------------------------- | ---------- | -------------------------------------- |
| easy-info-collection      | Easy       | Basic info gathering                   |
| medium-complete-info      | Medium     | Complete + accurate handling           |
| hard-efficient-resolution | Hard       | Efficient resolution under uncertainty |

### Grader Properties

* Deterministic
* Score range: **0.0 to 1.0**
* Multi-factor scoring:
  * success
  * efficiency
  * completeness
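A deterministic multi-factor grader could look like the following sketch; only the 0.0 to 1.0 range and the three factors come from the description above, while the weights and input fields are assumptions:

```python
# Deterministic multi-factor grader sketch (weights and inputs are assumptions).
def grade(success: bool, steps_used: int, max_steps: int,
          fields_collected: int, fields_required: int) -> float:
    success_score      = 1.0 if success else 0.0
    efficiency_score   = 1.0 - (steps_used / max_steps)           # fewer steps is better
    completeness_score = fields_collected / max(fields_required, 1)
    score = 0.5 * success_score + 0.25 * efficiency_score + 0.25 * completeness_score
    return round(min(max(score, 0.0), 1.0), 2)                    # clamp to 0.0 - 1.0
```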
---

## Inference

Run the baseline agent:

```bash
python inference.py
```

Output:

```
[START] task=easy-info-collection ...
[STEP] ...
[END] ...
{"task_id": "...", "score": 0.7}
```
---

## Deployment (Hugging Face Spaces)

### Build the Docker image

```bash
docker build -t openenv-customer-support-agent .
```

### Run the container

```bash
docker run -p 7860:7860 openenv-customer-support-agent
```
---

## API Endpoints

| Endpoint | Description            |
| -------- | ---------------------- |
| `/reset` | Initialize environment |
| `/step`  | Execute action         |
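A hedged example of calling these endpoints over HTTP with `requests` is shown below; the request/response payload shapes are assumptions based on the action and observation schemas above, not a documented contract:

```python
import requests

BASE_URL = "http://localhost:7860"   # local Docker run; adjust for the deployed Space

# Initialize the environment and get the first observation.
obs = requests.post(f"{BASE_URL}/reset").json()

# Execute one action (payload shape assumed from the action schema above).
result = requests.post(
    f"{BASE_URL}/step",
    json={"type": "ask_info", "field": "order_id"},
).json()
print(result)
```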
---

## Environment Variables

Required:

```
API_BASE_URL
MODEL_NAME
HF_TOKEN
```
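For example, the agent code might read these at startup; the variable names come from the list above, while the access pattern is only a sketch:

```python
import os

# Read the required configuration at startup; raises KeyError early if anything is missing.
API_BASE_URL = os.environ["API_BASE_URL"]
MODEL_NAME = os.environ["MODEL_NAME"]
HF_TOKEN = os.environ["HF_TOKEN"]
```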
---

## OpenEnv Compliance

* Typed observation/action models
* `step` / `reset` / `state` implemented
* 3+ tasks with graders
* Deterministic scoring
* Dockerized deployment
* HF Space compatible

---
## Key Innovations

* Real-world task simulation (not a toy)
* Stochastic difficulty scaling
* Multi-intent ticket modeling
* Self-correcting agent behavior
* Hybrid LLM + rule-based architecture
* Dense reward shaping

---
## Future Improvements

* Multi-stage resolution pipelines
* Conversation memory (history utilization)
* Active uncertainty estimation
* Adaptive task generation
* Multi-agent coordination

---
## Big Picture

This environment models:

> **Decision-making under uncertainty with partial information**

It is suitable for:

* RL agent training
* LLM agent evaluation
* Benchmarking reasoning systems

---

## Author

Built as part of an advanced OpenEnv submission focused on real-world agent intelligence and evaluation.