---
title: Sieve
sdk: docker
pinned: false
---
# Sieve – Customer Support RL Environment
Sieve is a reinforcement learning environment that simulates a real-world customer support inbox. An AI agent interacts with it through a standard `reset()` / `step()` / `state()` HTTP API, receiving emails, taking actions, and earning rewards based on how well it handles each situation.
## How It Works
The agent calls `/reset` to start an episode, then loops, reading the current email from the `Observation`, posting an `Action` to `/step`, and receiving a `Reward` and next `Observation`, until `done=true`. Each step reward reflects immediate quality, and a `-0.005` step penalty discourages unnecessary actions. The final grader score from `/grader` is a holistic metric computed over the full episode.
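The loop above can be sketched as a small driver. This is an illustrative client, not the repo's `inference.py`: the `post` helper is injected so the loop can be exercised without a live server (with `requests` you would pass `post=lambda path, **kw: requests.post(base_url + path, **kw).json()`), and the `task_id` default is a placeholder; query `GET /tasks` for the real identifiers.

```python
def run_episode(choose_action, post, task_id="task1"):
    """Reset, then step until done=True; returns the summed reward.

    choose_action: maps an observation dict to an action dict.
    post: callable (path, **kwargs) -> parsed JSON response.
    task_id: placeholder value -- list real IDs via GET /tasks.
    """
    # /reset returns the initial Observation for the chosen task
    obs = post("/reset", params={"task_id": task_id})
    total, done = 0.0, False
    while not done:
        # /step returns {observation, reward, done, info}
        result = post("/step", json=choose_action(obs))
        obs = result["observation"]
        done = result["done"]
        total += result["reward"]["value"]
    return total
```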
## Project Structure
```text
.
├── models.py        # Shared Pydantic models (Action, Observation, Reward, etc.)
├── inference.py     # Baseline agent script using OpenAI client
├── logger.py        # Structured [START]/[STEP]/[END] stdout logger
├── openenv.yaml     # OpenEnv environment metadata
├── pyproject.toml   # Project config and dependencies
├── Dockerfile       # Container definition
├── .env.example     # Example environment variables (copy to .env)
└── server/
    ├── app.py          # FastAPI application and API endpoints
    ├── environment.py  # Core environment logic (step, reset, reward, grader)
    ├── data.py         # Email datasets for all three tasks
    └── config.py       # Action schema definition
```
## Tasks
### Task 1 – Email Classification (Easy)
The agent receives one email at a time and must classify it using the `classify` action.

Available action: `classify` only
#### Step Rewards

- Correct category: `+0.15`
- Wrong category: `-0.05`
- Correct urgency: `+0.05`
- Wrong urgency: `-0.02`
- Wrong action type: `-0.05`
- Step penalty: `-0.005`
#### Final Grader Score

- Category accuracy: 70% weight
- Urgency accuracy: 30% weight
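For illustration, a trivial keyword heuristic that emits well-formed `classify` actions. The actual baseline in `inference.py` calls an LLM; the keyword lists here are invented.

```python
def classify(email):
    """Map an email dict to a classify action via keyword matching.

    Category and urgency values are the enums documented under
    Data Models; the keywords are invented for illustration.
    """
    text = (email["subject"] + " " + email["body"]).lower()
    if "invoice" in text or "payment" in text:
        category = "billing"
    elif "error" in text or "bug" in text:
        category = "technical"
    else:
        category = "general"
    urgency = "high" if "urgent" in text else "medium"
    return {"action_type": "classify", "category": category, "urgency": urgency}
```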
### Task 2 – Response Drafting (Medium)
The agent reads a customer email and drafts a professional response using the `respond` action.

Available action: `respond` only
#### Step Rewards

- Response >= 50 characters: `+0.05`
- Response < 50 characters: `-0.10`
- Keyword coverage: up to `+0.25` (scaled by `matched / min_required`)
- Negative/unprofessional tone (VADER `neg` > 0.4): `-0.10`
- Wrong action type: `-0.05`
- Step penalty: `-0.005`
#### Final Grader Score

- Keyword coverage weighted at 0.80
- Length bonus up to 0.20 (scaled by `length / 200`, requires length > 50)
- Averaged across all emails in the task
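A `respond` action shaped around these rewards might be built like this. The draft text is invented; the 50-character check mirrors the length penalty above.

```python
def respond(draft):
    """Wrap a drafted reply in a respond action, enforcing the
    50-character floor that the step reward penalizes."""
    if len(draft) < 50:
        raise ValueError("drafts under 50 characters earn -0.10")
    return {"action_type": "respond", "response_text": draft}

# Example (invented text): professional tone, comfortably past 50 chars.
action = respond(
    "Thank you for reaching out. We have reviewed your billing "
    "issue and will refund the duplicate charge within 3-5 days."
)
```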
### Task 3 – Full Support Session (Hard)
The agent manages a queue of 15 mixed emails. It must choose which email to handle, classify it, and take the right action, all in the correct priority order.

Available actions: `respond`, `escalate`, `archive`, `skip`
#### Priority rules

- VIP customers (`sender_tier=vip`) must be handled before standard customers
- High urgency emails take precedence over medium and low
- Security breaches and VIP incidents → `escalate`
- Spam and feature requests → `archive`
- Standard billing and technical issues → `respond`
- Use `email_id` in the action to select which email to process
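A rough translation of these rules into an action chooser, as a sketch: `category` here is the agent's own prediction for the email, and the security-breach case still requires judgment from the email text.

```python
def choose_action(email, category):
    """Pick a Task 3 action for one queued email.

    email: dict with the Email fields documented under Data Models.
    category: the agent's predicted category for this email.
    """
    if category in ("spam", "feature_request"):
        action_type = "archive"
    elif email.get("sender_tier") == "vip":
        # VIP incidents -> escalate; security breaches also escalate,
        # but detecting them needs the email text, not shown here.
        action_type = "escalate"
    else:
        # standard billing/technical issues -> respond
        action_type = "respond"
    # email_id selects which queued email this action targets
    return {"action_type": action_type, "email_id": email["id"]}
```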
#### Step Rewards

- VIP email handled in first 4 positions: `+0.08`
- VIP email delayed (position >= 4): `-0.05`
- High urgency email in first 6 positions: `+0.05`
- Low urgency email after position 6: `+0.03`
- Correct category: `+0.04`
- Correct urgency: `+0.02`
- Correct action: `+0.06`
- Wrong action: `-0.03`
- Response text provided and > 50 characters: `+0.02`
- Spam not archived: `-0.04`
- Step penalty: `-0.005`
#### Final Grader Score

- VIP prioritization: up to 0.20 (40% credit if handled late)
- High urgency prioritization: up to 0.10 (40% credit if handled late)
- Category accuracy: up to 0.15
- Urgency accuracy: up to 0.15
- Action accuracy: up to 0.30
- Email coverage: up to 0.10
- Maximum: 1.0
## Data Models
### Enums
#### ActionType

- `classify` – Classify an email into a category and urgency
- `respond` – Draft a response to an email
- `escalate` – Escalate an email with a reason
- `archive` – Archive an email
- `skip` – Skip the current email
#### Category

- `billing` – Payment, invoices, subscription issues
- `technical` – Bugs, errors, technical failures
- `general` – General inquiries
- `spam` – Unsolicited or irrelevant messages
- `account` – Account access, settings, profile issues
- `feature_request` – Requests for new features
#### Urgency

- `high` – Requires immediate attention
- `medium` – Standard priority
- `low` – Can be handled later
### Models

#### Email

- `id` (str) – Unique email identifier
- `subject` (str) – Email subject line
- `body` (str) – Email body content
- `sender` (str) – Sender's email address
- `sender_tier` (str, default: `"standard"`) – Customer tier (`standard` or `vip`)
- `received_minutes_ago` (int, default: `0`) – How long ago the email was received
#### Action

- `action_type` (ActionType) – The action to perform
- `category` (Category, optional) – Email category, used with `classify`
- `urgency` (Urgency, optional) – Email urgency, used with `classify`
- `response_text` (str, optional) – Drafted response, used with `respond`
- `escalation_reason` (str, optional) – Reason for escalation, used with `escalate`
- `email_id` (str, optional) – Target email ID, used in `support_session` to select which email to process
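A stdlib approximation of this payload as a dataclass. The repo's real definitions are Pydantic models in `models.py`; this sketch just mirrors the field list for building `POST /step` bodies.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Action:
    """Mirror of the Action fields listed above (enums as plain str)."""
    action_type: str                         # one of the ActionType values
    category: Optional[str] = None           # used with classify
    urgency: Optional[str] = None            # used with classify
    response_text: Optional[str] = None      # used with respond
    escalation_reason: Optional[str] = None  # used with escalate
    email_id: Optional[str] = None           # used in Task 3

# asdict(Action("skip")) yields a JSON-serializable body for POST /step
```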
#### Observation

- `current_email` (Email, optional) – The email currently being processed
- `email_queue` (List[Email], default: `[]`) – Queue of pending emails, populated in Task 3 only
- `processed_count` (int, default: `0`) – Number of emails processed so far
- `step_count` (int, default: `0`) – Current step number
- `task_id` (str) – Active task identifier
- `task_description` (str) – Human-readable task description
- `available_actions` (List[str]) – Actions valid for the current state
- `context` (Dict) – Additional context such as `max_steps`, `remaining_steps`, `queue_size`
#### Reward

- `value` (float) – Total reward for the step
- `components` (Dict[str, float], default: `{}`) – Breakdown of reward sub-components
- `reason` (str, default: `""`) – Human-readable explanation of the reward
#### StepResult

- `observation` (Observation) – Next environment observation
- `reward` (Reward) – Reward received for the action
- `done` (bool) – Whether the episode has ended
- `info` (Dict) – Additional diagnostic information
## Backend API

| Method | Path | Description |
|---|---|---|
| POST | `/reset?task_id=<id>` | Reset environment for a task, returns initial Observation |
| POST | `/step` | Submit an Action, returns `{observation, reward, done, info}` |
| GET | `/state` | Current environment state |
| GET | `/tasks` | List all tasks with action schema |
| GET | `/grader` | Current grader score (0.0–1.0) |
## Baseline Scores

Baseline agent: `gpt-4o-mini` via the OpenAI API

| Task | Score | Steps | Total Reward |
|---|---|---|---|
| Email Classification | 0.930 | 10 | 1.755 |
| Response Drafting | 0.920 | 6 | 1.650 |
| Support Session | 0.882 | 15 | 1.506 |
## Local Development Setup
### Prerequisites

- Python 3.11 or 3.12 (matches the Docker image)
- Optional: `uv` for creating a virtual environment
### Steps
1. **Create and activate a virtual environment**

   With `uv`:

   ```bash
   uv venv --python 3.11
   source .venv/bin/activate
   ```

   Or with the standard library:

   ```bash
   python3.11 -m venv .venv
   source .venv/bin/activate
   ```
2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```
3. **Download NLTK data (one time)**

   ```bash
   python -c "import nltk; nltk.download('vader_lexicon', quiet=True); nltk.download('punkt_tab', quiet=True)"
   ```
4. **Environment variables**

   Copy the example file and edit `.env`:

   ```bash
   cp .env.example .env
   ```
| Variable | Required for | Description |
|---|---|---|
| `API_BASE_URL` | Baseline inference | OpenAI-compatible API base URL (default: Hugging Face router) |
| `MODEL_NAME` | Baseline inference | Model identifier for that API |
| `HF_TOKEN` | Baseline (HF) | Hugging Face token when using the HF router or similar |
| `OPENAI_API_KEY` | Baseline (OpenAI) | OpenAI API key when using OpenAI's API. Inference uses `HF_TOKEN` if set, otherwise `OPENAI_API_KEY` |
| `ENV_BASE_URL` | Baseline inference | URL of this environment (`http://localhost:7860` locally) |
Running only the API server does not require LLM keys.
5. **Start the server**

   ```bash
   uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
   ```

   Open http://localhost:7860/docs to confirm the API is up.
## Baseline Inference
With the server running (step 5) and `.env` configured with LLM credentials, run:

```bash
python inference.py
```

Structured logs go to stdout (`[START]`, `[STEP]`, `[END]`); a JSON summary is printed to stderr.
## Docker
Build and run the same service the Hugging Face Space uses:

```bash
docker build -t sieve .
docker run --rm -p 7860:7860 sieve
```

Then set `ENV_BASE_URL=http://localhost:7860` (or the container's URL) for `inference.py`.