---
title: Sieve
sdk: docker
pinned: false
---
# Sieve – Customer Support RL Environment
Sieve is a reinforcement learning environment that simulates a real-world customer support inbox. An AI agent interacts with it through a standard `reset() / step() / state()` HTTP API, receiving emails, taking actions, and earning rewards based on how well it handles each situation.
## How It Works
![How It Works](assets/how_it_works_v2.svg)
The agent calls `/reset` to start an episode, then loops until `done=true`: it reads the current email from the `Observation`, posts an `Action` to `/step`, and receives a `Reward` and the next `Observation`. Each step reward reflects immediate quality. A `-0.005` step penalty discourages unnecessary actions. The final grader score from `/grader` is a holistic metric computed over the full episode.
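As a concrete sketch of that loop in Python (assuming the local server address from the setup section; the task ID and the hard-coded `classify` payload are placeholders, and real task IDs can be listed via `GET /tasks`):
```python
import requests

BASE = "http://localhost:7860"  # assumed local server address

# Start an episode; /reset returns the initial Observation as JSON.
obs = requests.post(f"{BASE}/reset", params={"task_id": "email_classification"}).json()

done = False
while not done:
    # Placeholder policy: always classify as low-urgency general mail.
    # A real agent would inspect obs["current_email"] before acting.
    action = {"action_type": "classify", "category": "general", "urgency": "low"}
    result = requests.post(f"{BASE}/step", json=action).json()
    obs, done = result["observation"], result["done"]
    print(result["reward"]["value"], result["reward"]["reason"])

# Holistic episode score once the loop ends.
print(requests.get(f"{BASE}/grader").json())
```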
## Project Structure
```
.
├── models.py        # Shared Pydantic models (Action, Observation, Reward, etc.)
├── inference.py     # Baseline agent script using OpenAI client
├── logger.py        # Structured [START]/[STEP]/[END] stdout logger
├── openenv.yaml     # OpenEnv environment metadata
├── pyproject.toml   # Project config and dependencies
├── Dockerfile       # Container definition
├── .env.example     # Example environment variables (copy to .env)
└── server/
    ├── app.py           # FastAPI application and API endpoints
    ├── environment.py   # Core environment logic (step, reset, reward, grader)
    ├── data.py          # Email datasets for all three tasks
    └── config.py        # Action schema definition
```
## Tasks
### Task 1 – Email Classification (Easy)
The agent receives one email at a time and must classify it using the `classify` action.
**Available action:** `classify` only
**Step Rewards**
- Correct category: `+0.15`
- Wrong category: `-0.05`
- Correct urgency: `+0.05`
- Wrong urgency: `-0.02`
- Wrong action type: `-0.05`
- Step penalty: `-0.005`
**Final Grader Score**
- Category accuracy: `70%` weight
- Urgency accuracy: `30%` weight
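A single fully correct classification therefore earns `0.15 + 0.05 - 0.005 = 0.195` on that step. The grader weighting itself is simple enough to sketch (this mirrors the weights above, not the server's actual code):
```python
def task1_grader(category_acc: float, urgency_acc: float) -> float:
    """Task 1 grader weighting as documented above (sketch)."""
    return 0.70 * category_acc + 0.30 * urgency_acc

# e.g. 9/10 categories and 8/10 urgencies correct:
print(task1_grader(0.9, 0.8))  # 0.87
```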
---
### Task 2 – Response Drafting (Medium)
The agent reads a customer email and drafts a professional response using the `respond` action.
**Available action:** `respond` only
**Step Rewards**
- Response >= 50 characters: `+0.05`
- Response < 50 characters: `-0.10`
- Keyword coverage: up to `+0.25` (scaled by `matched / min_required`)
- Negative/unprofessional tone (VADER neg > 0.4): `-0.10`
- Wrong action type: `-0.05`
- Step penalty: `-0.005`
**Final Grader Score**
- Keyword coverage weighted at `0.80`
- Length bonus up to `0.20` (scaled by `length / 200`, requires length > 50)
- Averaged across all emails in the task
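Under those weights, a per-email score might be computed as in the sketch below; capping coverage at 1 is an assumption, and the final grade averages this value across all emails:
```python
def task2_email_score(matched: int, min_required: int, length: int) -> float:
    """Per-email Task 2 score under the documented weighting (sketch)."""
    coverage = min(matched / min_required, 1.0)   # keyword coverage, assumed capped at 1
    length_bonus = min(length / 200, 1.0) if length > 50 else 0.0
    return 0.80 * coverage + 0.20 * length_bonus

# e.g. 3 of 4 required keywords in a 160-character reply:
print(task2_email_score(3, 4, 160))  # 0.76
```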
---
### Task 3 – Full Support Session (Hard)
The agent manages a queue of 15 mixed emails. It must choose which email to handle, classify it, and take the right action, all in the correct priority order.
**Available actions:** `respond`, `escalate`, `archive`, `skip`
**Priority rules**
- VIP customers (`sender_tier=vip`) must be handled before standard customers
- High urgency emails take precedence over medium and low
- Security breaches and VIP incidents → `escalate`
- Spam and feature requests → `archive`
- Standard billing and technical issues → `respond`
- Use `email_id` in the action to select which email to process
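Putting the rules together, an action that escalates a specific VIP security incident might look like this (the `email_id` value is invented; real IDs come from `email_queue` in the `Observation`):
```python
# Posted as the JSON body of POST /step during a support session.
action = {
    "action_type": "escalate",
    "email_id": "email_07",  # illustrative ID; read real ones from obs["email_queue"]
    "escalation_reason": "VIP customer reporting a possible account breach",
}
```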
**Step Rewards**
- VIP email handled in first 4 positions: `+0.08`
- VIP email delayed (position >= 4): `-0.05`
- High urgency email in first 6 positions: `+0.05`
- Low urgency email after position 6: `+0.03`
- Correct category: `+0.04`
- Correct urgency: `+0.02`
- Correct action: `+0.06`
- Wrong action: `-0.03`
- Response text provided and > 50 characters: `+0.02`
- Spam not archived: `-0.04`
- Step penalty: `-0.005`
**Final Grader Score**
- VIP prioritization: up to `0.20` (40% credit if handled late)
- High urgency prioritization: up to `0.10` (40% credit if handled late)
- Category accuracy: up to `0.15`
- Urgency accuracy: up to `0.15`
- Action accuracy: up to `0.30`
- Email coverage: up to `0.10`
- Maximum: `1.0`
---
## Data Models
### Enums
#### ActionType
- `classify` – Classify an email into a category and urgency
- `respond` – Draft a response to an email
- `escalate` – Escalate an email with a reason
- `archive` – Archive an email
- `skip` – Skip the current email
#### Category
- `billing` – Payment, invoices, subscription issues
- `technical` – Bugs, errors, technical failures
- `general` – General inquiries
- `spam` – Unsolicited or irrelevant messages
- `account` – Account access, settings, profile issues
- `feature_request` – Requests for new features
#### Urgency
- `high` – Requires immediate attention
- `medium` – Standard priority
- `low` – Can be handled later
### Models
#### Email
- `id` (`str`) – Unique email identifier
- `subject` (`str`) – Email subject line
- `body` (`str`) – Email body content
- `sender` (`str`) – Sender's email address
- `sender_tier` (`str`, default: `"standard"`) – Customer tier (`standard` or `vip`)
- `received_minutes_ago` (`int`, default: `0`) – How long ago the email was received
#### Action
- `action_type` (`ActionType`) – The action to perform
- `category` (`Category`, optional) – Email category, used with `classify`
- `urgency` (`Urgency`, optional) – Email urgency, used with `classify`
- `response_text` (`str`, optional) – Drafted response, used with `respond`
- `escalation_reason` (`str`, optional) – Reason for escalation, used with `escalate`
- `email_id` (`str`, optional) – Target email ID, used in `support_session` to select which email to process
#### Observation
- `current_email` (`Email`, optional) – The email currently being processed
- `email_queue` (`List[Email]`, default: `[]`) – Queue of pending emails, populated in Task 3 only
- `processed_count` (`int`, default: `0`) – Number of emails processed so far
- `step_count` (`int`, default: `0`) – Current step number
- `task_id` (`str`) – Active task identifier
- `task_description` (`str`) – Human-readable task description
- `available_actions` (`List[str]`) – Actions valid for the current state
- `context` (`Dict`) – Additional context such as `max_steps`, `remaining_steps`, `queue_size`
#### Reward
- `value` (`float`) – Total reward for the step
- `components` (`Dict[str, float]`, default: `{}`) – Breakdown of reward sub-components
- `reason` (`str`, default: `""`) – Human-readable explanation of the reward
#### StepResult
- `observation` (`Observation`) – Next environment observation
- `reward` (`Reward`) – Reward received for the action
- `done` (`bool`) – Whether the episode has ended
- `info` (`Dict`) – Additional diagnostic information
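A client can mirror these models with Pydantic and parse a `/step` response directly; the sketch below follows the field lists above and is not a copy of the project's `models.py`:
```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class Email(BaseModel):
    id: str
    subject: str
    body: str
    sender: str
    sender_tier: str = "standard"
    received_minutes_ago: int = 0

class Observation(BaseModel):
    current_email: Optional[Email] = None
    email_queue: List[Email] = []
    processed_count: int = 0
    step_count: int = 0
    task_id: str
    task_description: str
    available_actions: List[str]
    context: Dict = {}

class Reward(BaseModel):
    value: float
    components: Dict[str, float] = {}
    reason: str = ""

class StepResult(BaseModel):
    observation: Observation
    reward: Reward
    done: bool
    info: Dict = {}

# e.g. result = StepResult.model_validate(response_json)
```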
## Backend API
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/reset?task_id=<id>` | Reset environment for a task, returns initial Observation |
| `POST` | `/step` | Submit an Action, returns `{observation, reward, done, info}` |
| `GET` | `/state` | Current environment state |
| `GET` | `/tasks` | List all tasks with action schema |
| `GET` | `/grader` | Current grader score (0.0–1.0) |
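For a quick smoke test of the read-only endpoints from Python (again assuming the local address):
```python
import requests

BASE = "http://localhost:7860"  # assumed local address

print(requests.get(f"{BASE}/tasks").json())   # task list with action schemas
print(requests.get(f"{BASE}/state").json())   # current environment state
print(requests.get(f"{BASE}/grader").json())  # holistic score in [0.0, 1.0]
```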
## Baseline Scores
Baseline agent: `gpt-4o-mini` via OpenAI API
| Task | Score | Steps | Total Reward |
|------|-------|-------|--------------|
| Email Classification | 0.930 | 10 | 1.755 |
| Response Drafting | 0.920 | 6 | 1.650 |
| Support Session | 0.882 | 15 | 1.506 |
## Local Development Setup
### Prerequisites
- Python 3.11 or 3.12 (matches the Docker image)
- Optional: [uv](https://docs.astral.sh/uv/) for creating a virtual environment
### Steps
**1. Create and activate a virtual environment**
With uv:
```bash
uv venv --python 3.11
source .venv/bin/activate
```
Or with the standard library:
```bash
python3.11 -m venv .venv
source .venv/bin/activate
```
**2. Install dependencies**
```bash
pip install -r requirements.txt
```
**3. Download NLTK data (one time)**
```bash
python -c "import nltk; nltk.download('vader_lexicon', quiet=True); nltk.download('punkt_tab', quiet=True)"
```
**4. Environment variables**
Copy the example file and edit `.env`:
```bash
cp .env.example .env
```
| Variable | Required for | Description |
|----------|----------------|-------------|
| `API_BASE_URL` | Baseline inference | OpenAI-compatible API base URL (default: Hugging Face router). |
| `MODEL_NAME` | Baseline inference | Model identifier for that API. |
| `HF_TOKEN` | Baseline (HF) | Hugging Face token when using the HF router or similar. |
| `OPENAI_API_KEY` | Baseline (OpenAI) | OpenAI API key when using OpenAI's API. Inference uses `HF_TOKEN` if set, otherwise `OPENAI_API_KEY`. |
| `ENV_BASE_URL` | Baseline inference | URL of this environment (`http://localhost:7860` locally). |
Running only the API server does not require LLM keys.
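The credential precedence in the table can be expressed as a small helper; this is a sketch of the documented behavior, not a copy of `inference.py`:
```python
import os

def llm_token() -> str:
    """Pick the LLM credential as documented: HF_TOKEN wins, else OPENAI_API_KEY."""
    token = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY")
    if not token:
        raise RuntimeError("Set HF_TOKEN or OPENAI_API_KEY in .env for baseline inference")
    return token
```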
**5. Start the server**
```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
```
Open `http://localhost:7860/docs` to confirm the API is up.
### Baseline inference
With the server running (step 5) and `.env` configured with LLM credentials, run:
```bash
python inference.py
```
Structured logs go to stdout (`[START]`, `[STEP]`, `[END]`); a JSON summary is printed to stderr.
### Docker
Build and run the same service the Hugging Face Space uses:
```bash
docker build -t sieve .
docker run --rm -p 7860:7860 sieve
```
Then set `ENV_BASE_URL=http://localhost:7860` (or the container's URL) for `inference.py`.