Akhil Soni committed on
Commit f36d90a · 1 Parent(s): e74ff96

Rewrite README for hackathon human review


- Add rich scenario narratives (marketing analyst, PM, senior engineer)
- Document custom task mode with usage example
- Strengthen real-world motivation section
- Add actual measured baseline scores (heuristic + random agent)
- Fix Quick Start URLs to point to actual HF Space
- Add API endpoints table and Swagger UI link
- Expand observation space docs with TaskInfo details

Files changed (1)
  1. README.md +155 -61
README.md CHANGED
@@ -9,19 +9,22 @@ tags:
  - openenv
  ---
 
- # RhythmEnv — Daily Planning RL Environment
 
- A deterministic reinforcement learning environment that simulates daily planning and execution under constraints like time, energy, deadlines, and task importance.
 
- ## Motivation
 
- Real-world productivity requires balancing competing priorities: urgent vs. important tasks, energy management, meeting interruptions, and deadline pressure. RhythmEnv provides a clean, deterministic simulation of these trade-offs so RL agents can learn prioritization, scheduling, and resource management skills.
 
  ## Quick Start
 
  ```bash
  pip install openenv-core
- pip install git+https://huggingface.co/spaces/openenv/rhythm_env
  ```
 
  ```python
@@ -29,7 +32,7 @@ import asyncio
  from rhythm_env import RhythmEnv, RhythmAction, ActionType
 
  async def main():
-     async with RhythmEnv(base_url="https://openenv-rhythm-env.hf.space") as env:
          result = await env.reset(task="easy")
          print(f"Energy: {result.observation.energy}")
          print(f"Tasks: {[t.name for t in result.observation.tasks]}")
@@ -46,7 +49,7 @@ asyncio.run(main())
  |--------|-----------|-------------|
  | `START_TASK` | `task_id: int` | Begin working on a new task |
  | `CONTINUE_TASK` | — | Continue working on current task |
- | `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy penalty) |
  | `TAKE_BREAK` | — | Rest to recover energy and reduce stress |
 
  ## Observation Space
@@ -54,36 +57,118 @@ asyncio.run(main())
  | Field | Type | Description |
  |-------|------|-------------|
  | `timestep` | `int` | Current 30-minute slot (0-19) |
- | `energy` | `float` | Energy level (0-1) |
- | `stress` | `float` | Stress level (0-1) |
- | `current_task_id` | `int?` | Task being worked on, or null |
- | `tasks` | `List[TaskInfo]` | All tasks with id, name, effort, progress, deadline, importance |
- | `meetings` | `List[int]` | Timesteps blocked by meetings |
  | `remaining_steps` | `int` | Steps left in the episode |
- | `reward_breakdown` | `Dict` | Component-wise reward details |
 
  ## Episode Design
 
- - **1 episode = 1 workday** (20 steps of 30 minutes each)
- - Agent starts with initial energy and must manage it throughout the day
- - Meetings block specific timesteps (no task progress during meetings)
- - Tasks have deadlines — missing them increases stress and incurs penalties
 
  ## Environment Dynamics
 
  **Energy** (0-1):
- - Working: −0.05 per step
- - Break: +0.12 per step
- - Meeting: −0.03 per step
- - Task switch: −0.02 penalty
 
  **Stress** (0-1):
- - Missed deadline: +0.15
- - Approaching deadline (≤2 steps): +0.03
- - Break: −0.08
- - Task completion: −0.10
 
- **Task Progress**: `progress_delta = 0.15 × energy` per step when working.
 
  ## Reward Design
 
@@ -91,54 +176,44 @@ Multi-component reward per step (clamped to [-1, 1]):
 
  | Component | Formula | Signal |
  |-----------|---------|--------|
- | Progress | `+delta × importance × 2.0` | Encourages productive work |
  | Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
- | Stress penalty | `−stress × 0.1` | Penalizes high stress |
- | Deadline miss | `−0.3` per miss | Penalizes missed deadlines |
- | Switch penalty | `−0.1` | Discourages excessive switching |
- | Idle penalty | `−0.05` | Penalizes doing nothing |
- | Break spam | `−0.05 × max(0, consecutive−2)` | Diminishing returns on breaks |
- | Mode bonus | `+0.05/0.02` | Hidden alignment bonus |
-
- ## Tasks (3 Scenarios)
-
- ### Task 1 — Easy (Single Priority)
- - **3 tasks**: 1 high-importance (0.9), 2 low (0.3, 0.2)
- - **2 meetings** (steps 3 and 11), energy starts at 0.75
- - **Moderate deadlines** (steps 10-16)
- - **Goal**: Complete the main task efficiently
-
- ### Task 2 — Medium (Deadline Pressure)
- - **4 tasks** with varied importance
- - **2 meetings** (steps 4 and 12)
- - Energy starts at 0.7, **tight deadlines** (steps 8-18)
- - **Goal**: Maximize completion before deadlines
-
- ### Task 3 — Hard (Energy Tradeoff)
- - **5 tasks**: 1 deep work (effort 0.8), 4 small tasks
- - **1 meeting** (step 6), energy starts at 0.4
- - **Goal**: Balance rest, deep work, and small wins
 
  ## Grader
 
  End-of-episode score in [0.0, 1.0]:
 
  ```
- score = 0.45×completion + 0.20×deadline + 0.15×efficiency + 0.10×energy_mgmt + 0.10×stress_mgmt
  ```
 
  | Component | Calculation |
  |-----------|-------------|
  | Completion | Importance-weighted fraction of tasks completed |
  | Deadline | Fraction of deadlines met |
- | Efficiency | optimal_steps / actual_steps |
- | Energy mgmt | Average energy over episode |
- | Stress mgmt | 1 − average stress |
 
- **Expected score ranges:**
- - Random agent: ~0.15–0.35
- - Baseline heuristic: ~0.48–0.55
- - Strong agent: ~0.70–0.85
 
  ## Setup Instructions
 
@@ -160,16 +235,35 @@ docker run -p 8000:8000 rhythm-env:latest
  ### Running the Baseline
 
  ```bash
  export API_BASE_URL="https://router.huggingface.co/v1"
  export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
  export HF_TOKEN="your-token"
  python inference.py
  ```
 
  ## Validation
 
  ```bash
- openenv validate
  ```
 
  ## License
 
  - openenv
  ---
 
+ # RhythmEnv — Daily Planning & Scheduling RL Environment
 
+ An OpenEnv environment where AI agents learn to plan and execute a realistic workday under energy, stress, deadline, and meeting constraints.
 
+ ## Why Daily Planning?
 
+ Every knowledge worker faces the same problem every morning: *which task should I work on right now?* The answer depends on deadlines, energy levels, task importance, meeting interruptions, and context-switching costs — a complex optimization problem that most people solve with intuition and habit.
+
+ RhythmEnv turns this into a structured RL problem. An agent manages a set of real work tasks (writing reports, fixing bugs, preparing presentations) across a simulated 10-hour workday. It must learn when to push through deep work, when to rest, when to switch tasks, and when to let low-priority items slide — the same tradeoffs a human makes dozens of times per day.
+
+ This is not a toy problem. Enterprise productivity tools, AI assistants, and scheduling systems all need this capability. RhythmEnv provides a deterministic, reproducible benchmark for evaluating how well agents handle real-world prioritization.
 
  ## Quick Start
 
  ```bash
  pip install openenv-core
  ```
 
  ```python
  import asyncio
  from rhythm_env import RhythmEnv, RhythmAction, ActionType
 
  async def main():
+     async with RhythmEnv(base_url="https://InosLihka-rhythm-env.hf.space") as env:
          result = await env.reset(task="easy")
          print(f"Energy: {result.observation.energy}")
          print(f"Tasks: {[t.name for t in result.observation.tasks]}")
 
  |--------|-----------|-------------|
  | `START_TASK` | `task_id: int` | Begin working on a new task |
  | `CONTINUE_TASK` | — | Continue working on current task |
+ | `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy + reward penalty) |
  | `TAKE_BREAK` | — | Rest to recover energy and reduce stress |
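
For readers without the package installed, the action space can be mirrored as a small local schema. This is a hypothetical sketch: the real `RhythmAction` and `ActionType` live in `rhythm_env` and may differ in field names and enum values.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class ActionType(Enum):
    # Mirrors the four documented actions (values are illustrative).
    START_TASK = auto()
    CONTINUE_TASK = auto()
    SWITCH_TASK = auto()
    TAKE_BREAK = auto()

@dataclass
class RhythmAction:
    # Per the table, only START_TASK and SWITCH_TASK carry a task_id.
    action_type: ActionType
    task_id: Optional[int] = None

    def __post_init__(self) -> None:
        needs_id = self.action_type in (ActionType.START_TASK, ActionType.SWITCH_TASK)
        if needs_id and self.task_id is None:
            raise ValueError(f"{self.action_type.name} requires a task_id")

start = RhythmAction(ActionType.START_TASK, task_id=0)
rest = RhythmAction(ActionType.TAKE_BREAK)
```

The `__post_init__` guard encodes the table's parameter column, so malformed actions fail before they reach the server.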
 
  ## Observation Space
 
  | Field | Type | Description |
  |-------|------|-------------|
  | `timestep` | `int` | Current 30-minute slot (0-19) |
+ | `energy` | `float` | Energy level (0-1), depletes with work, recovers with breaks |
+ | `stress` | `float` | Stress level (0-1), rises near deadlines, drops with breaks/completions |
+ | `current_task_id` | `int \| null` | Task currently being worked on |
+ | `tasks` | `List[TaskInfo]` | All tasks with id, name, description, effort, progress, deadline, importance |
+ | `meetings` | `List[int]` | Timesteps blocked by meetings (agent cannot work) |
  | `remaining_steps` | `int` | Steps left in the episode |
+ | `reward_breakdown` | `Dict` | Component-wise reward details for interpretability |
+
+ Each `TaskInfo` contains:
+ - **name**: Human-readable task name (e.g., "Q3 Performance Report")
+ - **description**: What the task involves (e.g., "Compile sales data, create visualizations, and write executive summary")
+ - **effort**: Total work required (0-1 scale)
+ - **progress**: Work completed so far
+ - **deadline**: Timestep by which task should be done
+ - **importance**: Priority weight (0-1)
+
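
As a usage illustration, these fields are enough to rank tasks by urgency. The scoring rule below is a hypothetical helper an agent prompt might use, not the environment's reward logic:

```python
def triage_score(task: dict, timestep: int) -> float:
    """Toy priority: importance times remaining work, divided by deadline slack."""
    remaining = max(task["effort"] - task["progress"], 0.0)
    if remaining == 0.0:
        return 0.0  # finished tasks drop out of the ranking
    slack = max(task["deadline"] - timestep, 1)
    return task["importance"] * remaining / slack

# Field names and values match the TaskInfo description above.
tasks = [
    {"name": "Q3 Performance Report", "effort": 0.65, "progress": 0.10,
     "deadline": 10, "importance": 0.9},
    {"name": "Expense Filing", "effort": 0.35, "progress": 0.0,
     "deadline": 16, "importance": 0.2},
]
most_urgent = max(tasks, key=lambda t: triage_score(t, timestep=4))
```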
+ ## Tasks (3 Graded Scenarios)
+
+ ### Easy — "Marketing Analyst: Quarterly Review Day"
+
+ > You are a marketing analyst preparing for a quarterly review. Your manager needs the Q3 performance report by midday. You also have routine emails and expense filing to handle.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Q3 Performance Report | 0.65 | Step 10 | 0.9 |
+ | Client Emails | 0.45 | Step 13 | 0.3 |
+ | Expense Filing | 0.35 | Step 16 | 0.2 |
+
+ - **Meetings**: Steps 3 and 11
+ - **Starting energy**: 0.75
+ - **Challenge**: One clear priority — tests basic scheduling ability
+
+ ### Medium — "Product Manager: Client Pitch Tomorrow"
+
+ > You are a product manager with a client pitch tomorrow. The proposal and presentation deck are top priority, but you also need to review a teammate's design doc and prepare meeting notes for leadership.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Client Proposal | 0.40 | Step 8 | 0.7 |
+ | Pitch Deck | 0.35 | Step 10 | 0.8 |
+ | Design Review | 0.25 | Step 14 | 0.5 |
+ | Leadership Notes | 0.20 | Step 18 | 0.4 |
+
+ - **Meetings**: Steps 4 and 12
+ - **Starting energy**: 0.70
+ - **Challenge**: Two tight deadlines compete for early slots; meetings eat into critical windows
+
+ ### Hard — "Senior Engineer: Critical Release Day"
+
+ > You are a senior engineer on a critical release day. The system architecture redesign is the highest priority, but two production bugs are blocking users, docs need updating, and test coverage is behind.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Architecture Redesign | 0.80 | Step 16 | 0.9 |
+ | Fix: Login Timeout | 0.15 | Step 6 | 0.5 |
+ | Fix: CSV Export | 0.15 | Step 10 | 0.4 |
+ | API Documentation | 0.20 | Step 14 | 0.3 |
+ | Integration Tests | 0.20 | Step 18 | 0.6 |
+
+ - **Meetings**: Step 6
+ - **Starting energy**: 0.40 (!)
+ - **Challenge**: Total effort is 1.50 but at most ~1.0 is completable, so the agent must triage. The deep-work task needs sustained energy the agent doesn't start with, and a bug-fix deadline collides with the meeting, forcing hard tradeoffs with no perfect solution.
+
+ ## Custom Task Mode
+
+ Beyond the 3 graded scenarios, RhythmEnv accepts **custom tasks** — plan your actual workday:
+
+ ```python
+ result = await env.reset(
+     task="custom",
+     tasks=[
+         {"name": "Write blog post", "effort": 0.5, "deadline": 12, "importance": 0.8,
+          "description": "Draft and edit the technical blog post on caching strategies"},
+         {"name": "Review PRs", "effort": 0.2, "deadline": 8, "importance": 0.6,
+          "description": "Review 3 open pull requests from the team"},
+         {"name": "Fix auth bug", "effort": 0.35, "deadline": 10, "importance": 0.9,
+          "description": "Debug and fix the OAuth token refresh issue"},
+     ],
+     meetings=[4, 10],
+     initial_energy=0.7,
+ )
+ ```
+
+ Custom tasks accept 1-10 tasks with configurable effort (0.05-1.0), deadlines (steps 1-20), importance (0.1-1.0), meetings, and initial energy. This makes RhythmEnv usable as a real scheduling tool — connect it to your task manager and let the agent optimize your day.
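
The stated limits can be checked client-side before calling `reset`. A defensive sketch — the meeting range 0-19 and the 0-1 energy range are inferred from the timestep and energy docs, and the server may enforce slightly different rules:

```python
def validate_custom_day(tasks, meetings=(), initial_energy=0.7):
    """Check a custom-task payload against the documented limits."""
    if not 1 <= len(tasks) <= 10:
        raise ValueError("custom mode accepts 1-10 tasks")
    for t in tasks:
        if not 0.05 <= t["effort"] <= 1.0:
            raise ValueError(f"{t['name']}: effort must be in 0.05-1.0")
        if not 1 <= t["deadline"] <= 20:
            raise ValueError(f"{t['name']}: deadline must be in steps 1-20")
        if not 0.1 <= t["importance"] <= 1.0:
            raise ValueError(f"{t['name']}: importance must be in 0.1-1.0")
    if any(not 0 <= m <= 19 for m in meetings):  # assumption: meetings index timesteps 0-19
        raise ValueError("meetings must name timesteps 0-19")
    if not 0.0 <= initial_energy <= 1.0:         # assumption: same 0-1 scale as observations
        raise ValueError("initial_energy must be in 0-1")
    return True

ok = validate_custom_day(
    [{"name": "Write blog post", "effort": 0.5, "deadline": 12, "importance": 0.8}],
    meetings=[4, 10],
)
```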
 
  ## Episode Design
 
+ - **1 episode = 1 workday**: 20 steps of 30 minutes each (9am-7pm)
+ - **Deterministic**: Same scenario always produces the same initial state
+ - **Meetings block work**: During meeting steps, the agent's action is ignored
+ - **Tasks have deadlines**: Missing them increases stress and incurs reward penalties
+ - **Energy depletes with work, recovers with breaks**: The agent must pace itself
 
  ## Environment Dynamics
 
  **Energy** (0-1):
+ | Event | Change |
+ |-------|--------|
+ | Working on a task | −0.05 per step |
+ | Taking a break | +0.12 per step |
+ | In a meeting | −0.03 per step |
+ | Switching tasks | −0.02 penalty |
 
  **Stress** (0-1):
+ | Event | Change |
+ |-------|--------|
+ | Missed deadline | +0.15 |
+ | Deadline approaching (≤2 steps) | +0.03 |
+ | Taking a break | −0.08 |
+ | Completing a task | −0.10 |
 
+ **Task Progress**: `progress_delta = 0.15 × current_energy` per step. Lower energy = slower work.
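
Putting the constants together, one "keep working" step can be simulated as below. This is a sketch of the documented update rules, not the server implementation; in particular, whether progress uses the pre- or post-depletion energy is an assumption here.

```python
def work_step(energy, stress, progress, deadline, timestep):
    """One 30-minute step of continuing the current task."""
    delta = 0.15 * energy                 # progress scales with current energy
    progress = min(progress + delta, 1.0)
    energy = max(energy - 0.05, 0.0)      # working costs 0.05 energy per step
    if 0 <= deadline - timestep <= 2:     # approaching-deadline stress bump
        stress = min(stress + 0.03, 1.0)
    return energy, stress, progress

# Hard-scenario opening: start at 0.40 energy on the redesign (deadline 16).
e, s, p = work_step(energy=0.40, stress=0.20, progress=0.0, deadline=16, timestep=0)
```

At 0.40 energy a step yields only 0.06 progress, which is why the Hard scenario pushes the agent toward early breaks.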
 
  ## Reward Design
 
  Multi-component reward per step (clamped to [-1, 1]):
 
  | Component | Formula | Signal |
  |-----------|---------|--------|
+ | Progress | `+delta × importance × 2.0` | Encourages productive work on important tasks |
  | Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
+ | Stress penalty | `−stress × 0.1` | Penalizes sustained high stress |
+ | Deadline miss | `−0.3 per miss` | Penalizes missing deadlines |
+ | Switch penalty | `−0.1` | Discourages excessive context-switching |
+ | Idle penalty | `−0.05` | Penalizes wasted time |
+ | Break spam | `−0.05 × max(0, consecutive − 2)` | Diminishing returns on consecutive breaks |
+ | Mode bonus | `+0.05 (deep work) / +0.02 (execution)` | Hidden bonus for sustained focus |
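
Composing the table gives a per-step reward like the sketch below. It covers only the visible components (the hidden mode bonus is omitted), and the exact interaction of terms inside the environment may differ:

```python
def step_reward(delta=0.0, importance=0.0, completed=False, stress=0.0,
                missed_deadlines=0, switched=False, idle=False,
                consecutive_breaks=0):
    """Sum the documented components, then clamp to [-1, 1]."""
    r = delta * importance * 2.0                 # progress term
    if completed:
        r += importance * 1.5                    # completion bonus
    r -= stress * 0.1                            # stress penalty
    r -= 0.3 * missed_deadlines                  # deadline misses
    if switched:
        r -= 0.1                                 # context-switch penalty
    if idle:
        r -= 0.05                                # idle penalty
    r -= 0.05 * max(0, consecutive_breaks - 2)   # break spam
    return max(-1.0, min(1.0, r))

# Finishing an importance-0.9 task lands on the +1 cap:
r = step_reward(delta=0.05, importance=0.9, completed=True, stress=0.3)
```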
 
  ## Grader
 
  End-of-episode score in [0.0, 1.0]:
 
  ```
+ score = 0.45 × completion + 0.20 × deadline + 0.15 × efficiency + 0.10 × energy_mgmt + 0.10 × stress_mgmt
  ```
 
  | Component | Calculation |
  |-----------|-------------|
  | Completion | Importance-weighted fraction of tasks completed |
  | Deadline | Fraction of deadlines met |
+ | Efficiency | Theoretical optimal steps / actual working steps |
+ | Energy mgmt | Average energy maintained over the episode |
+ | Stress mgmt | 1 − average stress over the episode |
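
The score is a plain weighted sum of the five components, which can be sketched as:

```python
def grade(completion, deadline, efficiency, energy_mgmt, stress_mgmt):
    """End-of-episode score with the documented weights (weights sum to 1.0)."""
    score = (0.45 * completion + 0.20 * deadline + 0.15 * efficiency
             + 0.10 * energy_mgmt + 0.10 * stress_mgmt)
    return max(0.0, min(1.0, score))

perfect = grade(1.0, 1.0, 1.0, 1.0, 1.0)   # a perfect day maxes the score
worst = grade(0.0, 0.0, 0.0, 0.0, 0.0)
```

Because completion carries a 0.45 weight, an agent that finishes important tasks but manages energy poorly still outscores one that rests well and finishes nothing.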
+
+ ## Baseline Scores
 
+ Measured with the included `inference.py` heuristic (no LLM):
+
+ | Scenario | Baseline Heuristic | Random Agent (avg of 10) |
+ |----------|-------------------|--------------------------|
+ | Easy | **0.533** | 0.319 (range 0.12-0.70) |
+ | Medium | **0.514** | 0.371 (range 0.18-0.54) |
+ | Hard | **0.486** | 0.323 (range 0.09-0.58) |
+
+ - Random agents score ~0.1-0.4 (degenerate strategies are penalized)
+ - The baseline heuristic scores ~0.49-0.53 (reasonable but not optimal)
+ - Strong LLM agents should score 0.65+ by learning energy management and deadline-aware triage
 
  ## Setup Instructions
 
  ### Running the Baseline
 
  ```bash
+ # Heuristic only (no API key needed):
+ python inference.py
+
+ # With LLM:
  export API_BASE_URL="https://router.huggingface.co/v1"
  export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
  export HF_TOKEN="your-token"
  python inference.py
  ```
 
+ ## API Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `POST` | `/reset` | Start a new episode (`{"task": "easy\|medium\|hard\|custom"}`) |
+ | `POST` | `/step` | Execute an action |
+ | `GET` | `/state` | Get current environment state |
+ | `GET` | `/health` | Health check |
+ | `GET` | `/metadata` | Environment metadata |
+ | `GET` | `/schema` | Action/observation JSON schemas |
+ | `POST` | `/mcp` | MCP JSON-RPC endpoint |
+
+ Interactive docs: [Swagger UI](https://InosLihka-rhythm-env.hf.space/docs)
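
The `/reset` endpoint can be exercised with only the standard library. The sketch below builds the request without sending it; the `{"task": ...}` body follows the table above, and any further body fields are assumptions:

```python
import json
import urllib.request

BASE_URL = "https://InosLihka-rhythm-env.hf.space"

def build_reset_request(task: str = "easy") -> urllib.request.Request:
    """Prepare (but do not send) POST /reset with a JSON body."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_reset_request("medium")
# Send with urllib.request.urlopen(req) once the Space is reachable.
```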
+
  ## Validation
 
  ```bash
+ openenv validate                                             # Local structure check
+ openenv validate --url https://InosLihka-rhythm-env.hf.space # Runtime check
  ```
 
  ## License