Akhil Soni committed on
Commit f36d90a · 1 Parent(s): e74ff96

Rewrite README for hackathon human review


- Add rich scenario narratives (marketing analyst, PM, senior engineer)
- Document custom task mode with usage example
- Strengthen real-world motivation section
- Add actual measured baseline scores (heuristic + random agent)
- Fix Quick Start URLs to point to actual HF Space
- Add API endpoints table and Swagger UI link
- Expand observation space docs with TaskInfo details

Files changed (1)
  1. README.md +155 -61
README.md CHANGED
@@ -9,19 +9,22 @@ tags:
  - openenv
  ---
 
- # RhythmEnv — Daily Planning RL Environment
 
- A deterministic reinforcement learning environment that simulates daily planning and execution under constraints like time, energy, deadlines, and task importance.
 
- ## Motivation
 
- Real-world productivity requires balancing competing priorities: urgent vs. important tasks, energy management, meeting interruptions, and deadline pressure. RhythmEnv provides a clean, deterministic simulation of these trade-offs so RL agents can learn prioritization, scheduling, and resource management skills.
 
  ## Quick Start
 
  ```bash
  pip install openenv-core
- pip install git+https://huggingface.co/spaces/openenv/rhythm_env
  ```
 
  ```python
@@ -29,7 +32,7 @@ import asyncio
  from rhythm_env import RhythmEnv, RhythmAction, ActionType
 
  async def main():
-     async with RhythmEnv(base_url="https://openenv-rhythm-env.hf.space") as env:
          result = await env.reset(task="easy")
          print(f"Energy: {result.observation.energy}")
          print(f"Tasks: {[t.name for t in result.observation.tasks]}")
@@ -46,7 +49,7 @@ asyncio.run(main())
  |--------|-----------|-------------|
  | `START_TASK` | `task_id: int` | Begin working on a new task |
  | `CONTINUE_TASK` | — | Continue working on current task |
- | `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy penalty) |
  | `TAKE_BREAK` | — | Rest to recover energy and reduce stress |
 
  ## Observation Space
@@ -54,36 +57,118 @@ asyncio.run(main())
  | Field | Type | Description |
  |-------|------|-------------|
  | `timestep` | `int` | Current 30-minute slot (0-19) |
- | `energy` | `float` | Energy level (0-1) |
- | `stress` | `float` | Stress level (0-1) |
- | `current_task_id` | `int?` | Task being worked on, or null |
- | `tasks` | `List[TaskInfo]` | All tasks with id, name, effort, progress, deadline, importance |
- | `meetings` | `List[int]` | Timesteps blocked by meetings |
  | `remaining_steps` | `int` | Steps left in the episode |
- | `reward_breakdown` | `Dict` | Component-wise reward details |
 
  ## Episode Design
 
- - **1 episode = 1 workday** (20 steps of 30 minutes each)
- - Agent starts with initial energy and must manage it throughout the day
- - Meetings block specific timesteps (no task progress during meetings)
- - Tasks have deadlines — missing them increases stress and incurs penalties
 
  ## Environment Dynamics
 
  **Energy** (0-1):
- - Working: −0.05 per step
- - Break: +0.12 per step
- - Meeting: −0.03 per step
- - Task switch: −0.02 penalty
 
  **Stress** (0-1):
- - Missed deadline: +0.15
- - Approaching deadline (≤2 steps): +0.03
- - Break: −0.08
- - Task completion: −0.10
 
- **Task Progress**: `progress_delta = 0.15 × energy` per step when working.
 
  ## Reward Design
 
@@ -91,54 +176,44 @@ Multi-component reward per step (clamped to [-1, 1]):
 
  | Component | Formula | Signal |
  |-----------|---------|--------|
- | Progress | `+delta × importance × 2.0` | Encourages productive work |
  | Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
- | Stress penalty | `−stress × 0.1` | Penalizes high stress |
- | Deadline miss | `−0.3` per miss | Penalizes missed deadlines |
- | Switch penalty | `−0.1` | Discourages excessive switching |
- | Idle penalty | `−0.05` | Penalizes doing nothing |
- | Break spam | `−0.05 × max(0, consecutive−2)` | Diminishing returns on breaks |
- | Mode bonus | `+0.05/0.02` | Hidden alignment bonus |
-
- ## Tasks (3 Scenarios)
-
- ### Task 1 — Easy (Single Priority)
- - **3 tasks**: 1 high-importance (0.9), 2 low (0.3, 0.2)
- - **2 meetings** (steps 3 and 11), energy starts at 0.75
- - **Moderate deadlines** (steps 10-16)
- - **Goal**: Complete the main task efficiently
-
- ### Task 2 — Medium (Deadline Pressure)
- - **4 tasks** with varied importance
- - **2 meetings** (steps 4 and 12)
- - Energy starts at 0.7, **tight deadlines** (steps 8-18)
- - **Goal**: Maximize completion before deadlines
-
- ### Task 3 — Hard (Energy Tradeoff)
- - **5 tasks**: 1 deep work (effort 0.8), 4 small tasks
- - **1 meeting** (step 6), energy starts at 0.4
- - **Goal**: Balance rest, deep work, and small wins
 
  ## Grader
 
  End-of-episode score in [0.0, 1.0]:
 
  ```
- score = 0.45×completion + 0.20×deadline + 0.15×efficiency + 0.10×energy_mgmt + 0.10×stress_mgmt
  ```
 
  | Component | Calculation |
  |-----------|-------------|
  | Completion | Importance-weighted fraction of tasks completed |
  | Deadline | Fraction of deadlines met |
- | Efficiency | optimal_steps / actual_steps |
- | Energy mgmt | Average energy over episode |
- | Stress mgmt | 1 − average stress |
 
- **Expected score ranges:**
- - Random agent: ~0.15–0.35
- - Baseline heuristic: ~0.48–0.55
- - Strong agent: ~0.70–0.85
 
  ## Setup Instructions
 
@@ -160,16 +235,35 @@ docker run -p 8000:8000 rhythm-env:latest
  ### Running the Baseline
 
  ```bash
  export API_BASE_URL="https://router.huggingface.co/v1"
  export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
  export HF_TOKEN="your-token"
  python inference.py
  ```
 
  ## Validation
 
  ```bash
- openenv validate
  ```
 
  ## License
 
  - openenv
  ---
 
+ # RhythmEnv — Daily Planning & Scheduling RL Environment
 
+ An OpenEnv environment where AI agents learn to plan and execute a realistic workday under energy, stress, deadline, and meeting constraints.
 
+ ## Why Daily Planning?
 
+ Every knowledge worker faces the same problem every morning: *which task should I work on right now?* The answer depends on deadlines, energy levels, task importance, meeting interruptions, and context-switching costs — a complex optimization problem that most people solve with intuition and habit.
+
+ RhythmEnv turns this into a structured RL problem. An agent manages a set of real work tasks (writing reports, fixing bugs, preparing presentations) across a simulated 10-hour workday. It must learn when to push through deep work, when to rest, when to switch tasks, and when to let low-priority items slide — the same tradeoffs a human makes dozens of times per day.
+
+ This is not a toy problem. Enterprise productivity tools, AI assistants, and scheduling systems all need this capability. RhythmEnv provides a deterministic, reproducible benchmark for evaluating how well agents handle real-world prioritization.
 
  ## Quick Start
 
  ```bash
  pip install openenv-core
  ```
 
  ```python
  import asyncio
  from rhythm_env import RhythmEnv, RhythmAction, ActionType
 
  async def main():
+     async with RhythmEnv(base_url="https://InosLihka-rhythm-env.hf.space") as env:
          result = await env.reset(task="easy")
          print(f"Energy: {result.observation.energy}")
          print(f"Tasks: {[t.name for t in result.observation.tasks]}")
 
  |--------|-----------|-------------|
  | `START_TASK` | `task_id: int` | Begin working on a new task |
  | `CONTINUE_TASK` | — | Continue working on current task |
+ | `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy + reward penalty) |
  | `TAKE_BREAK` | — | Rest to recover energy and reduce stress |
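
For readers without the package installed, the action space can be mirrored as a small local schema. This is a hypothetical sketch: the real `RhythmAction` and `ActionType` live in `rhythm_env` and may differ in field names and enum values.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class ActionType(Enum):
    # Mirrors the four documented actions (values are illustrative).
    START_TASK = auto()
    CONTINUE_TASK = auto()
    SWITCH_TASK = auto()
    TAKE_BREAK = auto()

@dataclass
class RhythmAction:
    # Per the table, only START_TASK and SWITCH_TASK carry a task_id.
    action_type: ActionType
    task_id: Optional[int] = None

    def __post_init__(self) -> None:
        needs_id = self.action_type in (ActionType.START_TASK, ActionType.SWITCH_TASK)
        if needs_id and self.task_id is None:
            raise ValueError(f"{self.action_type.name} requires a task_id")

start = RhythmAction(ActionType.START_TASK, task_id=0)
rest = RhythmAction(ActionType.TAKE_BREAK)
```

The `__post_init__` guard encodes the table's parameter column, so malformed actions fail before they reach the server.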
 
  ## Observation Space
 
  | Field | Type | Description |
  |-------|------|-------------|
  | `timestep` | `int` | Current 30-minute slot (0-19) |
+ | `energy` | `float` | Energy level (0-1), depletes with work, recovers with breaks |
+ | `stress` | `float` | Stress level (0-1), rises near deadlines, drops with breaks/completions |
+ | `current_task_id` | `int \| null` | Task currently being worked on |
+ | `tasks` | `List[TaskInfo]` | All tasks with id, name, description, effort, progress, deadline, importance |
+ | `meetings` | `List[int]` | Timesteps blocked by meetings (agent cannot work) |
  | `remaining_steps` | `int` | Steps left in the episode |
+ | `reward_breakdown` | `Dict` | Component-wise reward details for interpretability |
+
+ Each `TaskInfo` contains:
+ - **name**: Human-readable task name (e.g., "Q3 Performance Report")
+ - **description**: What the task involves (e.g., "Compile sales data, create visualizations, and write executive summary")
+ - **effort**: Total work required (0-1 scale)
+ - **progress**: Work completed so far
+ - **deadline**: Timestep by which task should be done
+ - **importance**: Priority weight (0-1)
+
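
As a usage illustration, these fields are enough to rank tasks by urgency. The scoring rule below is a hypothetical helper an agent prompt might use, not the environment's reward logic:

```python
def triage_score(task: dict, timestep: int) -> float:
    """Toy priority: importance times remaining work, divided by deadline slack."""
    remaining = max(task["effort"] - task["progress"], 0.0)
    if remaining == 0.0:
        return 0.0  # finished tasks drop out of the ranking
    slack = max(task["deadline"] - timestep, 1)
    return task["importance"] * remaining / slack

# Field names and values match the TaskInfo description above.
tasks = [
    {"name": "Q3 Performance Report", "effort": 0.65, "progress": 0.10,
     "deadline": 10, "importance": 0.9},
    {"name": "Expense Filing", "effort": 0.35, "progress": 0.0,
     "deadline": 16, "importance": 0.2},
]
most_urgent = max(tasks, key=lambda t: triage_score(t, timestep=4))
```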
+ ## Tasks (3 Graded Scenarios)
+
+ ### Easy — "Marketing Analyst: Quarterly Review Day"
+
+ > You are a marketing analyst preparing for a quarterly review. Your manager needs the Q3 performance report by midday. You also have routine emails and expense filing to handle.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Q3 Performance Report | 0.65 | Step 10 | 0.9 |
+ | Client Emails | 0.45 | Step 13 | 0.3 |
+ | Expense Filing | 0.35 | Step 16 | 0.2 |
+
+ - **Meetings**: Steps 3 and 11
+ - **Starting energy**: 0.75
+ - **Challenge**: One clear priority — tests basic scheduling ability
+
+ ### Medium — "Product Manager: Client Pitch Tomorrow"
+
+ > You are a product manager with a client pitch tomorrow. The proposal and presentation deck are top priority, but you also need to review a teammate's design doc and prepare meeting notes for leadership.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Client Proposal | 0.40 | Step 8 | 0.7 |
+ | Pitch Deck | 0.35 | Step 10 | 0.8 |
+ | Design Review | 0.25 | Step 14 | 0.5 |
+ | Leadership Notes | 0.20 | Step 18 | 0.4 |
+
+ - **Meetings**: Steps 4 and 12
+ - **Starting energy**: 0.70
+ - **Challenge**: Two tight deadlines compete for early slots; meetings eat into critical windows
+
+ ### Hard — "Senior Engineer: Critical Release Day"
+
+ > You are a senior engineer on a critical release day. The system architecture redesign is the highest priority, but two production bugs are blocking users, docs need updating, and test coverage is behind.
+
+ | Task | Effort | Deadline | Importance |
+ |------|--------|----------|------------|
+ | Architecture Redesign | 0.80 | Step 16 | 0.9 |
+ | Fix: Login Timeout | 0.15 | Step 6 | 0.5 |
+ | Fix: CSV Export | 0.15 | Step 10 | 0.4 |
+ | API Documentation | 0.20 | Step 14 | 0.3 |
+ | Integration Tests | 0.20 | Step 18 | 0.6 |
+
+ - **Meetings**: Step 6
+ - **Starting energy**: 0.40 (!)
+ - **Challenge**: Total effort is 1.50 but at most ~1.0 is completable, so the agent must triage. The deep-work task needs sustained energy the agent doesn't start with, and a bug-fix deadline collides with the meeting, forcing hard tradeoffs with no perfect solution.
+
+ ## Custom Task Mode
+
+ Beyond the 3 graded scenarios, RhythmEnv accepts **custom tasks** — plan your actual workday:
+
+ ```python
+ result = await env.reset(
+     task="custom",
+     tasks=[
+         {"name": "Write blog post", "effort": 0.5, "deadline": 12, "importance": 0.8,
+          "description": "Draft and edit the technical blog post on caching strategies"},
+         {"name": "Review PRs", "effort": 0.2, "deadline": 8, "importance": 0.6,
+          "description": "Review 3 open pull requests from the team"},
+         {"name": "Fix auth bug", "effort": 0.35, "deadline": 10, "importance": 0.9,
+          "description": "Debug and fix the OAuth token refresh issue"},
+     ],
+     meetings=[4, 10],
+     initial_energy=0.7,
+ )
+ ```
+
+ Custom tasks accept 1-10 tasks with configurable effort (0.05-1.0), deadlines (steps 1-20), importance (0.1-1.0), meetings, and initial energy. This makes RhythmEnv usable as a real scheduling tool — connect it to your task manager and let the agent optimize your day.
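
The stated limits can be checked client-side before calling `reset`. A defensive sketch — the meeting range 0-19 and the 0-1 energy range are inferred from the timestep and energy docs, and the server may enforce slightly different rules:

```python
def validate_custom_day(tasks, meetings=(), initial_energy=0.7):
    """Check a custom-task payload against the documented limits."""
    if not 1 <= len(tasks) <= 10:
        raise ValueError("custom mode accepts 1-10 tasks")
    for t in tasks:
        if not 0.05 <= t["effort"] <= 1.0:
            raise ValueError(f"{t['name']}: effort must be in 0.05-1.0")
        if not 1 <= t["deadline"] <= 20:
            raise ValueError(f"{t['name']}: deadline must be in steps 1-20")
        if not 0.1 <= t["importance"] <= 1.0:
            raise ValueError(f"{t['name']}: importance must be in 0.1-1.0")
    if any(not 0 <= m <= 19 for m in meetings):  # assumption: meetings index timesteps 0-19
        raise ValueError("meetings must name timesteps 0-19")
    if not 0.0 <= initial_energy <= 1.0:         # assumption: same 0-1 scale as observations
        raise ValueError("initial_energy must be in 0-1")
    return True

ok = validate_custom_day(
    [{"name": "Write blog post", "effort": 0.5, "deadline": 12, "importance": 0.8}],
    meetings=[4, 10],
)
```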
 
  ## Episode Design
 
+ - **1 episode = 1 workday**: 20 steps of 30 minutes each (9am-7pm)
+ - **Deterministic**: Same scenario always produces the same initial state
+ - **Meetings block work**: During meeting steps, the agent's action is ignored
+ - **Tasks have deadlines**: Missing them increases stress and incurs reward penalties
+ - **Energy depletes with work, recovers with breaks**: The agent must pace itself
 
  ## Environment Dynamics
 
  **Energy** (0-1):
+ | Event | Change |
+ |-------|--------|
+ | Working on a task | −0.05 per step |
+ | Taking a break | +0.12 per step |
+ | In a meeting | −0.03 per step |
+ | Switching tasks | −0.02 penalty |
 
  **Stress** (0-1):
+ | Event | Change |
+ |-------|--------|
+ | Missed deadline | +0.15 |
+ | Deadline approaching (≤2 steps) | +0.03 |
+ | Taking a break | −0.08 |
+ | Completing a task | −0.10 |
 
+ **Task Progress**: `progress_delta = 0.15 × current_energy` per step. Lower energy = slower work.
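
Putting the constants together, one "keep working" step can be simulated as below. This is a sketch of the documented update rules, not the server implementation; in particular, whether progress uses the pre- or post-depletion energy is an assumption here.

```python
def work_step(energy, stress, progress, deadline, timestep):
    """One 30-minute step of continuing the current task."""
    delta = 0.15 * energy                 # progress scales with current energy
    progress = min(progress + delta, 1.0)
    energy = max(energy - 0.05, 0.0)      # working costs 0.05 energy per step
    if 0 <= deadline - timestep <= 2:     # approaching-deadline stress bump
        stress = min(stress + 0.03, 1.0)
    return energy, stress, progress

# Hard-scenario opening: start at 0.40 energy on the redesign (deadline 16).
e, s, p = work_step(energy=0.40, stress=0.20, progress=0.0, deadline=16, timestep=0)
```

At 0.40 energy a step yields only 0.06 progress, which is why the Hard scenario pushes the agent toward early breaks.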
 
  ## Reward Design
 
  Multi-component reward per step (clamped to [-1, 1]):
 
  | Component | Formula | Signal |
  |-----------|---------|--------|
+ | Progress | `+delta × importance × 2.0` | Encourages productive work on important tasks |
  | Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
+ | Stress penalty | `−stress × 0.1` | Penalizes sustained high stress |
+ | Deadline miss | `−0.3 per miss` | Penalizes missing deadlines |
+ | Switch penalty | `−0.1` | Discourages excessive context-switching |
+ | Idle penalty | `−0.05` | Penalizes wasted time |
+ | Break spam | `−0.05 × max(0, consecutive − 2)` | Diminishing returns on consecutive breaks |
+ | Mode bonus | `+0.05 (deep work) / +0.02 (execution)` | Hidden bonus for sustained focus |
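
Composing the table gives a per-step reward like the sketch below. It covers only the visible components (the hidden mode bonus is omitted), and the exact interaction of terms inside the environment may differ:

```python
def step_reward(delta=0.0, importance=0.0, completed=False, stress=0.0,
                missed_deadlines=0, switched=False, idle=False,
                consecutive_breaks=0):
    """Sum the documented components, then clamp to [-1, 1]."""
    r = delta * importance * 2.0                 # progress term
    if completed:
        r += importance * 1.5                    # completion bonus
    r -= stress * 0.1                            # stress penalty
    r -= 0.3 * missed_deadlines                  # deadline misses
    if switched:
        r -= 0.1                                 # context-switch penalty
    if idle:
        r -= 0.05                                # idle penalty
    r -= 0.05 * max(0, consecutive_breaks - 2)   # break spam
    return max(-1.0, min(1.0, r))

# Finishing an importance-0.9 task lands on the +1 cap:
r = step_reward(delta=0.05, importance=0.9, completed=True, stress=0.3)
```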
 
  ## Grader
 
  End-of-episode score in [0.0, 1.0]:
 
  ```
+ score = 0.45 × completion + 0.20 × deadline + 0.15 × efficiency + 0.10 × energy_mgmt + 0.10 × stress_mgmt
  ```
 
  | Component | Calculation |
  |-----------|-------------|
  | Completion | Importance-weighted fraction of tasks completed |
  | Deadline | Fraction of deadlines met |
+ | Efficiency | Theoretical optimal steps / actual working steps |
+ | Energy mgmt | Average energy maintained over the episode |
+ | Stress mgmt | 1 − average stress over the episode |
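
The score is a plain weighted sum of the five components, which can be sketched as:

```python
def grade(completion, deadline, efficiency, energy_mgmt, stress_mgmt):
    """End-of-episode score with the documented weights (weights sum to 1.0)."""
    score = (0.45 * completion + 0.20 * deadline + 0.15 * efficiency
             + 0.10 * energy_mgmt + 0.10 * stress_mgmt)
    return max(0.0, min(1.0, score))

perfect = grade(1.0, 1.0, 1.0, 1.0, 1.0)   # a perfect day maxes the score
worst = grade(0.0, 0.0, 0.0, 0.0, 0.0)
```

Because completion carries a 0.45 weight, an agent that finishes important tasks but manages energy poorly still outscores one that rests well and finishes nothing.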
+
+ ## Baseline Scores
 
+ Measured with the included `inference.py` heuristic (no LLM):
+
+ | Scenario | Baseline Heuristic | Random Agent (avg of 10) |
+ |----------|-------------------|--------------------------|
+ | Easy | **0.533** | 0.319 (range 0.12-0.70) |
+ | Medium | **0.514** | 0.371 (range 0.18-0.54) |
+ | Hard | **0.486** | 0.323 (range 0.09-0.58) |
+
+ - Random agents score ~0.1-0.4 (degenerate strategies are penalized)
+ - The baseline heuristic scores ~0.49-0.53 (reasonable but not optimal)
+ - Strong LLM agents should score 0.65+ by learning energy management and deadline-aware triage
 
  ## Setup Instructions
 
  ### Running the Baseline
 
  ```bash
+ # Heuristic only (no API key needed):
+ python inference.py
+
+ # With LLM:
  export API_BASE_URL="https://router.huggingface.co/v1"
  export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
  export HF_TOKEN="your-token"
  python inference.py
  ```
 
+ ## API Endpoints
+
+ | Method | Endpoint | Description |
+ |--------|----------|-------------|
+ | `POST` | `/reset` | Start a new episode (`{"task": "easy\|medium\|hard\|custom"}`) |
+ | `POST` | `/step` | Execute an action |
+ | `GET` | `/state` | Get current environment state |
+ | `GET` | `/health` | Health check |
+ | `GET` | `/metadata` | Environment metadata |
+ | `GET` | `/schema` | Action/observation JSON schemas |
+ | `POST` | `/mcp` | MCP JSON-RPC endpoint |
+
+ Interactive docs: [Swagger UI](https://InosLihka-rhythm-env.hf.space/docs)
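
The `/reset` endpoint can be exercised with only the standard library. The sketch below builds the request without sending it; the `{"task": ...}` body follows the table above, and any further body fields are assumptions:

```python
import json
import urllib.request

BASE_URL = "https://InosLihka-rhythm-env.hf.space"

def build_reset_request(task: str = "easy") -> urllib.request.Request:
    """Prepare (but do not send) POST /reset with a JSON body."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_reset_request("medium")
# Send with urllib.request.urlopen(req) once the Space is reachable.
```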
+
  ## Validation
 
  ```bash
+ openenv validate                                             # Local structure check
+ openenv validate --url https://InosLihka-rhythm-env.hf.space # Runtime check
  ```
 
  ## License