Akhil Soni committed
Commit c07f15e · 1 Parent(s): f36d90a

Fix bugs, add tests, and improve code quality


- Fix START_TASK/SWITCH_TASK semantic distinction (was identical code)
- Fix progress reward lost on task completion step (worked_on_task_id)
- Fix grader weights summing to 0.95 (now 1.0)
- Fix grader efficiency giving idle agents perfect score
- Fix grader deadline scoring (now importance-weighted)
- Fix reward clamp [-1,1] truncating completion signal (now [-2,2])
- Add auto-clear current_task_id on task completion
- Add early termination when all tasks complete
- Add custom task mode (task="custom")
- Add 27 tests covering reset, step, grader, and edge cases
- Add Dockerfile to project root for validation script
- Add BSD copyright headers to all files
- Remove dead code, unused imports, and unused dependency
- Update README with accurate baseline scores and documentation
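The grader-weight fix can be sanity-checked with a short standalone sketch. This is a hypothetical mirror of the formula documented in the README below (the old weights summed to 0.95), not the repository's actual implementation:

```python
def grade(completion, deadline, efficiency, energy_mgmt, stress_mgmt, switch_count):
    # Corrected weights: 0.40 + 0.20 + 0.15 + 0.10 + 0.15 = 1.0
    weights = (0.40, 0.20, 0.15, 0.10, 0.15)
    assert abs(sum(weights) - 1.0) < 1e-9  # would have failed with the old 0.95 total
    # Grader-side switch penalty, capped at 0.15
    switch_penalty = min(0.15, switch_count * 0.02)
    score = (
        weights[0] * completion
        + weights[1] * deadline
        + weights[2] * efficiency
        + weights[3] * energy_mgmt
        + weights[4] * stress_mgmt
        - switch_penalty
    )
    # Final score is clamped to [0.0, 1.0]
    return max(0.0, min(1.0, score))

print(grade(1.0, 1.0, 1.0, 1.0, 1.0, 0))  # perfect episode, no switches -> 1.0
```

With all components perfect but ten switches, the capped penalty brings the score down to 0.85.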

Dockerfile ADDED
@@ -0,0 +1,47 @@
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ COPY . /app/env
+
+ WORKDIR /app/env
+
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+         git \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ COPY --from=builder /app/env/.venv /app/.venv
+ COPY --from=builder /app/env /app/env
+
+ ENV PATH="/app/.venv/bin:$PATH"
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
README.md CHANGED
@@ -47,11 +47,13 @@ asyncio.run(main())
 
  | Action | Parameters | Description |
  |--------|-----------|-------------|
- | `START_TASK` | `task_id: int` | Begin working on a new task |
+ | `START_TASK` | `task_id: int` | Begin working on a task (only when idle — no current task) |
  | `CONTINUE_TASK` | — | Continue working on current task |
- | `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy + reward penalty) |
+ | `SWITCH_TASK` | `task_id: int` | Switch to a different task (requires active task; energy + reward penalty) |
  | `TAKE_BREAK` | — | Rest to recover energy and reduce stress |
 
+ **Note**: `START_TASK` and `SWITCH_TASK` are semantically distinct. `START_TASK` is only valid when the agent has no active task (e.g., after a break or at episode start). `SWITCH_TASK` is only valid when already working on a different task. Using the wrong one results in an idle penalty.
+
  ## Observation Space
 
  | Field | Type | Description |
@@ -149,6 +151,8 @@ Custom tasks accept 1-10 tasks with configurable effort (0.05-1.0), deadlines (s
  - **Meetings block work**: During meeting steps, the agent's action is ignored
  - **Tasks have deadlines**: Missing them increases stress and incurs reward penalties
  - **Energy depletes with work, recovers with breaks**: The agent must pace itself
+ - **Task completion auto-clears**: When a task is finished, `current_task_id` resets to `null` — the agent can immediately `START_TASK` a new one without needing a break or switch
+ - **Early termination**: The episode ends early if all tasks are completed, rewarding efficient agents
 
  ## Environment Dynamics
 
@@ -168,17 +172,17 @@ Custom tasks accept 1-10 tasks with configurable effort (0.05-1.0), deadlines (s
  | Taking a break | −0.08 |
  | Completing a task | −0.10 |
 
- **Task Progress**: `progress_delta = 0.15 × current_energy` per step. Lower energy = slower work.
+ **Task Progress**: `progress_delta = 0.15 × current_energy × (1 - stress × 0.3)` per step. Lower energy = slower work. High stress also impairs productivity — at stress=1.0, output drops to 70% of normal.
 
  ## Reward Design
 
- Multi-component reward per step (clamped to [-1, 1]):
+ Multi-component reward per step (clamped to [-2, 2]):
 
  | Component | Formula | Signal |
  |-----------|---------|--------|
  | Progress | `+delta × importance × 2.0` | Encourages productive work on important tasks |
  | Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
- | Stress penalty | `−stress × 0.1` | Penalizes sustained high stress |
+ | Stress penalty | `−stress × 0.15` | Penalizes sustained high stress |
  | Deadline miss | `−0.3 per miss` | Penalizes missing deadlines |
  | Switch penalty | `−0.1` | Discourages excessive context-switching |
  | Idle penalty | `−0.05` | Penalizes wasted time |
@@ -190,30 +194,31 @@ Multi-component reward per step (clamped to [-1, 1]):
  End-of-episode score in [0.0, 1.0]:
 
  ```
- score = 0.45 × completion + 0.20 × deadline + 0.15 × efficiency + 0.10 × energy_mgmt + 0.10 × stress_mgmt
+ score = 0.40 × completion + 0.20 × deadline + 0.15 × efficiency + 0.10 × energy_mgmt + 0.15 × stress_mgmt − switch_penalty
  ```
 
  | Component | Calculation |
  |-----------|-------------|
  | Completion | Importance-weighted fraction of tasks completed |
- | Deadline | Fraction of deadlines met |
- | Efficiency | Theoretical optimal steps / actual working steps |
+ | Deadline | Importance-weighted fraction of deadlines met (missing important deadlines hurts more) |
+ | Efficiency | Realistic optimal steps (using avg energy) / actual working steps |
  | Energy mgmt | Average energy maintained over the episode |
  | Stress mgmt | 1 − average stress over the episode |
+ | Switch penalty | `min(0.15, switch_count × 0.02)` — penalizes excessive context-switching |
 
  ## Baseline Scores
 
  Measured with the included `inference.py` heuristic (no LLM):
 
- | Scenario | Baseline Heuristic | Random Agent (avg of 10) |
- |----------|-------------------|--------------------------|
- | Easy | **0.533** | 0.319 (range 0.12-0.70) |
- | Medium | **0.514** | 0.371 (range 0.18-0.54) |
- | Hard | **0.486** | 0.323 (range 0.09-0.58) |
+ | Scenario | Baseline Heuristic | Random Agent (avg of 10) | Idle (all breaks) |
+ |----------|-------------------|--------------------------|-------------------|
+ | Easy | **0.670** | 0.310 | 0.241 |
+ | Medium | **0.612** | 0.533 | 0.238 |
+ | Hard | 0.145 | **0.422** | 0.232 |
 
- - Random agents score ~0.1-0.4 (degenerate strategies are penalized)
- - Baseline heuristic scores ~0.49-0.53 (reasonable but not optimal)
- - Strong LLM agents should score 0.65+ by learning energy management and deadline-aware triage
+ - Idle/degenerate strategies score ~0.23-0.24 (zero completion, zero efficiency)
+ - The heuristic dominates on easy and medium but fails on hard — it is too conservative with energy management for the low-energy start (0.40), completing zero tasks. This demonstrates that hard truly requires intelligent triage, not simple rules.
+ - Strong LLM agents should score 0.65+ by learning energy management, stress-aware pacing, and deadline triage
 
  ## Setup Instructions
 
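The README's updated progress formula (`progress_delta = 0.15 × current_energy × (1 - stress × 0.3)`) can be sanity-checked with a minimal standalone sketch. The constant names below mirror the diff but are local assumptions, not imports from the project:

```python
PROGRESS_RATE = 0.15          # per-step base progress rate (from the README)
STRESS_PROGRESS_FACTOR = 0.3  # stress impairment factor (from the README)

def progress_delta(energy: float, stress: float) -> float:
    """Per-step task progress: lower energy and higher stress both slow work."""
    return PROGRESS_RATE * energy * (1.0 - stress * STRESS_PROGRESS_FACTOR)

# At full energy, maximum stress reduces output to 70% of the stress-free rate:
print(progress_delta(1.0, 0.0))  # base rate
print(progress_delta(1.0, 1.0))  # ~0.7 x base rate
```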
__init__.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  RhythmEnv — Daily Planning RL Environment for OpenEnv.
 
client.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  RhythmEnv Client.
 
inference.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  RhythmEnv Inference Script
  ===================================
models.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  Data models for RhythmEnv Environment.
 
pyproject.toml CHANGED
@@ -18,7 +18,6 @@ dependencies = [
      "fastapi>=0.115.0",
      "pydantic>=2.0.0",
      "uvicorn>=0.24.0",
-     "requests>=2.31.0",
  ]
 
  [project.optional-dependencies]
server/__init__.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """RhythmEnv environment server components."""
 
  from .rhythm_environment import RhythmEnvironment
server/app.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  FastAPI application for the RhythmEnv Environment.
 
@@ -24,7 +30,7 @@ Usage:
 
  try:
      from openenv.core.env_server.http_server import create_app
- except Exception as e:  # pragma: no cover
+ except ImportError as e:  # pragma: no cover
      raise ImportError(
          "openenv is required for the web interface. Install dependencies with '\n  uv sync\n'"
      ) from e
server/rhythm_environment.py CHANGED
@@ -1,3 +1,9 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
  """
  RhythmEnv Environment Implementation.
 
@@ -22,9 +28,7 @@ try:
      RhythmState,
      TaskInfo,
  )
- except ImportError as e:
-     if "relative import" not in str(e) and "no known parent package" not in str(e):
-         raise
+ except (ImportError, ModuleNotFoundError):
  from models import (
      ActionType,
      RhythmAction,
@@ -196,7 +200,8 @@ BREAK_SPAM_PENALTY = 0.05
  SWITCH_PENALTY = 0.1
  IDLE_PENALTY = 0.05
  DEADLINE_MISS_PENALTY = 0.3
- STRESS_PENALTY_RATE = 0.1
+ STRESS_PENALTY_RATE = 0.15
+ STRESS_PROGRESS_FACTOR = 0.3
  PROGRESS_REWARD_SCALE = 2.0
  COMPLETION_BONUS_SCALE = 1.5
  DEEP_WORK_BONUS = 0.05
@@ -308,6 +313,7 @@ class RhythmEnvironment(Environment):
      switched = False
      is_idle = False
      is_meeting = self._timestep in self._meetings
+     worked_on_task_id: Optional[int] = None  # tracks which task earned progress this step
 
      # --- Meeting override ---
      if is_meeting:
@@ -329,18 +335,16 @@
      self._consecutive_breaks = 0
 
      if action.action_type == ActionType.START_TASK:
-         if self._current_task_id is not None and self._current_task_id != action.task_id:
-             switched = True
+         # Validation ensures current_task_id is None here
          self._current_task_id = action.task_id
 
      elif action.action_type == ActionType.SWITCH_TASK:
-         if self._current_task_id is not None and self._current_task_id != action.task_id:
-             switched = True
+         # Validation ensures current_task_id exists and differs from target
+         switched = True
          self._current_task_id = action.task_id
 
      elif action.action_type == ActionType.CONTINUE_TASK:
-         if self._current_task_id is None:
-             is_idle = True
+         pass  # Validation already ensured current_task_id is valid
 
      # Apply switch energy penalty
      if switched:
@@ -353,20 +357,23 @@
          and not is_idle
          and self._current_task_id not in self._completed_tasks
      ):
+         worked_on_task_id = self._current_task_id
          task = self._tasks[self._current_task_id]
-         progress_delta = PROGRESS_RATE * self._energy
+         # Stress impairs productivity: at stress=1.0, progress is 70% of normal
+         stress_factor = 1.0 - (self._stress * STRESS_PROGRESS_FACTOR)
+         progress_delta = PROGRESS_RATE * self._energy * stress_factor
          task["progress"] = min(task["effort"], task["progress"] + progress_delta)
 
          # Check completion
          if task["progress"] >= task["effort"] and self._current_task_id not in self._completed_tasks:
              self._completed_tasks.add(self._current_task_id)
              completed_this_step.append(self._current_task_id)
+             # Auto-clear: agent becomes idle after finishing a task,
+             # so they can START_TASK a new one without needing to break/switch.
+             self._current_task_id = None
 
          self._energy = max(0.0, self._energy - ENERGY_WORK_DRAIN)
          self._steps_working += 1
-     elif self._current_task_id is not None and self._current_task_id in self._completed_tasks:
-         # Working on already-completed task = idle
-         is_idle = True
 
      # --- Check deadlines ---
      new_missed: List[int] = []
@@ -400,9 +407,10 @@
      # --- Compute reward ---
      reward = 0.0
 
-     # Progress reward
-     if progress_delta > 0 and self._current_task_id is not None:
-         task = self._tasks[self._current_task_id]
+     # Progress reward (use worked_on_task_id since current_task_id may be
+     # cleared on completion)
+     if progress_delta > 0 and worked_on_task_id is not None:
+         task = self._tasks[worked_on_task_id]
          r = progress_delta * task["importance"] * PROGRESS_REWARD_SCALE
          reward += r
          reward_breakdown["progress_reward"] = round(r, 4)
@@ -454,11 +462,12 @@
      reward += mode_bonus
      reward_breakdown["mode_bonus"] = round(mode_bonus, 4)
 
-     # Clamp reward
-     reward = max(-1.0, min(1.0, round(reward, 4)))
+     # Clamp reward (wide enough to preserve completion signal)
+     reward = max(-2.0, min(2.0, round(reward, 4)))
 
      # --- Done? ---
-     done = self._timestep >= MAX_STEPS
+     all_tasks_completed = len(self._completed_tasks) == len(self._tasks)
+     done = self._timestep >= MAX_STEPS or all_tasks_completed
 
      # --- Final grading ---
      if done:
@@ -489,14 +498,29 @@
 
      def _validate_action(self, action: RhythmAction) -> bool:
          """Return True if the action is valid given current state."""
-         if action.action_type in (ActionType.START_TASK, ActionType.SWITCH_TASK):
+         if action.action_type == ActionType.START_TASK:
              if action.task_id is None:
                  return False
              if action.task_id < 0 or action.task_id >= len(self._tasks):
                  return False
              if action.task_id in self._completed_tasks:
                  return False
-         if action.action_type == ActionType.CONTINUE_TASK:
+             # START_TASK: only valid when not currently working on anything
+             if self._current_task_id is not None:
+                 return False
+         elif action.action_type == ActionType.SWITCH_TASK:
+             if action.task_id is None:
+                 return False
+             if action.task_id < 0 or action.task_id >= len(self._tasks):
+                 return False
+             if action.task_id in self._completed_tasks:
+                 return False
+             # SWITCH_TASK: only valid when already working on a different task
+             if self._current_task_id is None:
+                 return False
+             if self._current_task_id == action.task_id:
+                 return False
+         elif action.action_type == ActionType.CONTINUE_TASK:
              if self._current_task_id is None:
                  return False
              if self._current_task_id in self._completed_tasks:
@@ -529,7 +553,15 @@
      meetings = kwargs.get("meetings", [])
      if not isinstance(meetings, list):
          meetings = []
-     meetings = [int(m) for m in meetings if 0 <= int(m) < MAX_STEPS]
+     valid_meetings = []
+     for m in meetings:
+         try:
+             mi = int(m)
+             if 0 <= mi < MAX_STEPS:
+                 valid_meetings.append(mi)
+         except (ValueError, TypeError):
+             continue
+     meetings = valid_meetings
 
      initial_energy = max(0.1, min(1.0, float(kwargs.get("initial_energy", 0.8))))
 
@@ -570,34 +602,46 @@
          completed_importance / total_importance if total_importance > 0 else 0.0
      )
 
-     # 2. Deadline score
-     total_tasks = len(self._tasks)
-     deadlines_met = total_tasks - len(self._missed_deadlines)
-     deadline_score = deadlines_met / total_tasks if total_tasks > 0 else 0.0
+     # 2. Deadline score (importance-weighted: missing important deadlines hurts more)
+     met_importance = sum(
+         t["importance"]
+         for t in self._tasks
+         if t["id"] not in self._missed_deadlines
+     )
+     deadline_score = met_importance / total_importance if total_importance > 0 else 0.0
 
-     # 3. Efficiency score
+     # 3. Efficiency score (using realistic optimal based on average energy)
+     steps_elapsed = max(self._timestep, 1)
      total_effort = sum(
          t["effort"]
          for t in self._tasks
         if t["id"] in self._completed_tasks
      )
-     optimal_steps = total_effort / PROGRESS_RATE if total_effort > 0 else 1.0
-     actual_steps = max(self._steps_working, 1)
-     efficiency_score = min(1.0, optimal_steps / actual_steps)
+     if total_effort > 0 and self._steps_working > 0:
+         avg_energy = self._total_energy / steps_elapsed
+         effective_rate = PROGRESS_RATE * max(avg_energy, 0.3)
+         optimal_steps = total_effort / effective_rate
+         efficiency_score = min(1.0, optimal_steps / self._steps_working)
+     else:
+         # No tasks completed = zero efficiency
+         efficiency_score = 0.0
 
      # 4. Energy management (average energy)
-     steps_elapsed = max(self._timestep, 1)
      energy_management = self._total_energy / steps_elapsed
 
      # 5. Stress management (1 - average stress)
      stress_management = 1.0 - (self._total_stress / steps_elapsed)
 
+     # 6. Switch penalty in grader (penalize excessive context-switching)
+     switch_penalty = min(0.15, self._switch_count * 0.02)
+
      score = (
-         0.45 * completion_score
+         0.40 * completion_score
          + 0.20 * deadline_score
          + 0.15 * efficiency_score
         + 0.10 * energy_management
-         + 0.10 * stress_management
+         + 0.15 * stress_management
+         - switch_penalty
      )
      return max(0.0, min(1.0, score))
 
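The START_TASK/SWITCH_TASK validation rules in the `_validate_action` change above can be summarized as a standalone sketch (a hypothetical simplification with string action types, not the class method itself):

```python
def is_valid(action_type, task_id, current_task_id, n_tasks, completed):
    """Simplified mirror of the _validate_action rules in the diff above."""
    if action_type in ("START_TASK", "SWITCH_TASK"):
        # Both need a real, not-yet-completed task id
        if task_id is None or not (0 <= task_id < n_tasks) or task_id in completed:
            return False
    if action_type == "START_TASK":
        return current_task_id is None  # only valid when idle
    if action_type == "SWITCH_TASK":
        # only valid when already working on a different task
        return current_task_id is not None and current_task_id != task_id
    if action_type == "CONTINUE_TASK":
        return current_task_id is not None and current_task_id not in completed
    return True  # TAKE_BREAK is always valid

print(is_valid("START_TASK", 1, 0, 3, set()))   # False: already working on task 0
print(is_valid("SWITCH_TASK", 1, 0, 3, set()))  # True: switching 0 -> 1
```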
tests/__init__.py ADDED
@@ -0,0 +1,5 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
tests/test_rhythm_env.py ADDED
@@ -0,0 +1,306 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """Tests for RhythmEnv environment."""
8
+
9
+ import pytest
10
+
11
+ from server.rhythm_environment import (
12
+ MAX_STEPS,
13
+ RhythmEnvironment,
14
+ )
15
+ from models import ActionType, RhythmAction
16
+
17
+
18
+ @pytest.fixture
19
+ def env():
20
+ return RhythmEnvironment()
21
+
22
+
23
+ # ---------------------------------------------------------------------------
24
+ # reset() tests
25
+ # ---------------------------------------------------------------------------
26
+
27
+ class TestReset:
28
+ def test_reset_returns_observation(self, env):
29
+ obs = env.reset(task="easy")
30
+ assert obs.timestep == 0
31
+ assert obs.done is False
32
+ assert obs.reward == 0.0
33
+
34
+ def test_reset_easy_has_3_tasks(self, env):
35
+ obs = env.reset(task="easy")
36
+ assert len(obs.tasks) == 3
37
+
38
+ def test_reset_medium_has_4_tasks(self, env):
39
+ obs = env.reset(task="medium")
40
+ assert len(obs.tasks) == 4
41
+
42
+ def test_reset_hard_has_5_tasks(self, env):
43
+ obs = env.reset(task="hard")
44
+ assert len(obs.tasks) == 5
45
+
46
+ def test_reset_unknown_task_defaults_to_easy(self, env):
47
+ obs = env.reset(task="nonexistent")
48
+ assert len(obs.tasks) == 3
49
+
50
+ def test_reset_empty_defaults_to_easy(self, env):
51
+ obs = env.reset()
52
+ assert len(obs.tasks) == 3
53
+
54
+ def test_reset_clears_state(self, env):
55
+ obs = env.reset(task="easy")
56
+ env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
57
+ obs = env.reset(task="easy")
58
+ assert obs.timestep == 0
59
+ assert obs.energy == 0.75
60
+ assert obs.current_task_id is None
61
+
62
+ def test_reset_custom_tasks(self, env):
63
+ obs = env.reset(
64
+ task="custom",
65
+ tasks=[
66
+ {"name": "T1", "effort": 0.3, "deadline": 10, "importance": 0.8},
67
+ {"name": "T2", "effort": 0.2, "deadline": 15, "importance": 0.5},
68
+ ],
69
+ meetings=[5],
70
+ initial_energy=0.6,
71
+ )
72
+ assert len(obs.tasks) == 2
73
+ assert obs.tasks[0].name == "T1"
74
+ assert obs.energy == 0.6
75
+ assert obs.meetings == [5]
76
+
77
+ def test_reset_custom_rejects_empty_tasks(self, env):
78
+ with pytest.raises(ValueError, match="tasks"):
79
+ env.reset(task="custom", tasks=[])
80
+
81
+ def test_reset_custom_clamps_bounds(self, env):
82
+ obs = env.reset(
83
+ task="custom",
84
+ tasks=[{"name": "X", "effort": 99, "deadline": 999, "importance": -5}],
85
+ initial_energy=50,
86
+ )
87
+ t = obs.tasks[0]
88
+ assert t.effort <= 1.0
89
+ assert t.deadline <= MAX_STEPS
90
+ assert t.importance >= 0.1
91
+ assert obs.energy <= 1.0
92
+
93
+
94
+ # ---------------------------------------------------------------------------
95
+ # step() tests
96
+ # ---------------------------------------------------------------------------
97
+
98
+ class TestStep:
99
+ def test_step_advances_timestep(self, env):
100
+ env.reset(task="easy")
101
+ obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
102
+ assert obs.timestep == 1
103
+
104
+ def test_working_drains_energy(self, env):
105
+ env.reset(task="easy")
106
+ initial_energy = 0.75
107
+ obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
108
+ assert obs.energy < initial_energy
109
+
110
+ def test_break_recovers_energy(self, env):
111
+ env.reset(task="hard") # starts at 0.4 energy
112
+ obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
113
+ assert obs.energy > 0.4
114
+
115
+ def test_progress_accumulates(self, env):
116
+ env.reset(task="easy")
117
+ obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
118
+ assert obs.tasks[0].progress > 0.0
119
+
120
+ def test_continue_task_works(self, env):
121
+ env.reset(task="easy")
122
+ env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
123
+ obs = env.step(RhythmAction(action_type=ActionType.CONTINUE_TASK))
124
+ assert obs.tasks[0].progress > 0.0
125
+
126
+ def test_start_task_invalid_when_already_working(self, env):
127
+ """START_TASK should fail (idle) when already working on a task."""
128
+ env.reset(task="easy")
129
+ env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
130
+ # Trying START_TASK again while already on task 0 should be invalid
131
+ obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=1))
132
+ # Should incur idle penalty since START is invalid when current_task_id is set
133
+ assert "idle_penalty" in obs.reward_breakdown
134
+
135
+ def test_switch_task_requires_current_task(self, env):
136
+ """SWITCH_TASK should fail when no task is active."""
137
+ env.reset(task="easy")
138
+ obs = env.step(RhythmAction(action_type=ActionType.SWITCH_TASK, task_id=0))
139
+ assert "idle_penalty" in obs.reward_breakdown
140
+
141
+    def test_meeting_blocks_action(self, env):
+        """At meeting timesteps, the action should be ignored."""
+        env.reset(task="easy")  # meetings at steps 3 and 11
+        # Advance to step 3 (meeting)
+        for _ in range(3):
+            env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+        # Step at meeting time — the action should be ignored
+        obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
+        # After the meeting, current_task_id should still be None
+        assert obs.current_task_id is None
+
+    def test_episode_ends_at_max_steps(self, env):
+        env.reset(task="easy")
+        for _ in range(MAX_STEPS):
+            obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+        assert obs.done is True
+
+    def test_early_termination_when_all_tasks_complete(self, env):
+        """Episode ends early if all tasks are completed."""
+        env.reset(
+            task="custom",
+            tasks=[{"name": "Tiny", "effort": 0.05, "deadline": 19, "importance": 0.5}],
+            initial_energy=1.0,
+        )
+        obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
+        assert obs.done is True
+        assert obs.timestep < MAX_STEPS
+        assert "final_score" in obs.reward_breakdown
+
+    def test_stress_affects_progress(self, env):
+        """High stress should reduce progress rate."""
+        # Run 1: zero-stress scenario (custom, high energy, generous deadline)
+        env.reset(
+            task="custom",
+            tasks=[{"name": "A", "effort": 1.0, "deadline": 19, "importance": 0.5}],
+            initial_energy=1.0,
+        )
+        obs1 = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
+        progress_low_stress = obs1.tasks[0].progress
+
+        # Run 2: build up stress by missing multiple deadlines, then measure
+        # progress. Use 3 tasks with deadline=1 so that after 2 steps,
+        # timestep=2 > 1 triggers 3 deadline misses → stress += 0.45.
+        # Work on task 3 throughout so energy drains equally in both runs.
+        env.reset(
+            task="custom",
+            tasks=[
+                {"name": "M1", "effort": 1.0, "deadline": 1, "importance": 0.9},
+                {"name": "M2", "effort": 1.0, "deadline": 1, "importance": 0.9},
+                {"name": "M3", "effort": 1.0, "deadline": 1, "importance": 0.9},
+                {"name": "B", "effort": 1.0, "deadline": 19, "importance": 0.5},
+            ],
+            initial_energy=1.0,
+        )
+        # Step 1 (timestep 0→1): work on task B, deadlines approaching → some stress
+        env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=3))
+        # Step 2 (timestep 1→2): continue working, 3 deadlines missed → stress +0.45
+        env.step(RhythmAction(action_type=ActionType.CONTINUE_TASK))
+        # Step 3: continue working on B under high stress
+        obs2 = env.step(RhythmAction(action_type=ActionType.CONTINUE_TASK))
+        # Energy decreases and stress accumulates each step, so the average
+        # per-step progress on B across steps 1-3 should be lower than step
+        # 1's progress with zero stress and full energy.
+        progress_high_stress_total = obs2.tasks[3].progress
+        assert obs2.stress >= 0.3  # stress is significant
+        avg_progress_per_step = progress_high_stress_total / 3
+        assert progress_low_stress > avg_progress_per_step
+
+
+# ---------------------------------------------------------------------------
+# Grader tests
+# ---------------------------------------------------------------------------
+
+class TestGrader:
+    def test_final_score_in_range(self, env):
+        env.reset(task="easy")
+        for _ in range(MAX_STEPS):
+            obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+        score = obs.reward_breakdown.get("final_score", -1)
+        assert 0.0 <= score <= 1.0
+
+    def test_doing_nothing_scores_low(self, env):
+        env.reset(task="hard")
+        for _ in range(MAX_STEPS):
+            obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+        score = obs.reward_breakdown["final_score"]
+        assert score < 0.5
+
+    def test_heuristic_beats_random(self, env):
+        """The simple heuristic should beat a do-nothing strategy."""
+        # Heuristic run
+        obs = env.reset(task="easy")
+        for _ in range(MAX_STEPS):
+            if obs.done:
+                break
+            uncompleted = [t for t in obs.tasks if t.progress < t.effort]
+            if obs.energy < 0.3 or not uncompleted:
+                action = RhythmAction(action_type=ActionType.TAKE_BREAK)
+            elif obs.current_task_id is not None:
+                action = RhythmAction(action_type=ActionType.CONTINUE_TASK)
+            else:
+                action = RhythmAction(action_type=ActionType.START_TASK, task_id=uncompleted[0].id)
+            obs = env.step(action)
+        heuristic_score = obs.reward_breakdown["final_score"]
+
+        # Do-nothing run
+        obs = env.reset(task="easy")
+        for _ in range(MAX_STEPS):
+            obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+        idle_score = obs.reward_breakdown["final_score"]
+
+        assert heuristic_score > idle_score
+
+    def test_grader_deterministic(self, env):
+        """Same actions produce same score."""
+
+        def run():
+            obs = env.reset(task="medium")
+            for _ in range(MAX_STEPS):
+                if obs.done:
+                    break
+                obs = env.step(RhythmAction(action_type=ActionType.TAKE_BREAK))
+            return obs.reward_breakdown["final_score"]
+
+        assert run() == run()
+
+
+# ---------------------------------------------------------------------------
+# Edge cases
+# ---------------------------------------------------------------------------
+
+class TestEdgeCases:
+    def test_complete_task_then_continue_is_idle(self, env):
+        """Continuing after completing a task should be treated as idle.
+
+        With auto-clear, current_task_id resets to None on completion,
+        so CONTINUE_TASK becomes invalid (no active task).
+        """
+        env.reset(
+            task="custom",
+            tasks=[
+                {"name": "Quick", "effort": 0.05, "deadline": 19, "importance": 0.5},
+                {"name": "Other", "effort": 1.0, "deadline": 19, "importance": 0.5},
+            ],
+            initial_energy=1.0,
+        )
+        # One step should complete task 0 (0.15 * 1.0 > 0.05)
+        obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
+        assert obs.tasks[0].progress >= obs.tasks[0].effort
+        # current_task_id is auto-cleared after completion
+        assert obs.current_task_id is None
+        # CONTINUE with no current task → idle
+        obs = env.step(RhythmAction(action_type=ActionType.CONTINUE_TASK))
+        assert "idle_penalty" in obs.reward_breakdown
+
+    def test_reward_breakdown_has_all_components(self, env):
+        env.reset(task="easy")
+        obs = env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
+        # Should include at least progress_reward and stress_penalty
+        assert "progress_reward" in obs.reward_breakdown
+        assert "stress_penalty" in obs.reward_breakdown