Upload 8 files

- Dockerfile +16 -0
- README.md +151 -8
- app.py +88 -0
- environment.py +281 -0
- inference.py +180 -0
- models.py +74 -0
- openenv.yaml +65 -0
- requirements.txt +6 -0
Dockerfile
ADDED
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md
CHANGED
# FocusFlow RL Environment
### Meta x Scaler OpenEnv Hackathon 2026

> An RL environment where an AI agent learns to manage a student's focus session:
> blocking distracting apps, timing breaks, and maximising deep-focus time.

---

## What It Is

FocusFlow is an **OpenEnv-compatible reinforcement learning environment** built on top of
Meta's OpenEnv framework. An LLM agent is placed in a student's digital world and must:

- **Block** distracting apps (Instagram, YouTube, BGMI, etc.) before they steal focus
- **Time breaks** correctly using the Pomodoro technique (25 min focus / 5 min break)
- **Resist** distraction events that spawn randomly during the session
- **Maximise** the focus score across multiple study sessions

The environment simulates a realistic student productivity scenario, making it a strong
candidate for training agents that improve human focus and wellbeing.

---

## Environment Design

### Action Space (5 discrete actions)

| Action | Description | Reward |
|---|---|---|
| `focus` | Stay focused, do nothing | +0.05 per step |
| `block_app` | Block a distracting app | +0.20 × temptation_level |
| `take_break` | Take a voluntary break | +0.30 if timed correctly |
| `adjust_timer` | Change pomodoro duration | +0.01 |
| `check_app` | Give in to distraction | **-0.50** |

### Observation Space

```json
{
  "time_remaining_seconds": 1200,
  "current_phase": "focus",
  "active_distractions": ["Instagram", "YouTube"],
  "blocked_apps": ["BGMI"],
  "sessions_completed": 0,
  "focus_score": 0.85,
  "last_action_feedback": "Blocked BGMI. Reward scaled by temptation level (0.95).",
  "distraction_event": "Reddit"
}
```

### Reward Function

Simple, clean rewards for stable RL training (binary/shaped hybrid):

```
+ 0.05 per step in pure focus mode
+ 0.20 × temptation for blocking an app proactively
+ 0.30 for a well-timed break (at session boundary)
- 0.50 for checking a distracting app (hard penalty)
- 0.10 for taking a break mid-session
```
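As a quick sanity check, the shaping above can be mirrored in a few lines. This is an illustrative sketch only; `environment.py`'s `_compute_reward` is the authoritative implementation, and `sketch_reward` is a hypothetical helper, not part of the repo:

```python
# Standalone sketch of FocusFlow's reward shaping (mirrors the table above).
# Not the real implementation: environment.py's _compute_reward is authoritative.

def sketch_reward(action_type: str, temptation: float = 0.0, at_boundary: bool = False) -> float:
    if action_type == "focus":
        return 0.05                            # small per-step focus reward
    if action_type == "block_app":
        return 0.20 * temptation               # scaled by how tempting the app is
    if action_type == "take_break":
        return 0.30 if at_boundary else -0.10  # boundary break vs. mid-session
    if action_type == "check_app":
        return -0.50                           # hard penalty for giving in
    if action_type == "adjust_timer":
        return 0.01
    raise ValueError(f"unknown action: {action_type}")

print(round(sketch_reward("block_app", temptation=0.95), 2))  # 0.19 (blocking BGMI)
print(sketch_reward("take_break", at_boundary=True))          # 0.3
```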

### Tasks

Three tasks of increasing difficulty:

| Task | Goal | Max Steps |
|---|---|---|
| `task_1` | Complete 1 session with zero distractions | 60 |
| `task_2` | Complete 2 sessions with correct break timing | 120 |
| `task_3` | Block all 5 apps within 10 steps, then complete a session | 80 |

---

## OpenEnv API

The server exposes the standard OpenEnv HTTP API:

```
POST /reset?task_id=task_1       → FocusObservation
POST /step  (body: FocusAction)  → FocusObservation + reward + done
GET  /state                      → FocusState (full internal state)
GET  /health                     → {"status": "ok"}
GET  /tasks                      → list of all tasks
```

### Quick Start (local)

```bash
# Install
pip install -r requirements.txt

# Run server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# In another terminal: reset and take a step
curl -X POST "http://localhost:7860/reset?task_id=task_1"
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "block_app", "app_name": "Instagram", "reasoning": "Block high temptation early"}'
```

### Run the LLM Agent

```bash
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.1-8b-instant
export HF_TOKEN=your_token_here
export ENV_BASE_URL=http://localhost:7860
export TASK_ID=task_1

python inference.py
```
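`inference.py` prints one JSON object per line, tagged `[START]`, `[STEP]`, or `[END]`. A small sketch for summarising such a log offline; `summarise_log` is an illustrative helper (not part of the repo), and the sample records follow the field names `inference.py` emits:

```python
import json

def summarise_log(lines):
    """Sum per-step rewards and collect (episode, total_reward, success) from a run log."""
    step_reward = 0.0
    episodes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("type") == "[STEP]":
            step_reward += rec["reward"]
        elif rec.get("type") == "[END]":
            episodes.append((rec["episode"], rec["total_reward"], rec["success"]))
    return step_reward, episodes

demo = [
    '{"type": "[START]", "episode": 1, "task_id": "task_1"}',
    '{"type": "[STEP]", "episode": 1, "step": 1, "reward": 0.19, "done": false}',
    '{"type": "[STEP]", "episode": 1, "step": 2, "reward": 0.05, "done": true}',
    '{"type": "[END]", "episode": 1, "task_id": "task_1", "total_reward": 0.24, "steps": 2, "success": true}',
]
total, eps = summarise_log(demo)
print(round(total, 2), eps)  # 0.24 [(1, 0.24, True)]
```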

### Deploy to HF Spaces

```bash
# Install OpenEnv CLI
pip install openenv

# Push to Hugging Face Spaces
openenv deploy --space YOUR_HF_USERNAME/focusflow-env
```

---

## Project Structure

```
focusflow_rl_env/
├── models.py          # Pydantic: FocusAction, FocusObservation, FocusState
├── environment.py     # Core RL logic: step(), reset(), state(), reward
├── app.py             # FastAPI server exposing OpenEnv HTTP API
├── inference.py       # LLM baseline agent (Groq/OpenAI compatible)
├── Dockerfile         # Container for HF Spaces deployment
├── requirements.txt
├── openenv.yaml       # OpenEnv metadata
└── README.md
```

---

## Why This Problem?

Student distraction is a pervasive, measurable problem. Phones, social media, and
short-form video have been shown to reduce deep-work capacity. An RL agent that
learns optimal focus-management strategies could be embedded in productivity apps,
study tools, or OS-level focus modes, making it immediately useful beyond the hackathon.

---

## Submitted by
Abdul Hannan, Meta x Scaler OpenEnv Hackathon 2026
app.py
ADDED
"""
FocusFlow RL Environment - app.py
FastAPI server exposing the OpenEnv HTTP API:
    POST /reset
    POST /step
    GET  /state
    GET  /health
    GET  /tasks
"""

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from models import FocusAction, FocusObservation, FocusState
from environment import FocusFlowEnvironment, TASKS
from typing import Optional
import uvicorn

app = FastAPI(
    title="FocusFlow RL Environment",
    description="OpenEnv-compatible RL environment for student focus & anti-distraction agent training.",
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# One environment per server instance (stateful server pattern as per OpenEnv)
env: Optional[FocusFlowEnvironment] = None


# ─── Endpoints ────────────────────────────────────────────────────────────────

@app.get("/health")
def health():
    return {"status": "ok", "environment": "FocusFlow", "version": "1.0.0"}


@app.get("/tasks")
def list_tasks():
    """List all available tasks."""
    return {"tasks": TASKS}


@app.post("/reset", response_model=FocusObservation)
def reset(task_id: str = "task_1", seed: int = 42):
    """
    Reset the environment and return the initial observation.
    Optionally specify which task to load.
    """
    global env
    if task_id not in [t["id"] for t in TASKS]:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id: {task_id}. Available: {[t['id'] for t in TASKS]}",
        )
    env = FocusFlowEnvironment(task_id=task_id, seed=seed)
    return env.reset()


class StepResponse(FocusObservation):
    reward: float
    done: bool
    info: dict


@app.post("/step", response_model=StepResponse)
def step(action: FocusAction):
    """Submit one action and receive the next observation + reward."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    obs, reward, done, info = env.step(action)
    return StepResponse(**obs.model_dump(), reward=reward, done=done, info=info)


@app.get("/state", response_model=FocusState)
def state():
    """Return the full internal environment state."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    return env.state()


if __name__ == "__main__":
    uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
environment.py
ADDED
"""
FocusFlow RL Environment - environment.py
Core logic: tasks, reward shaping, grader, episode management
"""

import random
from typing import Tuple, List, Optional
from models import (
    FocusAction, FocusObservation, FocusState,
    DistractingApp, AppCategory,
)


# ─── Configurable tasks ───────────────────────────────────────────────────────

TASKS = [
    {
        "id": "task_1",
        "description": "Complete one 25-minute focus session without checking any distracting app.",
        "success_condition": "sessions_completed >= 1 and len(apps_checked) == 0",
        "max_steps": 60,
        "bonus": "Block at least 3 apps before the session ends for a 0.2 bonus.",
    },
    {
        "id": "task_2",
        "description": "Complete two focus sessions with strategically timed breaks (take_break at the right time).",
        "success_condition": "sessions_completed >= 2 and breaks_taken >= 2",
        "max_steps": 120,
        "bonus": "Never check a distracting app for a full 0.15 bonus.",
    },
    {
        "id": "task_3",
        "description": "Manage a high-distraction environment: block all 5 apps within 10 steps and maintain focus.",
        "success_condition": "len(apps_blocked) >= 5 and sessions_completed >= 1",
        "max_steps": 80,
        "bonus": "Block all apps within first 8 steps for 0.25 bonus.",
    },
]

# ─── Distraction pool ─────────────────────────────────────────────────────────

DISTRACTION_POOL: List[DistractingApp] = [
    DistractingApp(name="Instagram", category=AppCategory.social_media, temptation_level=0.85),
    DistractingApp(name="YouTube", category=AppCategory.video, temptation_level=0.90),
    DistractingApp(name="WhatsApp", category=AppCategory.messaging, temptation_level=0.70),
    DistractingApp(name="Twitter", category=AppCategory.social_media, temptation_level=0.75),
    DistractingApp(name="BGMI", category=AppCategory.gaming, temptation_level=0.95),
    DistractingApp(name="Reddit", category=AppCategory.news, temptation_level=0.80),
    DistractingApp(name="Netflix", category=AppCategory.video, temptation_level=0.88),
    DistractingApp(name="Snapchat", category=AppCategory.social_media, temptation_level=0.72),
]

FOCUS_DURATION_SECONDS = 25 * 60   # 25 minutes
SHORT_BREAK_SECONDS = 5 * 60       # 5 minutes
LONG_BREAK_SECONDS = 15 * 60       # 15 minutes (every 4 sessions)


class FocusFlowEnvironment:
    """
    OpenEnv-compatible RL environment for the FocusFlow anti-distraction agent.
    Implements step() / reset() / state() as per the OpenEnv spec.
    """

    def __init__(self, task_id: str = "task_1", seed: int = 42):
        random.seed(seed)
        self.task = next(t for t in TASKS if t["id"] == task_id)
        self._reset_internal()

    # ── Internal helpers ──────────────────────────────────────────────────────

    def _reset_internal(self):
        self.step_count = 0
        self.max_steps = self.task["max_steps"]
        self.total_focus_secs = 0
        self.total_distraction_s = 0
        self.sessions_completed = 0
        self.breaks_taken = 0
        self.apps_blocked: List[str] = []
        self.apps_checked: List[str] = []
        self.current_phase = "focus"
        self.time_remaining = FOCUS_DURATION_SECONDS
        self.cumulative_reward = 0.0
        self.done = False
        self.active_distractions = self._sample_distractions(3)

    def _sample_distractions(self, n: int) -> List[str]:
        """Pick n random distracting apps not already blocked."""
        available = [d.name for d in DISTRACTION_POOL if d.name not in self.apps_blocked]
        return random.sample(available, min(n, len(available)))

    def _maybe_spawn_distraction(self) -> Optional[str]:
        """30% chance each step to surface a new distraction."""
        if random.random() < 0.30:
            available = [
                d.name for d in DISTRACTION_POOL
                if d.name not in self.apps_blocked
                and d.name not in self.active_distractions
            ]
            if available:
                new_app = random.choice(available)
                self.active_distractions.append(new_app)
                return new_app
        return None

    def _compute_reward(self, action: FocusAction) -> Tuple[float, str]:
        """
        Reward function - clean and interpretable for RL training.

        Positive rewards:
            +0.5  per completed focus session (no distractions)
            +0.3  for a well-timed voluntary break
            +0.2  for blocking a high-temptation app before being distracted
            +0.05 per step spent in pure focus mode

        Negative rewards:
            -0.5  for checking a distracting app
            -0.1  for taking a break at the wrong time (mid-session, not at boundary)
            -0.05 per step in focus mode with an unblocked high-temptation app active
        """
        reward = 0.0
        feedback = ""

        if action.action_type == "focus":
            reward += 0.05
            feedback = "Good. Staying focused adds a small step reward."

        elif action.action_type == "block_app":
            if action.app_name and action.app_name not in self.apps_blocked:
                app_obj = next((d for d in DISTRACTION_POOL if d.name == action.app_name), None)
                if app_obj:
                    self.apps_blocked.append(action.app_name)
                    if action.app_name in self.active_distractions:
                        self.active_distractions.remove(action.app_name)
                    reward += 0.20 * app_obj.temptation_level  # scale by how tempting it was
                    feedback = (
                        f"Blocked {action.app_name}. Reward scaled by temptation level "
                        f"({app_obj.temptation_level:.2f})."
                    )
                else:
                    feedback = "App not found in distraction pool - no reward."
            else:
                feedback = "App already blocked or not specified."

        elif action.action_type == "take_break":
            if self.current_phase == "focus" and self.time_remaining <= 30:
                # Strategic: break at session boundary
                reward += 0.30
                feedback = "Well-timed break at session boundary! +0.30 reward."
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if (self.sessions_completed + 1) % 4 != 0 else LONG_BREAK_SECONDS
                )
                self.breaks_taken += 1
            elif self.current_phase == "break":
                feedback = "Already on a break. No reward."
            else:
                reward -= 0.10
                feedback = "Break taken mid-session. -0.10 penalty."
                self.breaks_taken += 1

        elif action.action_type == "check_app":
            app = action.app_name or (self.active_distractions[0] if self.active_distractions else None)
            if app:
                reward -= 0.50
                feedback = f"Gave in to {app}! Hard penalty: -0.50."
                self.apps_checked.append(app)
                self.total_distraction_s += 60  # assume 1 min lost per check
            else:
                feedback = "No active distraction to check."

        elif action.action_type == "adjust_timer":
            # Neutral but allows personalisation
            reward += 0.01
            feedback = f"Timer adjusted to {action.timer_minutes} min. Minimal reward."

        return reward, feedback

    def _advance_time(self, seconds: int = 60):
        """Advance simulation by `seconds`. Transitions phase when the timer hits 0."""
        self.time_remaining -= seconds
        if self.time_remaining <= 0:
            if self.current_phase == "focus":
                self.sessions_completed += 1
                self.total_focus_secs += FOCUS_DURATION_SECONDS
                # start break
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if self.sessions_completed % 4 != 0 else LONG_BREAK_SECONDS
                )
            else:
                # break ended, start new focus session
                self.current_phase = "focus"
                self.time_remaining = FOCUS_DURATION_SECONDS
                self.active_distractions = self._sample_distractions(2)

    def _check_success(self) -> bool:
        """Evaluate the task success condition."""
        # These locals are referenced by name inside the condition strings.
        sessions_completed = self.sessions_completed  # noqa: F841
        apps_blocked = self.apps_blocked  # noqa: F841
        apps_checked = self.apps_checked  # noqa: F841
        breaks_taken = self.breaks_taken  # noqa: F841
        try:
            # Conditions are static, author-controlled strings (see TASKS above).
            return bool(eval(self.task["success_condition"]))  # noqa: S307
        except Exception:
            return False

    # ── Public OpenEnv API ────────────────────────────────────────────────────

    def reset(self) -> FocusObservation:
        """Reset the environment and return the initial observation."""
        self._reset_internal()
        return FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=0.0,
            last_action_feedback=f"Environment reset. Task: {self.task['description']}",
            distraction_event=None,
        )

    def step(self, action: FocusAction) -> Tuple[FocusObservation, float, bool, dict]:
        """
        Process one agent action.
        Returns: (observation, reward, done, info)
        """
        if self.done:
            raise RuntimeError("Episode is done. Call reset() to start a new episode.")

        self.step_count += 1

        # Advance simulated time (each step = 1 minute in the student's world)
        self._advance_time(seconds=60)

        # Compute reward and get feedback
        reward, feedback = self._compute_reward(action)

        # Maybe spawn a new distraction
        new_distraction = self._maybe_spawn_distraction()

        # Compute running focus score
        focus_ratio = (
            self.total_focus_secs /
            max(1, self.total_focus_secs + self.total_distraction_s)
        )

        # Check episode termination
        success = self._check_success()
        self.done = self.step_count >= self.max_steps or success

        self.cumulative_reward += reward

        obs = FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=round(focus_ratio, 3),
            last_action_feedback=feedback,
            distraction_event=new_distraction,
        )

        info = {
            "step": self.step_count,
            "success": success,
            "cumulative": round(self.cumulative_reward, 4),
        }

        return obs, round(reward, 4), self.done, info

    def state(self) -> FocusState:
        """Return the full internal state (for debugging / logging)."""
        return FocusState(
            episode_step=self.step_count,
            max_steps=self.max_steps,
            total_focus_seconds=self.total_focus_secs,
            total_distraction_seconds=self.total_distraction_s,
            sessions_completed=self.sessions_completed,
            breaks_taken=self.breaks_taken,
            apps_blocked=list(self.apps_blocked),
            apps_checked=list(self.apps_checked),
            current_phase=self.current_phase,
            time_remaining_seconds=self.time_remaining,
            cumulative_reward=round(self.cumulative_reward, 4),
            done=self.done,
        )
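The `_check_success` method above evaluates each task's `success_condition` string with `eval` against local episode statistics, which is workable here because the strings are static and author-controlled. The mechanism can be illustrated standalone; the `check` helper below is hypothetical, not part of the repo:

```python
# Standalone illustration of how string success conditions evaluate against
# episode statistics (mirrors environment.py's _check_success).
def check(condition: str, stats: dict) -> bool:
    try:
        # Restrict eval to the provided stats; expose only len() as a builtin.
        return bool(eval(condition, {"__builtins__": {"len": len}}, stats))
    except Exception:
        return False

stats = {
    "sessions_completed": 1,
    "apps_checked": [],
    "apps_blocked": ["Instagram", "YouTube", "BGMI", "Reddit", "Netflix"],
    "breaks_taken": 1,
}
print(check("sessions_completed >= 1 and len(apps_checked) == 0", stats))  # True  (task_1)
print(check("sessions_completed >= 2 and breaks_taken >= 2", stats))       # False (task_2)
print(check("len(apps_blocked) >= 5 and sessions_completed >= 1", stats))  # True  (task_3)
```

Passing an explicit globals dict also means a malformed condition fails closed (returns `False`) rather than raising into the step loop.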
inference.py
ADDED
"""
FocusFlow RL Environment - inference.py
HACKATHON SUBMISSION - Meta x Scaler OpenEnv 2026

CRITICAL: Logs MUST follow [START] / [STEP] / [END] format exactly.
Uses OpenAI client as required by the hackathon spec.
Runtime < 20 min | Runs on vcpu=2, memory=8gb
"""

import os
import json
import httpx
from openai import OpenAI

# ── Env vars (required by hackathon spec) ─────────────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.groq.com/openai/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "llama-3.1-8b-instant")
HF_TOKEN = os.environ.get("HF_TOKEN", "")
ENV_BASE_URL = os.environ.get("ENV_BASE_URL", "http://localhost:7860")
MAX_STEPS = int(os.environ.get("MAX_STEPS", "30"))

# ── OpenAI client (REQUIRED by hackathon - do not use httpx for LLM calls) ────
llm_client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

SYSTEM_PROMPT = """You are an AI agent managing a student's focus session.

Goal: maximise focus, minimise distractions across the episode.

Actions you can take - respond ONLY with valid JSON:
  focus        -> stay focused (small step reward)
  block_app    -> block a distracting app (include "app_name")
  take_break   -> take a voluntary break (reward if timed at session boundary)
  check_app    -> give in to distraction (HEAVY -0.50 PENALTY, never do this)
  adjust_timer -> change pomodoro length (include "timer_minutes": int)

Response format (JSON only, no markdown fences):
{
  "action_type": "block_app",
  "app_name": "Instagram",
  "reasoning": "Block high-temptation app early."
}

Strategy:
1. Block high-temptation apps in the first few steps.
2. Stay in focus mode to accumulate +0.05 per step.
3. Take a break only when time_remaining < 60 seconds (session boundary).
4. NEVER use check_app.
"""


def call_llm(messages: list) -> dict:
    """Call the LLM via the OpenAI client and parse the JSON action."""
    response = llm_client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        temperature=0.2,
        max_tokens=200,
    )
    text = response.choices[0].message.content.strip()
    text = text.replace("```json", "").replace("```", "").strip()
    return json.loads(text)


def run_episode(task_id: str, episode_num: int) -> dict:
    """Run one full episode. Returns an episode summary dict."""
    base = ENV_BASE_URL.rstrip("/")

    # Reset environment
    reset_resp = httpx.post(f"{base}/reset", params={"task_id": task_id}, timeout=30)
    reset_resp.raise_for_status()
    obs = reset_resp.json()

    # [START] log - REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[START]",
        "episode": episode_num,
        "task_id": task_id,
        "initial_obs": obs,
    }))

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    total_reward = 0.0
    step = 0
    done = False
    last_info = {}

    while not done and step < MAX_STEPS:
        step += 1

        user_content = (
            f"Step {step}.\n"
            f"phase={obs['current_phase']} | "
            f"time_remaining={obs['time_remaining_seconds']}s | "
            f"sessions_done={obs['sessions_completed']} | "
            f"focus_score={obs['focus_score']}\n"
            f"active_distractions={obs['active_distractions']}\n"
            f"blocked_apps={obs['blocked_apps']}\n"
            f"last_feedback={obs['last_action_feedback']}\n"
            f"new_distraction={obs.get('distraction_event')}\n"
            "Choose action (JSON only):"
        )
        messages.append({"role": "user", "content": user_content})

        try:
            action = call_llm(messages)
        except Exception as e:
            action = {"action_type": "focus", "reasoning": f"LLM error: {e}"}

        messages.append({"role": "assistant", "content": json.dumps(action)})

        step_resp = httpx.post(f"{base}/step", json=action, timeout=30)
        step_resp.raise_for_status()
        result = step_resp.json()

        reward = result["reward"]
        done = result["done"]
        last_info = result.get("info", {})
        obs = result
        total_reward += reward

        # [STEP] log - REQUIRED format, judges parse this
        print(json.dumps({
            "type": "[STEP]",
            "episode": episode_num,
            "step": step,
            "action": action,
            "reward": round(reward, 4),
            "done": done,
            "obs": {
                "phase": obs["current_phase"],
                "time_remaining": obs["time_remaining_seconds"],
                "focus_score": obs["focus_score"],
                "sessions": obs["sessions_completed"],
                "blocked": obs["blocked_apps"],
                "distractions": obs["active_distractions"],
            },
        }))

    # [END] log - REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[END]",
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }))

    return {
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }


def main():
    tasks = ["task_1", "task_2", "task_3"]
    results = []

    for i, task_id in enumerate(tasks, start=1):
        try:
            result = run_episode(task_id=task_id, episode_num=i)
            results.append(result)
        except Exception as e:
            print(json.dumps({"type": "[ERROR]", "episode": i, "error": str(e)}))

    avg_reward = sum(r["total_reward"] for r in results) / max(len(results), 1)
|
| 170 |
+
success_rate = sum(1 for r in results if r["success"]) / max(len(results), 1)
|
| 171 |
+
print(json.dumps({
|
| 172 |
+
"type": "SUMMARY",
|
| 173 |
+
"avg_reward": round(avg_reward, 4),
|
| 174 |
+
"success_rate": round(success_rate, 4),
|
| 175 |
+
"episodes": results,
|
| 176 |
+
}))
|
| 177 |
+
|
| 178 |
+
|
| 179 |
+
if __name__ == "__main__":
|
| 180 |
+
main()
|
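The `[START]`/`[STEP]`/`[END]` lines above are newline-delimited JSON, so a grader can recover per-episode results with a few lines of parsing. A minimal sketch (the `parse_end_records` helper is hypothetical, not part of the repo):

```python
import json

def parse_end_records(lines):
    # Collect the [END] summary record for each episode; ignore anything
    # that is not valid JSON (e.g. stray stdout from the model client).
    episodes = {}
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        if rec.get("type") == "[END]":
            episodes[rec["episode"]] = rec
    return episodes

sample = [
    '{"type": "[START]", "episode": 1, "task_id": "task_1"}',
    'model warm-up...',  # non-JSON noise is skipped
    '{"type": "[END]", "episode": 1, "task_id": "task_1", '
    '"total_reward": 0.85, "steps": 42, "success": true}',
]
summary = parse_end_records(sample)
print(summary[1]["total_reward"])  # 0.85
```

Because each record carries its `episode` number, this works even when episodes interleave with other output streams.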
models.py
ADDED
@@ -0,0 +1,74 @@
"""
FocusFlow RL Environment -- models.py
OpenEnv hackathon submission: Meta x Scaler 2026
Pydantic models for Action, Observation, State
"""

from pydantic import BaseModel, Field
from typing import Literal, List, Optional
from enum import Enum


class AppCategory(str, Enum):
    social_media = "social_media"
    video = "video"
    messaging = "messaging"
    gaming = "gaming"
    news = "news"


class DistractingApp(BaseModel):
    name: str
    category: AppCategory
    temptation_level: float = Field(..., ge=0.0, le=1.0, description="How tempting (0=low, 1=high)")


# ─── Action ──────────────────────────────────────────────────────────────────

class FocusAction(BaseModel):
    """
    The agent submits one of these actions each step.

    action_type options:
      - focus        : continue working, no distractions
      - block_app    : block a specific distracting app
      - take_break   : voluntarily take a break (strategic)
      - check_app    : give in to a distraction (penalised)
      - adjust_timer : change the current pomodoro duration
    """
    action_type: Literal["focus", "block_app", "take_break", "check_app", "adjust_timer"]
    app_name: Optional[str] = Field(None, description="App to block or check (if applicable)")
    timer_minutes: Optional[int] = Field(None, ge=5, le=60, description="New timer duration (adjust_timer only)")
    reasoning: Optional[str] = Field(None, description="Agent's reasoning for this action (used by LLM grader)")


# ─── Observation ─────────────────────────────────────────────────────────────

class FocusObservation(BaseModel):
    """What the agent sees after each step."""
    time_remaining_seconds: int = Field(..., description="Seconds left in current session")
    current_phase: Literal["focus", "break"] = Field(..., description="Whether we are in a focus or break phase")
    active_distractions: List[str] = Field(..., description="Apps currently tempting the agent")
    blocked_apps: List[str] = Field(..., description="Apps the agent has blocked so far")
    sessions_completed: int = Field(..., description="Number of completed pomodoro sessions")
    focus_score: float = Field(..., ge=0.0, le=1.0, description="Running focus quality score")
    last_action_feedback: str = Field(..., description="Human-readable feedback on last action")
    distraction_event: Optional[str] = Field(None, description="A new temptation that just appeared, if any")


# ─── State ───────────────────────────────────────────────────────────────────

class FocusState(BaseModel):
    """Full internal environment state (returned by state() API call)."""
    episode_step: int
    max_steps: int
    total_focus_seconds: int
    total_distraction_seconds: int
    sessions_completed: int
    breaks_taken: int
    apps_blocked: List[str]
    apps_checked: List[str] = Field(default_factory=list, description="Distractions the agent gave in to")
    current_phase: Literal["focus", "break"]
    time_remaining_seconds: int
    cumulative_reward: float
    done: bool
openenv.yaml
ADDED
@@ -0,0 +1,65 @@
name: focusflow-env
description: >
  An RL environment where an AI agent learns to manage a student's focus session.
  The agent blocks distracting apps, times breaks correctly, and maximises
  deep-focus time using a Pomodoro-style framework.
  Built on Meta's OpenEnv framework for the Meta x Scaler Hackathon 2026.

version: "1.0.0"
author: Abdul Hannan
license: MIT

environment:
  base_url: https://YOUR-HF-SPACE-NAME.hf.space
  framework: openenv
  language: python
  python_version: "3.11"

api:
  reset:
    method: POST
    path: /reset
    params:
      - name: task_id
        type: string
        default: task_1
        description: Which task to load (task_1, task_2, task_3)
      - name: seed
        type: integer
        default: 42
  step:
    method: POST
    path: /step
    body: FocusAction
  state:
    method: GET
    path: /state

tasks:
  - id: task_1
    description: Complete one 25-min focus session without checking any distracting app.
    max_steps: 60
    success_reward: 1.0

  - id: task_2
    description: Complete two sessions with strategically timed breaks.
    max_steps: 120
    success_reward: 1.0

  - id: task_3
    description: Block all 5 distracting apps within 10 steps then complete a session.
    max_steps: 80
    success_reward: 1.0

reward_range: [-0.5, 0.5]
action_space: discrete (5 action types)
observation_space: structured JSON (FocusObservation)

tags:
  - productivity
  - student
  - anti-distraction
  - pomodoro
  - llm-agent
  - openenv
  - meta-hackathon-2026
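The `api` section above maps onto three plain HTTP calls. The sketch below only builds the request shapes a client would send (no server is contacted here; `BASE` is the same placeholder used in the yaml):

```python
BASE = "https://YOUR-HF-SPACE-NAME.hf.space"  # placeholder, as in openenv.yaml

# POST /reset -- task and seed travel in the query string
reset_request = {
    "url": f"{BASE}/reset",
    "params": {"task_id": "task_1", "seed": 42},
}

# POST /step -- the body is a FocusAction JSON object
step_request = {
    "url": f"{BASE}/step",
    "json": {
        "action_type": "block_app",
        "app_name": "TikTok",
        "reasoning": "Block the most tempting app before focusing",
    },
}

# GET /state -- no parameters
state_request = {"url": f"{BASE}/state"}

print(reset_request["params"], step_request["json"]["action_type"])
```

With `httpx`, each dict unpacks directly into `httpx.post(...)` or `httpx.get(...)`, mirroring the calls made in `inference.py`.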
requirements.txt
ADDED
@@ -0,0 +1,6 @@
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.1
httpx==0.27.0
python-dotenv==1.0.1
openai>=1.30.0