hannan2859r committed on
Commit
fdd45f1
·
verified ·
1 Parent(s): d9deec1

Upload 8 files

Files changed (8)
  1. Dockerfile +16 -0
  2. README.md +151 -8
  3. app.py +88 -0
  4. environment.py +281 -0
  5. inference.py +180 -0
  6. models.py +74 -0
  7. openenv.yaml +65 -0
  8. requirements.txt +6 -0
Dockerfile ADDED
@@ -0,0 +1,16 @@
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,11 +1,154 @@
- title: Focusflow Env
- emoji: 🚀
- colorFrom: purple
- colorTo: purple
- sdk: docker
- pinned: false
- license: mit
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# FocusFlow RL Environment
### Meta x Scaler OpenEnv Hackathon 2026

> An RL environment where an AI agent learns to manage a student's focus session —
> blocking distracting apps, timing breaks, and maximising deep-focus time.

---

## What It Is

FocusFlow is an **OpenEnv-compatible reinforcement learning environment** built on top of
Meta's OpenEnv framework. An LLM agent is placed in a student's digital world and must:

- **Block** distracting apps (Instagram, YouTube, BGMI, etc.) before they steal focus
- **Time breaks** correctly using the Pomodoro technique (25 min focus / 5 min break)
- **Resist** distraction events that spawn randomly during the session
- **Maximise** the focus score across multiple study sessions

The environment simulates a realistic student productivity scenario, making it a strong
candidate for training agents that improve human focus and wellbeing.

---

## Environment Design

### Action Space (5 discrete actions)

| Action | Description | Reward |
|---|---|---|
| `focus` | Stay focused, do nothing | +0.05 per step |
| `block_app` | Block a distracting app | +0.20 × temptation_level |
| `take_break` | Take a voluntary break | +0.30 if timed correctly |
| `adjust_timer` | Change pomodoro duration | +0.01 |
| `check_app` | Give in to distraction | **-0.50** |

### Observation Space

```json
{
  "time_remaining_seconds": 1200,
  "current_phase": "focus",
  "active_distractions": ["Instagram", "YouTube"],
  "blocked_apps": ["BGMI"],
  "sessions_completed": 0,
  "focus_score": 0.85,
  "last_action_feedback": "Blocked BGMI. Reward scaled by temptation level (0.95).",
  "distraction_event": "Reddit"
}
```
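
To make the observation-to-action mapping concrete, here is a minimal heuristic policy sketch (an illustration, not part of this submission): it blocks the most tempting active distraction first, otherwise stays focused. The `TEMPTATION` table copies the values from `DISTRACTION_POOL` in `environment.py`.

```python
# Hypothetical heuristic baseline (illustration only, not the LLM agent).
# Temptation values mirror DISTRACTION_POOL in environment.py.
TEMPTATION = {
    "Instagram": 0.85, "YouTube": 0.90, "WhatsApp": 0.70, "Twitter": 0.75,
    "BGMI": 0.95, "Reddit": 0.80, "Netflix": 0.88, "Snapchat": 0.72,
}

def heuristic_action(obs: dict) -> dict:
    """Map a FocusObservation dict to a FocusAction dict."""
    if obs["active_distractions"]:
        # Block the most tempting unblocked app first.
        worst = max(obs["active_distractions"], key=lambda a: TEMPTATION.get(a, 0.5))
        return {"action_type": "block_app", "app_name": worst,
                "reasoning": "Block highest-temptation app"}
    if obs["current_phase"] == "focus" and obs["time_remaining_seconds"] <= 60:
        return {"action_type": "take_break", "reasoning": "Session boundary"}
    return {"action_type": "focus", "reasoning": "Accumulate step reward"}
```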

### Reward Function

Simple, interpretable rewards for stable RL training (shaped per-step rewards plus hard penalties):

```
+ 0.05 per step in pure focus mode
+ 0.20 × temptation for blocking an app proactively
+ 0.30 for a well-timed break (at session boundary)
- 0.50 for checking a distracting app (hard penalty)
- 0.10 for taking a break mid-session
```
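
For example, blocking BGMI (temptation 0.95) immediately yields 0.20 × 0.95 = 0.19, while a single `check_app` (-0.50) wipes out ten steps of pure focus (10 × 0.05 = 0.50).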

### Tasks

Three tasks of increasing difficulty:

| Task | Goal | Max Steps |
|---|---|---|
| `task_1` | Complete 1 session with zero distractions | 60 |
| `task_2` | Complete 2 sessions with correct break timing | 120 |
| `task_3` | Block 5 apps within 10 steps, then complete a session | 80 |

---

## OpenEnv API

The server exposes the standard OpenEnv HTTP API:

```
POST /reset?task_id=task_1       → FocusObservation
POST /step  (body: FocusAction)  → FocusObservation + reward + done
GET  /state                      → FocusState (full internal state)
GET  /health                     → {"status": "ok"}
GET  /tasks                      → list of all tasks
```
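
The same reset/step flow from Python, as a minimal sketch (assumes the server is running locally on port 7860; `httpx` is already in `requirements.txt`):

```python
import httpx

BASE = "http://localhost:7860"

# Reset returns the initial FocusObservation.
obs = httpx.post(f"{BASE}/reset", params={"task_id": "task_1"}).json()
print(obs["last_action_feedback"])

# One step: block a high-temptation app.
action = {"action_type": "block_app", "app_name": "Instagram",
          "reasoning": "Block high-temptation app early"}
result = httpx.post(f"{BASE}/step", json=action).json()
print(result["reward"], result["done"])
```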

### Quick Start (local)

```bash
# Install
pip install -r requirements.txt

# Run server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# In another terminal: reset and take a step
curl -X POST "http://localhost:7860/reset?task_id=task_1"
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "block_app", "app_name": "Instagram", "reasoning": "Block high temptation early"}'
```

### Run the LLM Agent

```bash
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.1-8b-instant
export HF_TOKEN=your_token_here
export ENV_BASE_URL=http://localhost:7860

python inference.py
```

### Deploy to HF Spaces

```bash
# Install OpenEnv CLI
pip install openenv

# Push to Hugging Face Spaces
openenv deploy --space YOUR_HF_USERNAME/focusflow-env
```

---

## Project Structure

```
focusflow_rl_env/
├── models.py          # Pydantic: FocusAction, FocusObservation, FocusState
├── environment.py     # Core RL logic: step(), reset(), state(), reward
├── app.py             # FastAPI server exposing OpenEnv HTTP API
├── inference.py       # LLM baseline agent (Groq/OpenAI compatible)
├── Dockerfile         # Container for HF Spaces deployment
├── requirements.txt
├── openenv.yaml       # OpenEnv metadata
└── README.md
```

---

## Why This Problem?

Student distraction is a widespread, measurable problem. Phones, social media,
and short-form video are consistently linked to reduced deep-work capacity.
An RL agent that learns good focus-management strategies could be embedded in
productivity apps, study tools, or OS-level focus modes, making it useful
beyond the hackathon.

---

## Submitted by
Abdul Hannan — Meta x Scaler OpenEnv Hackathon 2026
app.py ADDED
@@ -0,0 +1,88 @@
"""
FocusFlow RL Environment — app.py
FastAPI server exposing the OpenEnv HTTP API:
    POST /reset
    POST /step
    GET  /state
    GET  /health
    GET  /tasks
"""

from typing import Optional

import uvicorn
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from environment import TASKS, FocusFlowEnvironment
from models import FocusAction, FocusObservation, FocusState

app = FastAPI(
    title="FocusFlow RL Environment",
    description="OpenEnv-compatible RL environment for student focus & anti-distraction agent training.",
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# One environment per server instance (stateful server pattern, as per OpenEnv)
env: Optional[FocusFlowEnvironment] = None


# ─── Endpoints ────────────────────────────────────────────────────────────────

@app.get("/health")
def health():
    return {"status": "ok", "environment": "FocusFlow", "version": "1.0.0"}


@app.get("/tasks")
def list_tasks():
    """List all available tasks."""
    return {"tasks": TASKS}


@app.post("/reset", response_model=FocusObservation)
def reset(task_id: str = "task_1", seed: int = 42):
    """
    Reset the environment and return the initial observation.
    Optionally specify which task to load.
    """
    global env
    if task_id not in [t["id"] for t in TASKS]:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id: {task_id}. Available: {[t['id'] for t in TASKS]}",
        )
    env = FocusFlowEnvironment(task_id=task_id, seed=seed)
    return env.reset()


class StepResponse(FocusObservation):
    """Observation plus the RL step signals (reward, done, info)."""
    reward: float
    done: bool
    info: dict


@app.post("/step", response_model=StepResponse)
def step(action: FocusAction):
    """Submit one action and receive the next observation + reward."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    obs, reward, done, info = env.step(action)
    return StepResponse(**obs.model_dump(), reward=reward, done=done, info=info)


@app.get("/state", response_model=FocusState)
def state():
    """Return the full internal environment state."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    return env.state()


if __name__ == "__main__":
    uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
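
For a quick in-process check of these endpoints, a sketch using FastAPI's `TestClient` (this hypothetical `smoke_test.py` is not part of the upload; `TestClient` relies on `httpx`, which is in `requirements.txt`):

```python
# smoke_test.py — hypothetical helper, not included in this commit
from fastapi.testclient import TestClient

from app import app

client = TestClient(app)

# Health check, then a reset/step round trip.
assert client.get("/health").json()["status"] == "ok"
obs = client.post("/reset", params={"task_id": "task_1"}).json()
assert obs["current_phase"] == "focus"

result = client.post("/step", json={"action_type": "focus"}).json()
assert "reward" in result and "done" in result
```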
environment.py ADDED
@@ -0,0 +1,281 @@
"""
FocusFlow RL Environment — environment.py
Core logic: tasks, reward shaping, grader, episode management
"""

import random
from typing import List, Optional, Tuple

from models import (
    AppCategory,
    DistractingApp,
    FocusAction,
    FocusObservation,
    FocusState,
)

# ─── Configurable tasks ───────────────────────────────────────────────────────

TASKS = [
    {
        "id": "task_1",
        "description": "Complete one 25-minute focus session without checking any distracting app.",
        "success_condition": "sessions_completed >= 1 and len(apps_checked) == 0",
        "max_steps": 60,
        "bonus": "Block at least 3 apps before the session ends for a 0.2 bonus.",
    },
    {
        "id": "task_2",
        "description": "Complete two focus sessions with strategically timed breaks (take_break at the right time).",
        "success_condition": "sessions_completed >= 2 and breaks_taken >= 2",
        "max_steps": 120,
        "bonus": "Never check a distracting app for a full 0.15 bonus.",
    },
    {
        "id": "task_3",
        "description": "Manage a high-distraction environment: block 5 apps within 10 steps and maintain focus.",
        "success_condition": "len(apps_blocked) >= 5 and sessions_completed >= 1",
        "max_steps": 80,
        "bonus": "Block 5 apps within the first 8 steps for a 0.25 bonus.",
    },
]

# ─── Distraction pool ─────────────────────────────────────────────────────────

DISTRACTION_POOL: List[DistractingApp] = [
    DistractingApp(name="Instagram", category=AppCategory.social_media, temptation_level=0.85),
    DistractingApp(name="YouTube", category=AppCategory.video, temptation_level=0.90),
    DistractingApp(name="WhatsApp", category=AppCategory.messaging, temptation_level=0.70),
    DistractingApp(name="Twitter", category=AppCategory.social_media, temptation_level=0.75),
    DistractingApp(name="BGMI", category=AppCategory.gaming, temptation_level=0.95),
    DistractingApp(name="Reddit", category=AppCategory.news, temptation_level=0.80),
    DistractingApp(name="Netflix", category=AppCategory.video, temptation_level=0.88),
    DistractingApp(name="Snapchat", category=AppCategory.social_media, temptation_level=0.72),
]

FOCUS_DURATION_SECONDS = 25 * 60  # 25 minutes
SHORT_BREAK_SECONDS = 5 * 60      # 5 minutes
LONG_BREAK_SECONDS = 15 * 60      # 15 minutes (every 4 sessions)


class FocusFlowEnvironment:
    """
    OpenEnv-compatible RL environment for the FocusFlow anti-distraction agent.
    Implements step() / reset() / state() as per the OpenEnv spec.
    """

    def __init__(self, task_id: str = "task_1", seed: int = 42):
        # Instance-local RNG so multiple environments don't share global state.
        self.rng = random.Random(seed)
        self.task = next(t for t in TASKS if t["id"] == task_id)
        self._reset_internal()

    # ── Internal helpers ──────────────────────────────────────────────────────

    def _reset_internal(self):
        self.step_count = 0
        self.max_steps = self.task["max_steps"]
        self.total_focus_secs = 0
        self.total_distraction_s = 0
        self.sessions_completed = 0
        self.breaks_taken = 0
        self.apps_blocked: List[str] = []
        self.apps_checked: List[str] = []
        self.current_phase = "focus"
        self.time_remaining = FOCUS_DURATION_SECONDS
        self.cumulative_reward = 0.0
        self.done = False
        self.active_distractions = self._sample_distractions(3)

    def _sample_distractions(self, n: int) -> List[str]:
        """Pick n random distracting apps that are not already blocked."""
        available = [d.name for d in DISTRACTION_POOL if d.name not in self.apps_blocked]
        return self.rng.sample(available, min(n, len(available)))

    def _maybe_spawn_distraction(self) -> Optional[str]:
        """30% chance each step to surface a new distraction."""
        if self.rng.random() < 0.30:
            available = [
                d.name for d in DISTRACTION_POOL
                if d.name not in self.apps_blocked
                and d.name not in self.active_distractions
            ]
            if available:
                new_app = self.rng.choice(available)
                self.active_distractions.append(new_app)
                return new_app
        return None

    def _compute_reward(self, action: FocusAction) -> Tuple[float, str]:
        """
        Reward function — clean and interpretable for RL training.

        Positive rewards:
            +0.30 for a well-timed voluntary break (at the session boundary)
            +0.20 × temptation_level for blocking an app proactively
            +0.05 per step spent in pure focus mode
            +0.01 for adjusting the timer

        Negative rewards:
            -0.50 for checking a distracting app
            -0.10 for taking a break at the wrong time (mid-session)
        """
        reward = 0.0
        feedback = ""

        if action.action_type == "focus":
            reward += 0.05
            feedback = "Good. Staying focused adds a small step reward."

        elif action.action_type == "block_app":
            if action.app_name and action.app_name not in self.apps_blocked:
                app_obj = next((d for d in DISTRACTION_POOL if d.name == action.app_name), None)
                if app_obj:
                    self.apps_blocked.append(action.app_name)
                    if action.app_name in self.active_distractions:
                        self.active_distractions.remove(action.app_name)
                    reward += 0.20 * app_obj.temptation_level  # scale by how tempting it was
                    feedback = (
                        f"Blocked {action.app_name}. "
                        f"Reward scaled by temptation level ({app_obj.temptation_level:.2f})."
                    )
                else:
                    feedback = "App not found in distraction pool — no reward."
            else:
                feedback = "App already blocked or not specified."

        elif action.action_type == "take_break":
            # "Session boundary" = the final minute of the focus session.
            if self.current_phase == "focus" and self.time_remaining <= 60:
                reward += 0.30
                feedback = "Well-timed break at session boundary! +0.30 reward."
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if (self.sessions_completed + 1) % 4 != 0 else LONG_BREAK_SECONDS
                )
                self.breaks_taken += 1
            elif self.current_phase == "break":
                feedback = "Already on a break. No reward."
            else:
                reward -= 0.10
                feedback = "Break taken mid-session. -0.10 penalty."
                self.breaks_taken += 1

        elif action.action_type == "check_app":
            app = action.app_name or (self.active_distractions[0] if self.active_distractions else None)
            if app:
                reward -= 0.50
                feedback = f"Gave in to {app}! Hard penalty: -0.50."
                self.apps_checked.append(app)
                self.total_distraction_s += 60  # assume 1 min lost per check
            else:
                feedback = "No active distraction to check."

        elif action.action_type == "adjust_timer":
            # Neutral but allows personalisation
            reward += 0.01
            feedback = f"Timer adjusted to {action.timer_minutes} min. Minimal reward."

        return reward, feedback

    def _advance_time(self, seconds: int = 60):
        """Advance simulation by `seconds`. Transitions phase when the timer hits 0."""
        self.time_remaining -= seconds
        if self.time_remaining <= 0:
            if self.current_phase == "focus":
                self.sessions_completed += 1
                self.total_focus_secs += FOCUS_DURATION_SECONDS
                # Start a break (a long one after every 4th session).
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if self.sessions_completed % 4 != 0 else LONG_BREAK_SECONDS
                )
            else:
                # Break ended; start a new focus session.
                self.current_phase = "focus"
                self.time_remaining = FOCUS_DURATION_SECONDS
                self.active_distractions = self._sample_distractions(2)

    def _check_success(self) -> bool:
        """Evaluate the task success condition (a trusted, locally defined expression)."""
        scope = {
            "sessions_completed": self.sessions_completed,
            "apps_blocked": self.apps_blocked,
            "apps_checked": self.apps_checked,
            "breaks_taken": self.breaks_taken,
            "len": len,
        }
        try:
            # Conditions come from the TASKS constant above, never from user input.
            return bool(eval(self.task["success_condition"], {"__builtins__": {}}, scope))  # noqa: S307
        except Exception:
            return False

    # ── Public OpenEnv API ────────────────────────────────────────────────────

    def reset(self) -> FocusObservation:
        """Reset the environment and return the initial observation."""
        self._reset_internal()
        return FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=0.0,
            last_action_feedback=f"Environment reset. Task: {self.task['description']}",
            distraction_event=None,
        )

    def step(self, action: FocusAction) -> Tuple[FocusObservation, float, bool, dict]:
        """
        Process one agent action.
        Returns: (observation, reward, done, info)
        """
        if self.done:
            raise RuntimeError("Episode is done. Call reset() to start a new episode.")

        self.step_count += 1

        # Advance simulated time (each step = 1 minute in the student's world)
        self._advance_time(seconds=60)

        # Compute reward and get feedback
        reward, feedback = self._compute_reward(action)

        # Maybe spawn a new distraction
        new_distraction = self._maybe_spawn_distraction()

        # Compute the running focus score
        focus_ratio = (
            self.total_focus_secs
            / max(1, self.total_focus_secs + self.total_distraction_s)
        )

        # Check episode termination
        success = self._check_success()
        self.done = self.step_count >= self.max_steps or success

        self.cumulative_reward += reward

        obs = FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=round(focus_ratio, 3),
            last_action_feedback=feedback,
            distraction_event=new_distraction,
        )

        info = {
            "step": self.step_count,
            "success": success,
            "cumulative": round(self.cumulative_reward, 4),
        }

        return obs, round(reward, 4), self.done, info

    def state(self) -> FocusState:
        """Return the full internal state (for debugging / logging)."""
        return FocusState(
            episode_step=self.step_count,
            max_steps=self.max_steps,
            total_focus_seconds=self.total_focus_secs,
            total_distraction_seconds=self.total_distraction_s,
            sessions_completed=self.sessions_completed,
            breaks_taken=self.breaks_taken,
            apps_blocked=list(self.apps_blocked),
            apps_checked=list(self.apps_checked),
            current_phase=self.current_phase,
            time_remaining_seconds=self.time_remaining,
            cumulative_reward=round(self.cumulative_reward, 4),
            done=self.done,
        )
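
The environment can also be driven directly, without the HTTP server; a minimal rollout sketch, assuming `environment.py` and `models.py` are importable:

```python
from environment import FocusFlowEnvironment
from models import FocusAction

env = FocusFlowEnvironment(task_id="task_1", seed=0)
obs = env.reset()
done = False
while not done:
    # Trivial policy: always stay focused. task_1 succeeds once the first
    # 25-minute session auto-completes with no apps checked.
    obs, reward, done, info = env.step(FocusAction(action_type="focus"))
print(info)  # e.g. {"step": ..., "success": ..., "cumulative": ...}
```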
inference.py ADDED
@@ -0,0 +1,180 @@
"""
FocusFlow RL Environment — inference.py
HACKATHON SUBMISSION — Meta x Scaler OpenEnv 2026

CRITICAL: Logs MUST follow the [START] / [STEP] / [END] format exactly.
Uses the OpenAI client as required by the hackathon spec.
Runtime < 20 min | Runs on vcpu=2, memory=8gb
"""

import json
import os

import httpx
from openai import OpenAI

# ── Env vars (required by hackathon spec) ────────────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.groq.com/openai/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "llama-3.1-8b-instant")
HF_TOKEN = os.environ.get("HF_TOKEN", "")
ENV_BASE_URL = os.environ.get("ENV_BASE_URL", "http://localhost:7860")
MAX_STEPS = int(os.environ.get("MAX_STEPS", "30"))  # per-episode cap; raise for task_2/task_3

# ── OpenAI client (REQUIRED by hackathon — do not use httpx for LLM calls) ───
llm_client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

SYSTEM_PROMPT = """You are an AI agent managing a student's focus session.

Goal: maximise focus and minimise distractions across the episode.

Actions you can take — respond ONLY with valid JSON:
  focus        -> stay focused (small step reward)
  block_app    -> block a distracting app (include "app_name")
  take_break   -> take a voluntary break (reward if timed at session boundary)
  check_app    -> give in to distraction (HEAVY -0.50 PENALTY, never do this)
  adjust_timer -> change pomodoro length (include "timer_minutes": int)

Response format (JSON only, no markdown fences):
{
  "action_type": "block_app",
  "app_name": "Instagram",
  "reasoning": "Block high-temptation app early."
}

Strategy:
1. Block high-temptation apps in the first few steps.
2. Stay in focus mode to accumulate +0.05 per step.
3. Take a break only when time_remaining <= 60 seconds (session boundary).
4. NEVER use check_app.
"""


def call_llm(messages: list) -> dict:
    """Call the LLM via the OpenAI client and parse the JSON action."""
    response = llm_client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        temperature=0.2,
        max_tokens=200,
    )
    text = response.choices[0].message.content.strip()
    text = text.replace("```json", "").replace("```", "").strip()
    return json.loads(text)


def run_episode(task_id: str, episode_num: int) -> dict:
    """Run one full episode. Returns an episode summary dict."""
    base = ENV_BASE_URL.rstrip("/")

    # Reset environment
    reset_resp = httpx.post(f"{base}/reset", params={"task_id": task_id}, timeout=30)
    reset_resp.raise_for_status()
    obs = reset_resp.json()

    # [START] log — REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[START]",
        "episode": episode_num,
        "task_id": task_id,
        "initial_obs": obs,
    }))

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    total_reward = 0.0
    step = 0
    done = False
    last_info = {}

    while not done and step < MAX_STEPS:
        step += 1

        user_content = (
            f"Step {step}.\n"
            f"phase={obs['current_phase']} | "
            f"time_remaining={obs['time_remaining_seconds']}s | "
            f"sessions_done={obs['sessions_completed']} | "
            f"focus_score={obs['focus_score']}\n"
            f"active_distractions={obs['active_distractions']}\n"
            f"blocked_apps={obs['blocked_apps']}\n"
            f"last_feedback={obs['last_action_feedback']}\n"
            f"new_distraction={obs.get('distraction_event')}\n"
            "Choose action (JSON only):"
        )
        messages.append({"role": "user", "content": user_content})

        try:
            action = call_llm(messages)
        except Exception as e:
            # Fall back to a safe default action if the LLM call or JSON parse fails.
            action = {"action_type": "focus", "reasoning": f"LLM error: {e}"}

        messages.append({"role": "assistant", "content": json.dumps(action)})

        step_resp = httpx.post(f"{base}/step", json=action, timeout=30)
        step_resp.raise_for_status()
        result = step_resp.json()

        reward = result["reward"]
        done = result["done"]
        last_info = result.get("info", {})
        obs = result  # StepResponse includes all observation fields
        total_reward += reward

        # [STEP] log — REQUIRED format, judges parse this
        print(json.dumps({
            "type": "[STEP]",
            "episode": episode_num,
            "step": step,
            "action": action,
            "reward": round(reward, 4),
            "done": done,
            "obs": {
                "phase": obs["current_phase"],
                "time_remaining": obs["time_remaining_seconds"],
                "focus_score": obs["focus_score"],
                "sessions": obs["sessions_completed"],
                "blocked": obs["blocked_apps"],
                "distractions": obs["active_distractions"],
            },
        }))

    # [END] log — REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[END]",
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }))

    return {
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }


def main():
    tasks = ["task_1", "task_2", "task_3"]
    results = []

    for i, task_id in enumerate(tasks, start=1):
        try:
            result = run_episode(task_id=task_id, episode_num=i)
            results.append(result)
        except Exception as e:
            print(json.dumps({"type": "[ERROR]", "episode": i, "error": str(e)}))

    avg_reward = sum(r["total_reward"] for r in results) / max(len(results), 1)
    success_rate = sum(1 for r in results if r["success"]) / max(len(results), 1)
    print(json.dumps({
        "type": "SUMMARY",
        "avg_reward": round(avg_reward, 4),
        "success_rate": round(success_rate, 4),
        "episodes": results,
    }))


if __name__ == "__main__":
    main()
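
Since every log record is one JSON object per line, downstream tooling can stream them; a small sketch, assuming the run output was captured to a hypothetical `run.log`:

```python
import json

# Collect the final result of each episode from [END] records.
totals = {}
with open("run.log") as fh:
    for line in fh:
        rec = json.loads(line)
        if rec.get("type") == "[END]":
            totals[rec["task_id"]] = (rec["total_reward"], rec["success"])
print(totals)
```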
models.py ADDED
@@ -0,0 +1,74 @@
"""
FocusFlow RL Environment — models.py
OpenEnv hackathon submission: Meta x Scaler 2026
Pydantic models for Action, Observation, State
"""

from enum import Enum
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class AppCategory(str, Enum):
    social_media = "social_media"
    video = "video"
    messaging = "messaging"
    gaming = "gaming"
    news = "news"


class DistractingApp(BaseModel):
    name: str
    category: AppCategory
    temptation_level: float = Field(..., ge=0.0, le=1.0, description="How tempting (0=low, 1=high)")


# ─── Action ───────────────────────────────────────────────────────────────────

class FocusAction(BaseModel):
    """
    The agent submits one of these actions each step.

    action_type options:
      - focus        : continue working, no distractions
      - block_app    : block a specific distracting app
      - take_break   : voluntarily take a break (strategic)
      - check_app    : give in to a distraction (penalised)
      - adjust_timer : change the current pomodoro duration
    """
    action_type: Literal["focus", "block_app", "take_break", "check_app", "adjust_timer"]
    app_name: Optional[str] = Field(None, description="App to block or check (if applicable)")
    timer_minutes: Optional[int] = Field(None, ge=5, le=60, description="New timer duration (adjust_timer only)")
    reasoning: Optional[str] = Field(None, description="Agent's reasoning for this action (used by LLM grader)")


# ─── Observation ──────────────────────────────────────────────────────────────

class FocusObservation(BaseModel):
    """What the agent sees after each step."""
    time_remaining_seconds: int = Field(..., description="Seconds left in the current session")
    current_phase: Literal["focus", "break"] = Field(..., description="Whether we are in a focus or break phase")
    active_distractions: List[str] = Field(..., description="Apps currently tempting the agent")
    blocked_apps: List[str] = Field(..., description="Apps the agent has blocked so far")
    sessions_completed: int = Field(..., description="Number of completed pomodoro sessions")
    focus_score: float = Field(..., ge=0.0, le=1.0, description="Running focus quality score")
    last_action_feedback: str = Field(..., description="Human-readable feedback on the last action")
    distraction_event: Optional[str] = Field(None, description="A new temptation that just appeared, if any")


# ─── State ────────────────────────────────────────────────────────────────────

class FocusState(BaseModel):
    """Full internal environment state (returned by the state() API call)."""
    episode_step: int
    max_steps: int
    total_focus_seconds: int
    total_distraction_seconds: int
    sessions_completed: int
    breaks_taken: int
    apps_blocked: List[str]
    apps_checked: List[str] = Field(default_factory=list, description="Distractions the agent gave in to")
    current_phase: Literal["focus", "break"]
    time_remaining_seconds: int
    cumulative_reward: float
    done: bool
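
A short sketch of how these models validate agent input (illustration only):

```python
from pydantic import ValidationError

from models import FocusAction

# A valid action round-trips to JSON for the /step request body.
action = FocusAction(action_type="block_app", app_name="Instagram",
                     reasoning="High temptation")
print(action.model_dump_json())

# An out-of-range timer_minutes (ge=5, le=60) is rejected by Pydantic.
try:
    FocusAction(action_type="adjust_timer", timer_minutes=90)
except ValidationError as e:
    print(e.error_count(), "validation error")
```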
openenv.yaml ADDED
@@ -0,0 +1,65 @@
name: focusflow-env
description: >
  An RL environment where an AI agent learns to manage a student's focus session.
  The agent blocks distracting apps, times breaks correctly, and maximises
  deep-focus time using a Pomodoro-style framework.
  Built on Meta's OpenEnv framework for the Meta x Scaler Hackathon 2026.

version: "1.0.0"
author: Abdul Hannan
license: MIT

environment:
  base_url: https://YOUR-HF-SPACE-NAME.hf.space
  framework: openenv
  language: python
  python_version: "3.11"

api:
  reset:
    method: POST
    path: /reset
    params:
      - name: task_id
        type: string
        default: task_1
        description: Which task to load (task_1, task_2, task_3)
      - name: seed
        type: integer
        default: 42
  step:
    method: POST
    path: /step
    body: FocusAction
  state:
    method: GET
    path: /state

tasks:
  - id: task_1
    description: Complete one 25-min focus session without checking any distracting app.
    max_steps: 60
    success_reward: 1.0

  - id: task_2
    description: Complete two sessions with strategically timed breaks.
    max_steps: 120
    success_reward: 1.0

  - id: task_3
    description: Block 5 distracting apps within 10 steps, then complete a session.
    max_steps: 80
    success_reward: 1.0

reward_range: [-0.5, 0.5]
action_space: discrete (5 action types)
observation_space: structured JSON (FocusObservation)

tags:
  - productivity
  - student
  - anti-distraction
  - pomodoro
  - llm-agent
  - openenv
  - meta-hackathon-2026
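
A quick way to sanity-check this metadata from Python; a sketch assuming PyYAML is installed (it is not in `requirements.txt`):

```python
# Sanity-check openenv.yaml. PyYAML is an assumption here (pip install pyyaml).
import yaml

with open("openenv.yaml") as fh:
    meta = yaml.safe_load(fh)

# The declared tasks should match the TASKS defined in environment.py.
assert {t["id"] for t in meta["tasks"]} == {"task_1", "task_2", "task_3"}
print(meta["name"], meta["version"])
```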
requirements.txt ADDED
@@ -0,0 +1,6 @@
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.1
httpx==0.27.0
python-dotenv==1.0.1
openai>=1.30.0