Create BLOG.MD
BLOG.MD
ADDED
|
@@ -0,0 +1,44 @@
| 1 |
+
FocusFlow: Training LLM Agents to Protect Student Attention
Meta × Scaler OpenEnv Hackathon 2026 - Abdul Hannan
|
| 2 |
+
|
| 3 |
+
Why I Built This
Hey everyone, I'm Abdul Hannan. While looking at other hackathon submissions, I noticed that most people were building bots, simple games, or environments that don't connect to real problems. I wanted to build something that actually matters. One problem I've seen destroy people's potential, including my own, is distraction while studying. Apps like YouTube, BGMI, and Instagram are engineered to capture your attention. Students lose hours every day and don't even realize it. So I built FocusFlow, an RL environment that trains an LLM agent to protect student attention.
|
| 4 |
+
|
| 5 |
+
The Core Idea
FocusFlow is built on the Pomodoro technique: 25 minutes of deep focus followed by a 5-minute break. But the agent doesn't just follow a timer. It has to:
|
| 6 |
+
|
| 7 |
+
- Read natural language distraction events and judge their urgency
- Manage cognitive load (the mental fatigue of studying)
- Block the right apps at the right time
- Justify every single decision with graded reasoning
|
| 8 |
+
|
| 9 |
+
What Makes It LLM-Hard
This is the part I'm most proud of. Many RL environments can be solved by a simple rule-based bot. FocusFlow is designed so that one cannot. Every action requires a mandatory reasoning field. The agent must explain:
|
| 10 |
+
|
| 11 |
+
- What signal did it see?
- Why did it choose this action?
- How does this protect long-term focus?
|
| 12 |
+
|
| 13 |
+
Empty reasoning incurs a -0.15 penalty. Good reasoning earns a +0.10 bonus. A bot that just types "focus" repeatedly scores zero. The agent has to genuinely think.
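To make the penalty and bonus concrete, here is a minimal sketch of how a reasoning adjustment could be computed. The -0.15 and +0.10 values come from this post; the function name `reasoning_adjustment` and the `min_words` threshold are hypothetical stand-ins, since how the real grader decides that reasoning is "good" is not described here.

```python
EMPTY_REASONING_PENALTY = -0.15  # value stated in the post
GOOD_REASONING_BONUS = 0.10      # value stated in the post


def reasoning_adjustment(reasoning: str, min_words: int = 8) -> float:
    """Return a reward adjustment for the mandatory reasoning field.

    Assumption: "good" reasoning is approximated by a minimum word count.
    The actual environment may grade reasoning quite differently.
    """
    words = reasoning.split()
    if not words:
        # Empty or whitespace-only reasoning is penalized.
        return EMPTY_REASONING_PENALTY
    if len(words) >= min_words:
        return GOOD_REASONING_BONUS
    # Non-empty but too short to count as good: no adjustment.
    return 0.0
```

A word-count check is only a placeholder. It shows where the two constants apply, not how reasoning quality is actually judged.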
|
| 14 |
+
|
| 15 |
+
The 6 Actions
|
| 16 |
+
|
| 17 |
+
block_app: Block a distracting app proactively. BGMI has temptation 0.95, YouTube 0.90. The higher the temptation, the bigger the reward for blocking early.
|
| 18 |
+
take_break: Rest when cognitive load > 0.80. If the agent keeps studying without breaks, cognitive load hits 1.0 and performance collapses. Smart agents rest before that happens.
|
| 19 |
+
defer_event: Reschedule low-urgency distractions. Example: "Rohan texted: bhai BGMI chalate hain" ("bro, let's play BGMI") has urgency 0.3, so defer it and stay focused.
|
| 20 |
+
respond_to_event: Handle urgent events immediately. Example: "Professor posted: deadline moved to TODAY 11:59 PM" has urgency 0.95 and can_defer=False, so respond immediately or miss the deadline.
|
| 21 |
+
focus: Stay on task when no threats are active. Earns a small positive reward every step. The foundation of every good session.
|
| 22 |
+
adjust_energy: Restore energy when it drops below 0.35. Multi-day sessions drain energy, and the agent must manage this across 3 days in Task 3.
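The triggers for the six actions can be restated as a small lookup, sketched below. This is not the environment's code and not the trained agent: the real agent is an LLM that must also supply reasoning. The `State` and `Event` classes, their field names, and the priority order of the checks are all assumptions made for illustration. Only the action names and the 0.80 and 0.35 thresholds come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical distraction event, mirroring the examples in the post."""
    text: str
    urgency: float
    can_defer: bool


@dataclass
class State:
    """Hypothetical observation; field names are illustrative only."""
    cognitive_load: float
    energy: float
    event: Optional[Event] = None
    unblocked_app: Optional[str] = None  # most tempting app not yet blocked


def suggest_action(state: State) -> str:
    """Map a state to one of the six action names (assumed priority order)."""
    if state.event is not None and not state.event.can_defer:
        return "respond_to_event"   # e.g. deadline moved, can_defer=False
    if state.cognitive_load > 0.80:
        return "take_break"         # threshold from the post
    if state.energy < 0.35:
        return "adjust_energy"      # threshold from the post
    if state.event is not None:
        return "defer_event"        # deferrable, e.g. a low-urgency text
    if state.unblocked_app is not None:
        return "block_app"          # block proactively
    return "focus"                  # no active threats


# Usage: a deferrable, low-urgency message while load and energy are fine.
msg = Event("Rohan texted: bhai BGMI chalate hain", urgency=0.3, can_defer=True)
print(suggest_action(State(cognitive_load=0.5, energy=0.9, event=msg)))
```

A fixed rule like this does not produce the graded reasoning that each action requires, which is the part the post describes as needing an LLM.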
|
| 23 |
+
Anti-Reward-Hacking Design
In RL, agents are notorious for finding shortcuts: they maximize reward without actually solving the problem. I designed FocusFlow with 4 independent reward signals that are hard to game simultaneously:
Reward = reasoning_quality × 0.20 + action_correctness + cognitive_load_dynamics + deadline_pressure_tracking
Plus an anti-spam check: if the agent repeats the same words in its reasoning, unique_ratio drops below 0.5 and the score is 0. You can't just repeat "focus focus focus" and win.
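The reward sum and the anti-spam check can be sketched as follows. The four signal names, the 0.20 weight, and the 0.5 cutoff are taken from the post. The rest is assumption: `unique_ratio` is taken to be unique words divided by total words, and the zeroed "score" is read as the reasoning-quality term only. The actual environment may instead zero the whole reward.

```python
def unique_ratio(reasoning: str) -> float:
    """Share of distinct words in the reasoning (assumed definition)."""
    words = reasoning.lower().split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def total_reward(
    reasoning_quality: float,
    action_correctness: float,
    cognitive_load_dynamics: float,
    deadline_pressure_tracking: float,
    reasoning: str,
) -> float:
    """Combine the four reward signals named in the post."""
    if unique_ratio(reasoning) < 0.5:
        # Anti-spam: repetitive reasoning earns no reasoning credit.
        # Assumption: only the reasoning term is zeroed.
        reasoning_quality = 0.0
    return (
        reasoning_quality * 0.20
        + action_correctness
        + cognitive_load_dynamics
        + deadline_pressure_tracking
    )


# Usage: spammy reasoning loses the reasoning term, varied reasoning keeps it.
spam = total_reward(0.9, 0.5, 0.1, 0.2, "focus focus focus focus")
real = total_reward(0.9, 0.5, 0.1, 0.2, "blocking BGMI early protects the focus block")
print(spam < real)
```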
|
| 25 |
+
|
| 26 |
+
Training Results
I trained Qwen2.5-0.5B using GRPO via Hugging Face TRL. The agent went through 3 curriculum levels:
|
| 27 |
+
|
| 28 |
+
- Level 1 (Beginner): Learn to block apps
- Level 2 (Intermediate): Handle events + manage breaks
- Level 3 (Advanced): Full strategy across multi-day sessions
|
| 29 |
+
|
| 30 |
+
Episode reward improved from -3.50 to +2.54. The agent learned to:
|
| 31 |
+
|
| 32 |
+
- Block high-temptation apps before they cause distractions
- Defer low-urgency social messages
- Take recovery breaks once cognitive load reaches 0.75 or higher
|
| 33 |
+
|
| 34 |
+
Real World Impact
This isn't just a hackathon project. The reward logic in FocusFlow mirrors familiar productivity strategies: timed focus blocks, scheduled breaks, and blocking distractions before they start. With some engineering, this agent's policy could be embedded into:
|
| 35 |
+
|
| 36 |
+
- OS-level focus modes
- Study apps like Forest or Focusmate
- Student productivity tools
|
| 37 |
+
|
| 38 |
+
500 million students globally struggle with distraction. FocusFlow is a step toward an AI that fights for their attention instead of stealing it.
|
| 39 |
+
|
| 40 |
+
Links
|
| 41 |
+
|
| 42 |
+
Live Environment: https://hannan2859r-focusflow-env.hf.space
GitHub: https://github.com/abdulhannan-18/Focus_Flow_env
|
| 43 |
+
|
| 44 |
+
Submitted by Abdul Hannan - Meta × Scaler OpenEnv Hackathon 2026
|