---
title: FocusFlow RL Environment
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
short_description: LLM-hard OpenEnv RL env for student focus management
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68f093da561f15826cc8ad59/y40SmMZCx-xgI4v4wH3pS.png
---

# 🧠 FocusFlow: LLM-Hard RL Environment for Cognitive Management

**Meta × Scaler OpenEnv Hackathon 2026 — Grand Finale Submission**

HuggingFace Space Python 3.11 License: MIT

**Links:**

- Google Colab: https://colab.research.google.com/drive/16wJ4mw6sdcTuOYABpdoV2AuO6_KYnc4Q?usp=sharing
- GitHub: https://github.com/abdulhannan-18/Focus_Flow_env

**Executive Summary:** FocusFlow is an OpenEnv-compliant reinforcement learning environment that simulates the cognitive friction of modern digital life. It abandons traditional spatial tasks (such as moving a robot arm) in favor of LLM-hard cognitive tasks: managing mental energy, tracking shifting deadlines, and using natural-language comprehension to filter informal social distractions from urgent professional tasks.


## 🎯 Hackathon Theme Alignment

**Core Themes Addressed:** Long-Horizon Planning & Instruction Following | World Modeling across Professional/Personal Tasks

- **The Problem Statement:** Modern digital workspaces cause catastrophic context-switching. Traditional RL bots fail here because evaluating a distraction requires contextual language understanding. The challenge is designing an environment that forces an AI agent to manage time, mental energy, and dynamic deadlines while processing rich natural-language interruptions.
- **The Environment:** A fully Dockerized, RESTful API environment. The world state dynamically models time progression, cognitive load (rising with work, decaying with breaks), and an event engine that injects multi-tiered distractions.
- **Agent Capabilities Required:** Agents must possess reading comprehension (urgency evaluation), multi-day memory (tracking deferred events before they expire), and Chain-of-Thought (CoT) reasoning to justify scheduling decisions.

πŸ—οΈ System Architecture & Observation Space

The environment operates via a FastAPI backend, serving strictly typed JSON payloads. The observation space is designed to be highly complex, forcing the LLM to synthesize multiple data streams.

### Example Observation Payload

```json
{
  "time_remaining_seconds": 1140,
  "current_phase": "focus",
  "sessions_completed": 1,
  "focus_score": 0.923,
  "cognitive_load": 0.62,
  "deadline_pressure": 0.45,
  "active_distractions": ["Instagram", "BGMI"],
  "blocked_apps": ["YouTube"],
  "pending_event": {
    "type": "social_message",
    "description": "Rahul texted: 'bhai BGMI chalate hain, sirf 1 ghanta, kal exam nahi hai'",
    "urgency": 0.30,
    "can_defer": true,
    "deadline_steps": 8,
    "correct_action": "defer_event"
  },
  "day_context": {
    "day_number": 1,
    "energy_level": 0.84,
    "pending_deadlines": [
      {"task": "Math Assignment", "due_step": 45, "completed": false}
    ]
  },
  "last_action_feedback": "Well-timed break: +0.30 | Good reasoning (0.82): +0.10",
  "reasoning_quality_score": 0.82
}
```

βš–οΈ Dual-Layer Reward Model & Evaluation Logic

FocusFlow implements a hybrid objective/subjective reward function.

### 1. Objective Mechanical Rewards

| Action | Environmental Trigger | Reward / Penalty |
|---|---|---|
| `focus` | Executed during work phase | +0.05 × (1 − cognitive_load) |
| `block_app` | Targets an active high-temptation app | +0.20 × temptation_level |
| `take_break` | Executed when cognitive_load > 0.75 | +0.20 to +0.30 |
| `defer_event` | Postpones a low-urgency social text | +0.15 (correct) / −0.05 (wrong) |
| `respond_to_event` | Handles urgent/hard deadlines | +0.20 (correct) / −0.10 (wrong) |
| `plan_day` | Sets a schedule aligned with deadlines | +0.00 to +0.30 (quality-scaled) |
| `check_app` (bad) | Agent gives in to temptation | −0.50 hard penalty |
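The objective layer in the table can be sketched as a single dispatch function. This is a minimal illustration, not the environment's actual code: field names (`phase`, `temptation_level`, `event_correct_action`, `plan_quality`) and the exact break-bonus split are assumptions.

```python
def mechanical_reward(action: str, state: dict) -> float:
    """Objective reward layer mirroring the table; field names are illustrative."""
    load = state.get("cognitive_load", 0.0)
    if action == "focus" and state.get("phase") == "work":
        return 0.05 * (1 - load)                        # cheaper focus when fresh
    if action == "block_app":
        return 0.20 * state.get("temptation_level", 0.0)  # scaled by temptation
    if action == "take_break" and load > 0.75:
        # Assumed split of the +0.20..+0.30 band by how overloaded the agent is.
        return 0.30 if load > 0.90 else 0.20
    if action == "defer_event":
        return 0.15 if state.get("event_correct_action") == "defer_event" else -0.05
    if action == "respond_to_event":
        return 0.20 if state.get("event_correct_action") == "respond_to_event" else -0.10
    if action == "plan_day":
        return 0.30 * state.get("plan_quality", 0.0)    # quality-scaled schedule bonus
    if action == "check_app":
        return -0.50                                    # hard penalty: gave in to temptation
    return 0.0
```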

### 2. Subjective Reasoning Grader

To prevent random action-spamming, the `grade_reasoning()` heuristic parses the agent's mandatory `reasoning` field.

- It applies a ±0.10 reward adjustment based on the use of causal language, task awareness, and logical alignment with the current `pending_event`.
- Empty or repetitive reasoning results in immediate reward degradation.

## 📋 Task Progressions

| Task ID | Challenge Pillar | Success Criteria | Horizon |
|---|---|---|---|
| `task_1` | Execution | Complete a 25-min session with 0 app checks; handle basic distractions logically. | 60 steps |
| `task_2` | Load Management | Complete a multi-session day; keep cognitive_load < 0.85 via strategic breaks. | 120 steps |
| `task_3` | Long-Horizon | Execute a 3-day plan, manage energy decay, and maintain a perfect focus streak. | 240 steps |
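The progression could be encoded as a small task registry. This is a hypothetical configuration shape, not the environment's actual schema; key names are assumptions.

```python
# Hypothetical task registry mirroring the progression table above.
TASKS = {
    "task_1": {"horizon": 60,  "days": 1, "max_app_checks": 0},
    "task_2": {"horizon": 120, "days": 1, "max_cognitive_load": 0.85},
    "task_3": {"horizon": 240, "days": 3, "require_focus_streak": True},
}
```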

## 🚀 Post-Training & Self-Improvement Strategy (GRPO)

A baseline LLM will struggle with FocusFlow's delayed rewards (e.g., deferring an event now to save energy for a deadline 50 steps later).

To achieve an optimal policy, the project includes a Group Relative Policy Optimization (GRPO) pipeline:

1. **Framework:** Uses TRL (Transformer Reinforcement Learning) and Unsloth for efficient 4-bit quantization on consumer hardware (T4 GPUs).
2. **Data Generation:** The baseline agent explores the live FastAPI environment, collecting trajectories of observations, actions, and rewards.
3. **Optimization:** GRPO updates the LLM weights directly from the environment's trajectory rewards, teaching the model that managing cognitive load and providing high-quality reasoning yields the highest cumulative return.
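Step 2 (data generation) can be sketched as a rollout collector that records the tuples GRPO needs. `env` and `policy` here are stand-ins for the FastAPI client and the LLM agent; the interface (`reset()`, `step(action, reasoning)` returning `(obs, reward, done)`) is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """A rollout: one dict per step with observation, action, reasoning, reward."""
    steps: list = field(default_factory=list)

    @property
    def total_return(self) -> float:
        return sum(s["reward"] for s in self.steps)


def collect_trajectory(env, policy, max_steps: int = 60) -> Trajectory:
    """Roll the policy out in the environment and record the transitions."""
    traj = Trajectory()
    obs = env.reset()
    for _ in range(max_steps):
        action, reasoning = policy(obs)
        obs, reward, done = env.step(action, reasoning)
        traj.steps.append({
            "obs": obs, "action": action,
            "reasoning": reasoning, "reward": reward,
        })
        if done:
            break
    return traj
```

Groups of such trajectories, ranked by `total_return`, are what GRPO compares to compute relative advantages before updating the model weights.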

## 💻 Technical Setup & Quick Start

### Local Installation

```bash
# Clone the repository
git clone https://github.com/abdulhannan-18/Focus_Flow_env
```