hannan2859r committed on
Commit
fdd45f1
·
verified ·
1 Parent(s): d9deec1

Upload 8 files

Files changed (8)
  1. Dockerfile +16 -0
  2. README.md +151 -8
  3. app.py +88 -0
  4. environment.py +281 -0
  5. inference.py +180 -0
  6. models.py +74 -0
  7. openenv.yaml +65 -0
  8. requirements.txt +6 -0
Dockerfile ADDED
@@ -0,0 +1,16 @@
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 7860

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,11 +1,154 @@
- title: Focusflow Env
- emoji: 🚀
- colorFrom: purple
- colorTo: purple
- sdk: docker
- pinned: false
- license: mit
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# FocusFlow RL Environment
### Meta x Scaler OpenEnv Hackathon 2026

> An RL environment where an AI agent learns to manage a student's focus session —
> blocking distracting apps, timing breaks, and maximising deep-focus time.

---

## What It Is

FocusFlow is an **OpenEnv-compatible reinforcement learning environment** built on top of
Meta's OpenEnv framework. An LLM agent is placed in a student's digital world and must:

- **Block** distracting apps (Instagram, YouTube, BGMI, etc.) before they steal focus
- **Time breaks** correctly using the Pomodoro technique (25 min focus / 5 min break)
- **Resist** distraction events that spawn randomly during the session
- **Maximise** the focus score across multiple study sessions

The environment simulates a realistic student productivity scenario, making it a strong
candidate for training agents that improve human focus and wellbeing.

---

## Environment Design

### Action Space (5 discrete actions)

| Action | Description | Reward |
|---|---|---|
| `focus` | Stay focused, do nothing | +0.05 per step |
| `block_app` | Block a distracting app | +0.20 × temptation_level |
| `take_break` | Take a voluntary break | +0.30 if timed correctly |
| `adjust_timer` | Change pomodoro duration | +0.01 |
| `check_app` | Give in to distraction | **-0.50** |

### Observation Space

```json
{
  "time_remaining_seconds": 1200,
  "current_phase": "focus",
  "active_distractions": ["Instagram", "YouTube"],
  "blocked_apps": ["BGMI"],
  "sessions_completed": 0,
  "focus_score": 0.85,
  "last_action_feedback": "Blocked BGMI. Reward scaled by temptation level (0.95).",
  "distraction_event": "Reddit"
}
```
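
To make the observation-to-action mapping concrete, here is a minimal heuristic policy sketch (an illustration, not part of this submission): it blocks the most tempting active distraction first, otherwise stays focused. The `TEMPTATION` table copies the values from `DISTRACTION_POOL` in `environment.py`.

```python
# Hypothetical heuristic baseline (illustration only, not the LLM agent).
# Temptation values mirror DISTRACTION_POOL in environment.py.
TEMPTATION = {
    "Instagram": 0.85, "YouTube": 0.90, "WhatsApp": 0.70, "Twitter": 0.75,
    "BGMI": 0.95, "Reddit": 0.80, "Netflix": 0.88, "Snapchat": 0.72,
}

def heuristic_action(obs: dict) -> dict:
    """Map a FocusObservation dict to a FocusAction dict."""
    if obs["active_distractions"]:
        # Block the most tempting unblocked app first.
        worst = max(obs["active_distractions"], key=lambda a: TEMPTATION.get(a, 0.5))
        return {"action_type": "block_app", "app_name": worst,
                "reasoning": "Block highest-temptation app"}
    if obs["current_phase"] == "focus" and obs["time_remaining_seconds"] <= 60:
        return {"action_type": "take_break", "reasoning": "Session boundary"}
    return {"action_type": "focus", "reasoning": "Accumulate step reward"}
```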

### Reward Function

Simple, interpretable rewards for stable RL training (shaped per-step rewards plus hard penalties):

```
+ 0.05 per step in pure focus mode
+ 0.20 × temptation for blocking an app proactively
+ 0.30 for a well-timed break (at session boundary)
- 0.50 for checking a distracting app (hard penalty)
- 0.10 for taking a break mid-session
```
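
For example, blocking BGMI (temptation 0.95) immediately yields 0.20 × 0.95 = 0.19, while a single `check_app` (-0.50) wipes out ten steps of pure focus (10 × 0.05 = 0.50).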

### Tasks

Three tasks of increasing difficulty:

| Task | Goal | Max Steps |
|---|---|---|
| `task_1` | Complete 1 session with zero distractions | 60 |
| `task_2` | Complete 2 sessions with correct break timing | 120 |
| `task_3` | Block 5 apps within 10 steps, then complete a session | 80 |

---

## OpenEnv API

The server exposes the standard OpenEnv HTTP API:

```
POST /reset?task_id=task_1       → FocusObservation
POST /step  (body: FocusAction)  → FocusObservation + reward + done
GET  /state                      → FocusState (full internal state)
GET  /health                     → {"status": "ok"}
GET  /tasks                      → list of all tasks
```
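
The same reset/step flow from Python, as a minimal sketch (assumes the server is running locally on port 7860; `httpx` is already in `requirements.txt`):

```python
import httpx

BASE = "http://localhost:7860"

# Reset returns the initial FocusObservation.
obs = httpx.post(f"{BASE}/reset", params={"task_id": "task_1"}).json()
print(obs["last_action_feedback"])

# One step: block a high-temptation app.
action = {"action_type": "block_app", "app_name": "Instagram",
          "reasoning": "Block high-temptation app early"}
result = httpx.post(f"{BASE}/step", json=action).json()
print(result["reward"], result["done"])
```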

### Quick Start (local)

```bash
# Install
pip install -r requirements.txt

# Run server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# In another terminal: reset and take a step
curl -X POST "http://localhost:7860/reset?task_id=task_1"
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d '{"action_type": "block_app", "app_name": "Instagram", "reasoning": "Block high temptation early"}'
```

### Run the LLM Agent

```bash
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.1-8b-instant
export HF_TOKEN=your_token_here
export ENV_BASE_URL=http://localhost:7860

python inference.py
```

### Deploy to HF Spaces

```bash
# Install OpenEnv CLI
pip install openenv

# Push to Hugging Face Spaces
openenv deploy --space YOUR_HF_USERNAME/focusflow-env
```

---

## Project Structure

```
focusflow_rl_env/
├── models.py          # Pydantic: FocusAction, FocusObservation, FocusState
├── environment.py     # Core RL logic: step(), reset(), state(), reward
├── app.py             # FastAPI server exposing OpenEnv HTTP API
├── inference.py       # LLM baseline agent (Groq/OpenAI compatible)
├── Dockerfile         # Container for HF Spaces deployment
├── requirements.txt
├── openenv.yaml       # OpenEnv metadata
└── README.md
```

---

## Why This Problem?

Student distraction is a widespread, measurable problem. Phones, social media,
and short-form video are consistently linked to reduced deep-work capacity.
An RL agent that learns good focus-management strategies could be embedded in
productivity apps, study tools, or OS-level focus modes, making it useful
beyond the hackathon.

---

## Submitted by
Abdul Hannan — Meta x Scaler OpenEnv Hackathon 2026
app.py ADDED
@@ -0,0 +1,88 @@
"""
FocusFlow RL Environment — app.py
FastAPI server exposing the OpenEnv HTTP API:
    POST /reset
    POST /step
    GET  /state
    GET  /health
    GET  /tasks
"""

from typing import Optional

import uvicorn
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from environment import TASKS, FocusFlowEnvironment
from models import FocusAction, FocusObservation, FocusState

app = FastAPI(
    title="FocusFlow RL Environment",
    description="OpenEnv-compatible RL environment for student focus & anti-distraction agent training.",
    version="1.0.0",
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# One environment per server instance (stateful server pattern, as per OpenEnv)
env: Optional[FocusFlowEnvironment] = None


# ─── Endpoints ────────────────────────────────────────────────────────────────

@app.get("/health")
def health():
    return {"status": "ok", "environment": "FocusFlow", "version": "1.0.0"}


@app.get("/tasks")
def list_tasks():
    """List all available tasks."""
    return {"tasks": TASKS}


@app.post("/reset", response_model=FocusObservation)
def reset(task_id: str = "task_1", seed: int = 42):
    """
    Reset the environment and return the initial observation.
    Optionally specify which task to load.
    """
    global env
    if task_id not in [t["id"] for t in TASKS]:
        raise HTTPException(
            status_code=400,
            detail=f"Unknown task_id: {task_id}. Available: {[t['id'] for t in TASKS]}",
        )
    env = FocusFlowEnvironment(task_id=task_id, seed=seed)
    return env.reset()


class StepResponse(FocusObservation):
    """Observation plus the RL step signals (reward, done, info)."""
    reward: float
    done: bool
    info: dict


@app.post("/step", response_model=StepResponse)
def step(action: FocusAction):
    """Submit one action and receive the next observation + reward."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    obs, reward, done, info = env.step(action)
    return StepResponse(**obs.model_dump(), reward=reward, done=done, info=info)


@app.get("/state", response_model=FocusState)
def state():
    """Return the full internal environment state."""
    if env is None:
        raise HTTPException(status_code=400, detail="Environment not initialised. Call /reset first.")
    return env.state()


if __name__ == "__main__":
    uvicorn.run("app:app", host="0.0.0.0", port=7860, reload=True)
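
For a quick in-process check of these endpoints, a sketch using FastAPI's `TestClient` (this hypothetical `smoke_test.py` is not part of the upload; `TestClient` relies on `httpx`, which is in `requirements.txt`):

```python
# smoke_test.py — hypothetical helper, not included in this commit
from fastapi.testclient import TestClient

from app import app

client = TestClient(app)

# Health check, then a reset/step round trip.
assert client.get("/health").json()["status"] == "ok"
obs = client.post("/reset", params={"task_id": "task_1"}).json()
assert obs["current_phase"] == "focus"

result = client.post("/step", json={"action_type": "focus"}).json()
assert "reward" in result and "done" in result
```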
environment.py ADDED
@@ -0,0 +1,281 @@
"""
FocusFlow RL Environment — environment.py
Core logic: tasks, reward shaping, grader, episode management
"""

import random
from typing import List, Optional, Tuple

from models import (
    AppCategory,
    DistractingApp,
    FocusAction,
    FocusObservation,
    FocusState,
)

# ─── Configurable tasks ───────────────────────────────────────────────────────

TASKS = [
    {
        "id": "task_1",
        "description": "Complete one 25-minute focus session without checking any distracting app.",
        "success_condition": "sessions_completed >= 1 and len(apps_checked) == 0",
        "max_steps": 60,
        "bonus": "Block at least 3 apps before the session ends for a 0.2 bonus.",
    },
    {
        "id": "task_2",
        "description": "Complete two focus sessions with strategically timed breaks (take_break at the right time).",
        "success_condition": "sessions_completed >= 2 and breaks_taken >= 2",
        "max_steps": 120,
        "bonus": "Never check a distracting app for a full 0.15 bonus.",
    },
    {
        "id": "task_3",
        "description": "Manage a high-distraction environment: block 5 apps within 10 steps and maintain focus.",
        "success_condition": "len(apps_blocked) >= 5 and sessions_completed >= 1",
        "max_steps": 80,
        "bonus": "Block 5 apps within the first 8 steps for a 0.25 bonus.",
    },
]

# ─── Distraction pool ─────────────────────────────────────────────────────────

DISTRACTION_POOL: List[DistractingApp] = [
    DistractingApp(name="Instagram", category=AppCategory.social_media, temptation_level=0.85),
    DistractingApp(name="YouTube", category=AppCategory.video, temptation_level=0.90),
    DistractingApp(name="WhatsApp", category=AppCategory.messaging, temptation_level=0.70),
    DistractingApp(name="Twitter", category=AppCategory.social_media, temptation_level=0.75),
    DistractingApp(name="BGMI", category=AppCategory.gaming, temptation_level=0.95),
    DistractingApp(name="Reddit", category=AppCategory.news, temptation_level=0.80),
    DistractingApp(name="Netflix", category=AppCategory.video, temptation_level=0.88),
    DistractingApp(name="Snapchat", category=AppCategory.social_media, temptation_level=0.72),
]

FOCUS_DURATION_SECONDS = 25 * 60  # 25 minutes
SHORT_BREAK_SECONDS = 5 * 60      # 5 minutes
LONG_BREAK_SECONDS = 15 * 60      # 15 minutes (every 4 sessions)


class FocusFlowEnvironment:
    """
    OpenEnv-compatible RL environment for the FocusFlow anti-distraction agent.
    Implements step() / reset() / state() as per the OpenEnv spec.
    """

    def __init__(self, task_id: str = "task_1", seed: int = 42):
        # Instance-local RNG so multiple environments don't share global state.
        self.rng = random.Random(seed)
        self.task = next(t for t in TASKS if t["id"] == task_id)
        self._reset_internal()

    # ── Internal helpers ──────────────────────────────────────────────────────

    def _reset_internal(self):
        self.step_count = 0
        self.max_steps = self.task["max_steps"]
        self.total_focus_secs = 0
        self.total_distraction_s = 0
        self.sessions_completed = 0
        self.breaks_taken = 0
        self.apps_blocked: List[str] = []
        self.apps_checked: List[str] = []
        self.current_phase = "focus"
        self.time_remaining = FOCUS_DURATION_SECONDS
        self.cumulative_reward = 0.0
        self.done = False
        self.active_distractions = self._sample_distractions(3)

    def _sample_distractions(self, n: int) -> List[str]:
        """Pick n random distracting apps that are not already blocked."""
        available = [d.name for d in DISTRACTION_POOL if d.name not in self.apps_blocked]
        return self.rng.sample(available, min(n, len(available)))

    def _maybe_spawn_distraction(self) -> Optional[str]:
        """30% chance each step to surface a new distraction."""
        if self.rng.random() < 0.30:
            available = [
                d.name for d in DISTRACTION_POOL
                if d.name not in self.apps_blocked
                and d.name not in self.active_distractions
            ]
            if available:
                new_app = self.rng.choice(available)
                self.active_distractions.append(new_app)
                return new_app
        return None

    def _compute_reward(self, action: FocusAction) -> Tuple[float, str]:
        """
        Reward function — clean and interpretable for RL training.

        Positive rewards:
            +0.30 for a well-timed voluntary break (at the session boundary)
            +0.20 × temptation_level for blocking an app proactively
            +0.05 per step spent in pure focus mode
            +0.01 for adjusting the timer

        Negative rewards:
            -0.50 for checking a distracting app
            -0.10 for taking a break at the wrong time (mid-session)
        """
        reward = 0.0
        feedback = ""

        if action.action_type == "focus":
            reward += 0.05
            feedback = "Good. Staying focused adds a small step reward."

        elif action.action_type == "block_app":
            if action.app_name and action.app_name not in self.apps_blocked:
                app_obj = next((d for d in DISTRACTION_POOL if d.name == action.app_name), None)
                if app_obj:
                    self.apps_blocked.append(action.app_name)
                    if action.app_name in self.active_distractions:
                        self.active_distractions.remove(action.app_name)
                    reward += 0.20 * app_obj.temptation_level  # scale by how tempting it was
                    feedback = (
                        f"Blocked {action.app_name}. "
                        f"Reward scaled by temptation level ({app_obj.temptation_level:.2f})."
                    )
                else:
                    feedback = "App not found in distraction pool — no reward."
            else:
                feedback = "App already blocked or not specified."

        elif action.action_type == "take_break":
            # "Session boundary" = the final minute of the focus session.
            if self.current_phase == "focus" and self.time_remaining <= 60:
                reward += 0.30
                feedback = "Well-timed break at session boundary! +0.30 reward."
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if (self.sessions_completed + 1) % 4 != 0 else LONG_BREAK_SECONDS
                )
                self.breaks_taken += 1
            elif self.current_phase == "break":
                feedback = "Already on a break. No reward."
            else:
                reward -= 0.10
                feedback = "Break taken mid-session. -0.10 penalty."
                self.breaks_taken += 1

        elif action.action_type == "check_app":
            app = action.app_name or (self.active_distractions[0] if self.active_distractions else None)
            if app:
                reward -= 0.50
                feedback = f"Gave in to {app}! Hard penalty: -0.50."
                self.apps_checked.append(app)
                self.total_distraction_s += 60  # assume 1 min lost per check
            else:
                feedback = "No active distraction to check."

        elif action.action_type == "adjust_timer":
            # Neutral but allows personalisation
            reward += 0.01
            feedback = f"Timer adjusted to {action.timer_minutes} min. Minimal reward."

        return reward, feedback

    def _advance_time(self, seconds: int = 60):
        """Advance simulation by `seconds`. Transitions phase when the timer hits 0."""
        self.time_remaining -= seconds
        if self.time_remaining <= 0:
            if self.current_phase == "focus":
                self.sessions_completed += 1
                self.total_focus_secs += FOCUS_DURATION_SECONDS
                # Start a break (a long one after every 4th session).
                self.current_phase = "break"
                self.time_remaining = (
                    SHORT_BREAK_SECONDS if self.sessions_completed % 4 != 0 else LONG_BREAK_SECONDS
                )
            else:
                # Break ended; start a new focus session.
                self.current_phase = "focus"
                self.time_remaining = FOCUS_DURATION_SECONDS
                self.active_distractions = self._sample_distractions(2)

    def _check_success(self) -> bool:
        """Evaluate the task success condition (a trusted, locally defined expression)."""
        scope = {
            "sessions_completed": self.sessions_completed,
            "apps_blocked": self.apps_blocked,
            "apps_checked": self.apps_checked,
            "breaks_taken": self.breaks_taken,
            "len": len,
        }
        try:
            # Conditions come from the TASKS constant above, never from user input.
            return bool(eval(self.task["success_condition"], {"__builtins__": {}}, scope))  # noqa: S307
        except Exception:
            return False

    # ── Public OpenEnv API ────────────────────────────────────────────────────

    def reset(self) -> FocusObservation:
        """Reset the environment and return the initial observation."""
        self._reset_internal()
        return FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=0.0,
            last_action_feedback=f"Environment reset. Task: {self.task['description']}",
            distraction_event=None,
        )

    def step(self, action: FocusAction) -> Tuple[FocusObservation, float, bool, dict]:
        """
        Process one agent action.
        Returns: (observation, reward, done, info)
        """
        if self.done:
            raise RuntimeError("Episode is done. Call reset() to start a new episode.")

        self.step_count += 1

        # Advance simulated time (each step = 1 minute in the student's world)
        self._advance_time(seconds=60)

        # Compute reward and get feedback
        reward, feedback = self._compute_reward(action)

        # Maybe spawn a new distraction
        new_distraction = self._maybe_spawn_distraction()

        # Compute the running focus score
        focus_ratio = (
            self.total_focus_secs
            / max(1, self.total_focus_secs + self.total_distraction_s)
        )

        # Check episode termination
        success = self._check_success()
        self.done = self.step_count >= self.max_steps or success

        self.cumulative_reward += reward

        obs = FocusObservation(
            time_remaining_seconds=self.time_remaining,
            current_phase=self.current_phase,
            active_distractions=list(self.active_distractions),
            blocked_apps=list(self.apps_blocked),
            sessions_completed=self.sessions_completed,
            focus_score=round(focus_ratio, 3),
            last_action_feedback=feedback,
            distraction_event=new_distraction,
        )

        info = {
            "step": self.step_count,
            "success": success,
            "cumulative": round(self.cumulative_reward, 4),
        }

        return obs, round(reward, 4), self.done, info

    def state(self) -> FocusState:
        """Return the full internal state (for debugging / logging)."""
        return FocusState(
            episode_step=self.step_count,
            max_steps=self.max_steps,
            total_focus_seconds=self.total_focus_secs,
            total_distraction_seconds=self.total_distraction_s,
            sessions_completed=self.sessions_completed,
            breaks_taken=self.breaks_taken,
            apps_blocked=list(self.apps_blocked),
            apps_checked=list(self.apps_checked),
            current_phase=self.current_phase,
            time_remaining_seconds=self.time_remaining,
            cumulative_reward=round(self.cumulative_reward, 4),
            done=self.done,
        )
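
The environment can also be driven directly, without the HTTP server; a minimal rollout sketch, assuming `environment.py` and `models.py` are importable:

```python
from environment import FocusFlowEnvironment
from models import FocusAction

env = FocusFlowEnvironment(task_id="task_1", seed=0)
obs = env.reset()
done = False
while not done:
    # Trivial policy: always stay focused. task_1 succeeds once the first
    # 25-minute session auto-completes with no apps checked.
    obs, reward, done, info = env.step(FocusAction(action_type="focus"))
print(info)  # e.g. {"step": ..., "success": ..., "cumulative": ...}
```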
inference.py ADDED
@@ -0,0 +1,180 @@
"""
FocusFlow RL Environment — inference.py
HACKATHON SUBMISSION — Meta x Scaler OpenEnv 2026

CRITICAL: Logs MUST follow the [START] / [STEP] / [END] format exactly.
Uses the OpenAI client as required by the hackathon spec.
Runtime < 20 min | Runs on vcpu=2, memory=8gb
"""

import json
import os

import httpx
from openai import OpenAI

# ── Env vars (required by hackathon spec) ────────────────────────────────────
API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.groq.com/openai/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "llama-3.1-8b-instant")
HF_TOKEN = os.environ.get("HF_TOKEN", "")
ENV_BASE_URL = os.environ.get("ENV_BASE_URL", "http://localhost:7860")
MAX_STEPS = int(os.environ.get("MAX_STEPS", "30"))  # per-episode cap; raise for task_2/task_3

# ── OpenAI client (REQUIRED by hackathon — do not use httpx for LLM calls) ───
llm_client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

SYSTEM_PROMPT = """You are an AI agent managing a student's focus session.

Goal: maximise focus and minimise distractions across the episode.

Actions you can take — respond ONLY with valid JSON:
  focus        -> stay focused (small step reward)
  block_app    -> block a distracting app (include "app_name")
  take_break   -> take a voluntary break (reward if timed at session boundary)
  check_app    -> give in to distraction (HEAVY -0.50 PENALTY, never do this)
  adjust_timer -> change pomodoro length (include "timer_minutes": int)

Response format (JSON only, no markdown fences):
{
  "action_type": "block_app",
  "app_name": "Instagram",
  "reasoning": "Block high-temptation app early."
}

Strategy:
1. Block high-temptation apps in the first few steps.
2. Stay in focus mode to accumulate +0.05 per step.
3. Take a break only when time_remaining <= 60 seconds (session boundary).
4. NEVER use check_app.
"""


def call_llm(messages: list) -> dict:
    """Call the LLM via the OpenAI client and parse the JSON action."""
    response = llm_client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
        temperature=0.2,
        max_tokens=200,
    )
    text = response.choices[0].message.content.strip()
    text = text.replace("```json", "").replace("```", "").strip()
    return json.loads(text)


def run_episode(task_id: str, episode_num: int) -> dict:
    """Run one full episode. Returns an episode summary dict."""
    base = ENV_BASE_URL.rstrip("/")

    # Reset environment
    reset_resp = httpx.post(f"{base}/reset", params={"task_id": task_id}, timeout=30)
    reset_resp.raise_for_status()
    obs = reset_resp.json()

    # [START] log — REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[START]",
        "episode": episode_num,
        "task_id": task_id,
        "initial_obs": obs,
    }))

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    total_reward = 0.0
    step = 0
    done = False
    last_info = {}

    while not done and step < MAX_STEPS:
        step += 1

        user_content = (
            f"Step {step}.\n"
            f"phase={obs['current_phase']} | "
            f"time_remaining={obs['time_remaining_seconds']}s | "
            f"sessions_done={obs['sessions_completed']} | "
            f"focus_score={obs['focus_score']}\n"
            f"active_distractions={obs['active_distractions']}\n"
            f"blocked_apps={obs['blocked_apps']}\n"
            f"last_feedback={obs['last_action_feedback']}\n"
            f"new_distraction={obs.get('distraction_event')}\n"
            "Choose action (JSON only):"
        )
        messages.append({"role": "user", "content": user_content})

        try:
            action = call_llm(messages)
        except Exception as e:
            # Fall back to a safe default action if the LLM call or JSON parse fails.
            action = {"action_type": "focus", "reasoning": f"LLM error: {e}"}

        messages.append({"role": "assistant", "content": json.dumps(action)})

        step_resp = httpx.post(f"{base}/step", json=action, timeout=30)
        step_resp.raise_for_status()
        result = step_resp.json()

        reward = result["reward"]
        done = result["done"]
        last_info = result.get("info", {})
        obs = result  # StepResponse includes all observation fields
        total_reward += reward

        # [STEP] log — REQUIRED format, judges parse this
        print(json.dumps({
            "type": "[STEP]",
            "episode": episode_num,
            "step": step,
            "action": action,
            "reward": round(reward, 4),
            "done": done,
            "obs": {
                "phase": obs["current_phase"],
                "time_remaining": obs["time_remaining_seconds"],
                "focus_score": obs["focus_score"],
                "sessions": obs["sessions_completed"],
                "blocked": obs["blocked_apps"],
                "distractions": obs["active_distractions"],
            },
        }))

    # [END] log — REQUIRED format, judges parse this
    print(json.dumps({
        "type": "[END]",
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }))

    return {
        "episode": episode_num,
        "task_id": task_id,
        "total_reward": round(total_reward, 4),
        "steps": step,
        "success": last_info.get("success", False),
    }


def main():
    tasks = ["task_1", "task_2", "task_3"]
    results = []

    for i, task_id in enumerate(tasks, start=1):
        try:
            result = run_episode(task_id=task_id, episode_num=i)
            results.append(result)
        except Exception as e:
            print(json.dumps({"type": "[ERROR]", "episode": i, "error": str(e)}))

    avg_reward = sum(r["total_reward"] for r in results) / max(len(results), 1)
    success_rate = sum(1 for r in results if r["success"]) / max(len(results), 1)
    print(json.dumps({
        "type": "SUMMARY",
        "avg_reward": round(avg_reward, 4),
        "success_rate": round(success_rate, 4),
        "episodes": results,
    }))


if __name__ == "__main__":
    main()
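
Since every log record is one JSON object per line, downstream tooling can stream them; a small sketch, assuming the run output was captured to a hypothetical `run.log`:

```python
import json

# Collect the final result of each episode from [END] records.
totals = {}
with open("run.log") as fh:
    for line in fh:
        rec = json.loads(line)
        if rec.get("type") == "[END]":
            totals[rec["task_id"]] = (rec["total_reward"], rec["success"])
print(totals)
```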
models.py ADDED
@@ -0,0 +1,74 @@
"""
FocusFlow RL Environment — models.py
OpenEnv hackathon submission: Meta x Scaler 2026
Pydantic models for Action, Observation, State
"""

from enum import Enum
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class AppCategory(str, Enum):
    social_media = "social_media"
    video = "video"
    messaging = "messaging"
    gaming = "gaming"
    news = "news"


class DistractingApp(BaseModel):
    name: str
    category: AppCategory
    temptation_level: float = Field(..., ge=0.0, le=1.0, description="How tempting (0=low, 1=high)")


# ─── Action ───────────────────────────────────────────────────────────────────

class FocusAction(BaseModel):
    """
    The agent submits one of these actions each step.

    action_type options:
      - focus        : continue working, no distractions
      - block_app    : block a specific distracting app
      - take_break   : voluntarily take a break (strategic)
      - check_app    : give in to a distraction (penalised)
      - adjust_timer : change the current pomodoro duration
    """
    action_type: Literal["focus", "block_app", "take_break", "check_app", "adjust_timer"]
    app_name: Optional[str] = Field(None, description="App to block or check (if applicable)")
    timer_minutes: Optional[int] = Field(None, ge=5, le=60, description="New timer duration (adjust_timer only)")
    reasoning: Optional[str] = Field(None, description="Agent's reasoning for this action (used by LLM grader)")


# ─── Observation ──────────────────────────────────────────────────────────────

class FocusObservation(BaseModel):
    """What the agent sees after each step."""
    time_remaining_seconds: int = Field(..., description="Seconds left in the current session")
    current_phase: Literal["focus", "break"] = Field(..., description="Whether we are in a focus or break phase")
    active_distractions: List[str] = Field(..., description="Apps currently tempting the agent")
    blocked_apps: List[str] = Field(..., description="Apps the agent has blocked so far")
    sessions_completed: int = Field(..., description="Number of completed pomodoro sessions")
    focus_score: float = Field(..., ge=0.0, le=1.0, description="Running focus quality score")
    last_action_feedback: str = Field(..., description="Human-readable feedback on the last action")
    distraction_event: Optional[str] = Field(None, description="A new temptation that just appeared, if any")


# ─── State ────────────────────────────────────────────────────────────────────

class FocusState(BaseModel):
    """Full internal environment state (returned by the state() API call)."""
    episode_step: int
    max_steps: int
    total_focus_seconds: int
    total_distraction_seconds: int
    sessions_completed: int
    breaks_taken: int
    apps_blocked: List[str]
    apps_checked: List[str] = Field(default_factory=list, description="Distractions the agent gave in to")
    current_phase: Literal["focus", "break"]
    time_remaining_seconds: int
    cumulative_reward: float
    done: bool
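
A short sketch of how these models validate agent input (illustration only):

```python
from pydantic import ValidationError

from models import FocusAction

# A valid action round-trips to JSON for the /step request body.
action = FocusAction(action_type="block_app", app_name="Instagram",
                     reasoning="High temptation")
print(action.model_dump_json())

# An out-of-range timer_minutes (ge=5, le=60) is rejected by Pydantic.
try:
    FocusAction(action_type="adjust_timer", timer_minutes=90)
except ValidationError as e:
    print(e.error_count(), "validation error")
```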
openenv.yaml ADDED
@@ -0,0 +1,65 @@
name: focusflow-env
description: >
  An RL environment where an AI agent learns to manage a student's focus session.
  The agent blocks distracting apps, times breaks correctly, and maximises
  deep-focus time using a Pomodoro-style framework.
  Built on Meta's OpenEnv framework for the Meta x Scaler Hackathon 2026.

version: "1.0.0"
author: Abdul Hannan
license: MIT

environment:
  base_url: https://YOUR-HF-SPACE-NAME.hf.space
  framework: openenv
  language: python
  python_version: "3.11"

api:
  reset:
    method: POST
    path: /reset
    params:
      - name: task_id
        type: string
        default: task_1
        description: Which task to load (task_1, task_2, task_3)
      - name: seed
        type: integer
        default: 42
  step:
    method: POST
    path: /step
    body: FocusAction
  state:
    method: GET
    path: /state

tasks:
  - id: task_1
    description: Complete one 25-min focus session without checking any distracting app.
    max_steps: 60
    success_reward: 1.0

  - id: task_2
    description: Complete two sessions with strategically timed breaks.
    max_steps: 120
    success_reward: 1.0

  - id: task_3
    description: Block 5 distracting apps within 10 steps, then complete a session.
    max_steps: 80
    success_reward: 1.0

reward_range: [-0.5, 0.5]
action_space: discrete (5 action types)
observation_space: structured JSON (FocusObservation)

tags:
  - productivity
  - student
  - anti-distraction
  - pomodoro
  - llm-agent
  - openenv
  - meta-hackathon-2026
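
A quick way to sanity-check this metadata from Python; a sketch assuming PyYAML is installed (it is not in `requirements.txt`):

```python
# Sanity-check openenv.yaml. PyYAML is an assumption here (pip install pyyaml).
import yaml

with open("openenv.yaml") as fh:
    meta = yaml.safe_load(fh)

# The declared tasks should match the TASKS defined in environment.py.
assert {t["id"] for t in meta["tasks"]} == {"task_1", "task_2", "task_3"}
print(meta["name"], meta["version"])
```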
requirements.txt ADDED
@@ -0,0 +1,6 @@
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.1
httpx==0.27.0
python-dotenv==1.0.1
openai>=1.30.0