# AGENTS.md — Context & Lessons for Future Sessions

> This file exists because sandboxes reset and I (the agent) lose all memory.
> **READ THIS FIRST** before doing anything on this project.

---

## What This Project Is

- **Challenge**: TIL-26-AE (The Intelligent League — Automated Exploration)
- **Game**: Multi-agent Bomberman on a 16×16 grid
- **My Role**: Train `agent_0` via RL to compete autonomously
- **Main Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts, docs)
- **Space**: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
- **TIL Source**: Private Space `e-rong/til-26-ae` — contains the `til_environment/` module

---

## CRITICAL: What Killed Training & Cost Money

### ❌ NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES

Sandboxes are **interactive dev environments**. They:

- Recycle after inactivity / timeout
- Kill processes silently
- **Keep billing you while empty** after the process dies

**Damage done**: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.

### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING

- Persistent GPU allocation
- Runs until completion (or your timeout)
- Fails visibly if something breaks (no silent empty billing)
- Must set `namespace="E-Rong"` to bill the org, not the user

### ❌ NEVER `git clone` A PRIVATE REPO IN AN HF JOB

`git clone https://huggingface.co/spaces/...` fails because git does not read `HF_TOKEN`.

**Use instead**:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='e-rong/til-26-ae',
    repo_type='space',
    local_dir='/app/til-26-ae-repo'
)
```

`snapshot_download` auto-uses the `HF_TOKEN` env var.

### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN

Submit a 5-minute job that:

1. Downloads the TIL repo
2. Installs deps
3. Runs 100 training steps
4. Saves a dummy checkpoint to the Hub

Only after this succeeds, submit the multi-hour job.

---

## Session Startup Checklist

Before doing **anything** on this project:

1. [ ] Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. [ ] Read this file (`AGENTS.md`)
3. [ ] Check latest checkpoint on Hub (sort `phase*_ckpt_*.zip` files)
4. [ ] Determine current phase and remaining steps
5. [ ] If training needed: write script to sandbox, **smoke-test in HF Job first**

---

## Technical Decisions That Work

### MaskablePPO + Action Masking

- `sb3_contrib.MaskablePPO` with `ActionMasker`
- Bomberman has `action_mask: uint8[6]` — walls/edges make moves illegal
- Standard PPO wastes ~30–40% of samples on illegal actions early on
- **Paper**: Huang & Ontañón, "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms" (arXiv:2006.14171)

### Observation Flattening

1511-dim vector from the dict observation:

```
agent_viewcone: 7×5×25 = 875
base_viewcone:  5×5×25 = 625
direction, location[2], base_location[2], health, frozen_ticks,
base_health, team_resources, team_bombs, step = 11 scalars
Total: 1511
```

### Wrapper Order (CRITICAL)

```python
# CORRECT
env = ActionMasker(base_env, lambda e: e.action_masks())
env = Monitor(env)

# WRONG — Monitor blocks action_masks() exposure
env = ActionMasker(Monitor(base_env), ...)
# DON'T DO THIS
```

### 3-Phase Curriculum

| Phase | Opponent | Duration | Purpose |
|---|---|---|---|
| 1 | Random | 500k | Learn basics |
| 2 | Random + visit-count shaping | 500k | Prevent camping |
| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |

### Checkpointing Every 50k Steps

- Local + Hub push via `HfApi.upload_file()`
- Saved the project when sandboxes reset at 400k and 600k steps

---

## Technical Decisions That Failed

| Decision | Why It Failed | Fix |
|---|---|---|
| Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
| `git clone` in HF Job | No auth for private repo | `snapshot_download` |
| Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
| No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
| `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |

---

## Cost Awareness

| Hardware | $/hr | Good For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
| `t4-small` | ~$0.40 | Short dev, NOT training |
| `a10g-small` | ~$1.00 | Training, but use HF Jobs not sandboxes |
| `a10g-large` | ~$2.00 | Larger batch sizes, not needed for this project |

**Rule**: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.

---

## Sandbox Policy (User Mandate)

> **From this point forward, the user has mandated:**

1. **Start a `cpu-basic` sandbox** at the beginning of every session
2. **Use `cpu-basic` for**: context, writing code, writing docs, editing files, planning
3. **Only switch to a GPU sandbox** (`t4-small` or `a10g-small`) when performing **smoke tests** for training scripts
4. **Stop the GPU sandbox IMMEDIATELY** after the smoke test completes
5. **Training tasks ONLY as HF Jobs** — never leave a training process running in a sandbox
6. **Never leave a GPU sandbox running idle** — this wastes money

**Why this matters**: a GPU sandbox at $1/hr running empty for 3 hours is $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.

---

## How to Submit HF Jobs Correctly (Research Results)

### Based on `huggingface.co/docs/hub/jobs-quickstart`

**DO NOT use `git clone` for private repos.**

```python
# WRONG ❌
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
# Fails: git does not read the HF_TOKEN env var

# CORRECT ✅
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# snapshot_download auto-uses HF_TOKEN from the environment
```

### Script Submission Pattern (What Actually Works)

**⚠️ CRITICAL DISCOVERY: The `script` parameter in `hf_jobs` becomes a RAW HUB URL.**

When you call `hf_jobs(script="/app/train.py")`, the job system does NOT upload the local file. Instead, it converts the path to:

```
https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

and runs it via `uv run`.
**This means the file MUST already exist on the Hub repo.**

**The correct workflow is:**

```python
from tools import write, hf_repo_files, hf_jobs

# Step 1: Write script to sandbox file
write(path="/app/train.py", content="...")

# Step 2: ALSO upload to Hub repo so it's persisted and URL-accessible
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# Step 3: Submit job referencing the sandbox path
# The job system will convert this to a Hub raw URL under the hood
hf_jobs(
    operation="run",
    script="/app/train.py",  # ← sandbox file path
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit",
                  "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"  # ← bills to org
)
```

**Verification from `hf_jobs inspect`:**

```bash
exec uv run --with torch --with sb3-contrib ... \
  https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
```

The job fetches the script from the Hub, not from the sandbox. The sandbox path is only used to derive the repo/file path.

**Why this matters**: if you only write to `/app/train.py` and don't upload to the Hub, the job fails with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
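The path-to-URL conversion above can be captured in a tiny helper for sanity-checking what a job will actually fetch before submitting it (a sketch — `hub_raw_url` is a hypothetical name, and it assumes the default `main` branch):

```python
def hub_raw_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the raw Hub URL that the job runner derives from a sandbox path."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"

# /app/train.py submitted under repo E-Rong/til-26-ae-agent maps to:
print(hub_raw_url("E-Rong/til-26-ae-agent", "train.py"))
# → https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

If this URL 404s in a browser, the upload step was skipped and the job will fail the same way.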
### Job Persistence

- Jobs run on HF infrastructure, not in your sandbox
- The sandbox can die — the job keeps running
- Check logs with `hf_jobs(operation="logs", job_id="...")`
- Job storage is ephemeral — **push checkpoints to the Hub** (not just local)

---

## Repo File Guide

| File | What It Is |
|---|---|
| `session_state.json` | Current phase, checkpoint, mistakes log, next steps |
| `AGENTS.md` | This file — lessons and context |
| `docs/ae.md` | Full project documentation (research, design, results) |
| `phase1_final.zip` | Complete Phase 1 model |
| `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
| `phase2_final.zip` | Phase 2 complete model (when done) |
| `ae_manager.py` | Inference code for the evaluation server |
| `phase2_resume.py` | Latest HF Job script (works — uses `snapshot_download`) |
| `smoke_test.py` | 5-minute validation job — test before any real job |
| `train_all_phases.py` | Original training script |

---

## Current State (Last Updated: 2026-05-14)

- **Phase**: 2 (in progress, crashed)
- **Latest checkpoint**: `phase2_ckpt_600352.zip`
- **Target**: 1,000,352 total steps (400,000 remaining)
- **Blocker**: need to resume via an HF Job with `snapshot_download` auth
- **Smoke test needed**: YES — before any multi-hour job

---

## Contact / Ownership

- **User**: E-Rong
- **Org**: E-Rong
- **Billing**: must specify `namespace="E-Rong"` on all HF Jobs
- **I am**: an ephemeral agent with no persistent memory. I depend on this file.

---

*If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.*
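Checklist step 3 ("check latest checkpoint on Hub") can be sketched as a filename sort (hypothetical helper; assumes checkpoints follow the `phase{N}_ckpt_{steps}.zip` pattern from the Repo File Guide):

```python
import re

def latest_checkpoint(filenames):
    """Pick the checkpoint with the highest step count from a Hub file listing."""
    pattern = re.compile(r"phase\d+_ckpt_(\d+)\.zip")
    ckpts = [(int(m.group(1)), name)
             for name in filenames
             if (m := pattern.fullmatch(name))]
    return max(ckpts)[1] if ckpts else None

files = ["phase1_final.zip", "phase2_ckpt_550016.zip",
         "phase2_ckpt_600352.zip", "train_all_phases.py"]
print(latest_checkpoint(files))
# → phase2_ckpt_600352.zip
```

Sorting by the parsed integer (not the raw string) matters once step counts cross a digit boundary.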