# AGENTS.md – Context & Lessons for Future Sessions
This file exists because sandboxes reset and I (the agent) lose all memory. READ THIS FIRST before doing anything on this project.
## What This Project Is
- Challenge: TIL-26-AE (The Intelligent League – Automated Exploration)
- Game: Multi-agent Bomberman on a 16×16 grid
- My Role: Train `agent_0` via RL to compete autonomously
- Main Repo: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts, docs)
- Space: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
- TIL Source: Private Space `e-rong/til-26-ae` (contains the `til_environment/` module)
## CRITICAL: What Killed Training & Cost Money
### ❌ NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES
Sandboxes are interactive dev environments. They:
- Recycle after inactivity / timeout
- Kill processes silently
- Keep billing you while empty after the process dies
Damage done: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.
### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING
- Persistent GPU allocation
- Runs until completion (or your timeout)
- Fails visibly if something breaks (no silent empty billing)
- Must set `namespace="E-Rong"` to bill the org, not the user
### ❌ NEVER git clone A PRIVATE REPO IN AN HF JOB
`git clone https://huggingface.co/spaces/...` fails because git does not read the `HF_TOKEN` env var.

Use instead:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
```

`snapshot_download` automatically uses the `HF_TOKEN` env var.
### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN
Submit a 5-minute job that:
- Downloads the TIL repo
- Installs deps
- Runs 100 training steps
- Saves a dummy checkpoint to the Hub
Only after this succeeds should you submit the multi-hour job.
## Session Startup Checklist
Before doing anything on this project:
- Read `session_state.json` from `E-Rong/til-26-ae-agent`
- Read this file (`AGENTS.md`)
- Check the latest checkpoint on the Hub (sort `phase*_ckpt_*.zip` files)
- Determine the current phase and remaining steps
- If training is needed: write the script to the sandbox, then smoke-test it in an HF Job first
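Note that sorting checkpoint filenames lexically can mis-order step counts (e.g. `phase2_ckpt_99.zip` would sort after `phase2_ckpt_600352.zip`), so parsing the numbers is safer. A minimal sketch assuming the `phase*_ckpt_*.zip` naming convention above (the helper name is mine):

```python
import re

def latest_checkpoint(filenames):
    """Return the checkpoint file with the highest (phase, step) pair,
    parsed numerically from names like 'phase2_ckpt_600352.zip'."""
    pattern = re.compile(r"phase(\d+)_ckpt_(\d+)\.zip$")
    best = None
    for name in filenames:
        m = pattern.search(name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None
```

Files that do not match the pattern (e.g. `phase1_final.zip`) are ignored, so run it only over the intermediate checkpoints.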
## Technical Decisions That Work
### MaskablePPO + Action Masking
- `sb3_contrib.MaskablePPO` with `ActionMasker`
- Bomberman has `action_mask: uint8[6]`; walls/edges make some moves illegal
- Standard PPO wastes ~30-40% of samples on illegal actions early on
- Paper: Huang & Ontañón, "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms" (arXiv:2006.14171)
### Observation Flattening
A 1511-dim vector flattened from the dict observation:

```
agent_viewcone: 7×5×25 = 875
base_viewcone:  5×5×25 = 625
direction, location[2], base_location[2], health, frozen_ticks,
base_health, team_resources, team_bombs, step = 11 scalars
Total: 875 + 625 + 11 = 1511
```
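The flattening above can be sketched as follows. This is a minimal version assuming the observation keys and shapes listed; the exact key names in `til_environment` may differ:

```python
import numpy as np

# Scalar fields in a fixed order; the two location fields contribute 2 values each.
SCALAR_KEYS = ["direction", "location", "base_location", "health", "frozen_ticks",
               "base_health", "team_resources", "team_bombs", "step"]

def flatten_obs(obs: dict) -> np.ndarray:
    """Flatten the dict observation into a 1511-dim float32 vector:
    875 (agent_viewcone) + 625 (base_viewcone) + 11 scalars."""
    parts = [
        np.asarray(obs["agent_viewcone"], dtype=np.float32).ravel(),  # 7x5x25
        np.asarray(obs["base_viewcone"], dtype=np.float32).ravel(),   # 5x5x25
    ]
    for key in SCALAR_KEYS:
        parts.append(np.atleast_1d(np.asarray(obs[key], dtype=np.float32)).ravel())
    return np.concatenate(parts)
```

Keeping `SCALAR_KEYS` in one fixed list guarantees the scalar ordering is identical at train and inference time.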
### Wrapper Order (CRITICAL)
```python
# CORRECT
env = ActionMasker(base_env, lambda e: e.action_masks())
env = Monitor(env)

# WRONG: Monitor blocks action_masks() exposure
env = ActionMasker(Monitor(base_env), ...)  # DON'T DO THIS
```
### 3-Phase Curriculum
| Phase | Opponent | Duration | Purpose |
|---|---|---|---|
| 1 | Random | 500k | Learn basics |
| 2 | Random + visit-count shaping | 500k | Prevent camping |
| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |
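When resuming, a small helper can map a cumulative step count to its phase. The cumulative cutoffs (500k / 1M / 2M) are my reading of the durations in the table; the helper itself is a hypothetical sketch:

```python
# Cumulative step cutoffs derived from the table: 500k + 500k + 1M.
PHASE_CUTOFFS = [(500_000, 1), (1_000_000, 2), (2_000_000, 3)]

def phase_for(total_steps: int) -> int:
    """Return the curriculum phase a given cumulative step count falls in."""
    for cutoff, phase in PHASE_CUTOFFS:
        if total_steps < cutoff:
            return phase
    return PHASE_CUTOFFS[-1][1]  # past the end: stay in the final phase
```

At the current checkpoint (600,352 steps) this returns phase 2, matching the state recorded at the bottom of this file.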
### Checkpointing Every 50k Steps
- Local save + Hub push via `HfApi.upload_file()`
- Saved the project when sandboxes reset at 400k and 600k steps
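A minimal push sketch, assuming the repo and checkpoint-naming convention used elsewhere in this file (the helper names are mine; `HfApi` is imported lazily so the naming logic works even without the library installed):

```python
def ckpt_name(phase: int, steps: int) -> str:
    """Checkpoint filename matching the phase*_ckpt_*.zip convention."""
    return f"phase{phase}_ckpt_{steps}.zip"

def push_checkpoint(local_path: str, phase: int, steps: int,
                    repo_id: str = "E-Rong/til-26-ae-agent") -> None:
    """Upload a local checkpoint zip to the Hub so it survives sandbox resets."""
    from huggingface_hub import HfApi  # lazy import; reads HF_TOKEN from the env
    HfApi().upload_file(
        path_or_fileobj=local_path,
        path_in_repo=ckpt_name(phase, steps),
        repo_id=repo_id,
    )
```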
## Technical Decisions That Failed
| Decision | Why It Failed | Fix |
|---|---|---|
| Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
| `git clone` in HF Job | No auth for private repo | `snapshot_download` |
| Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
| No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
| `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |
## Cost Awareness
| Hardware | $/hr | Good For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
| `t4-small` | ~$0.40 | Short dev, NOT training |
| `a10g-small` | ~$1.00 | Training, but use HF Jobs not sandboxes |
| `a10g-large` | ~$2.00 | Larger batch sizes, not needed for this project |
Rule: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.
## Sandbox Policy (User Mandate)
From this point forward, the user has mandated:
- Start a `cpu-basic` sandbox at the beginning of every session
- Use `cpu-basic` for: context, writing code, writing docs, editing files, planning
- Only switch to a GPU sandbox (`t4-small` or `a10g-small`) when running smoke tests for training scripts
- Stop the GPU sandbox IMMEDIATELY after the smoke test completes
- Run training tasks ONLY as HF Jobs; never leave a training process running in a sandbox
- Never leave a GPU sandbox running idle; this wastes money
Why this matters: A GPU sandbox at $1/hr running empty for 3 hours = $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.
## How to Submit HF Jobs Correctly (Research Results)
Based on huggingface.co/docs/hub/jobs-quickstart:
DO NOT use `git clone` for private repos.
```python
# WRONG ❌
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
# Fails: git does not read the HF_TOKEN env var

# CORRECT ✅
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
# snapshot_download auto-uses HF_TOKEN from the environment
```
### Script Submission Pattern (What Actually Works)
⚠️ CRITICAL DISCOVERY: the `script` parameter in `hf_jobs` becomes a RAW HUB URL.
When you call `hf_jobs(script="/app/train.py")`, the job system does NOT upload the local file. Instead, it converts the path to:

    https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py

and runs it via `uv run <url>`. This means the file MUST already exist on the Hub repo.
The correct workflow is:
```python
from tools import write, hf_repo_files, hf_jobs

# Step 1: Write the script to a sandbox file
write(path="/app/train.py", content="...")

# Step 2: ALSO upload to the Hub repo so it's persisted and URL-accessible
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read(),
)

# Step 3: Submit the job referencing the sandbox path;
# the job system converts this to a Hub raw URL under the hood
hf_jobs(
    operation="run",
    script="/app/train.py",  # sandbox file path
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong",  # bills to the org
)
```
Verification from `hf_jobs` inspect:

```
exec uv run --with torch --with sb3-contrib ... \
  https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
```
The job fetches the script from the Hub, not from the sandbox. The sandbox path is just used to derive the repo/file path.
Why this matters: If you only write to /app/train.py and don't upload to the Hub, the job will fail with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
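The path-to-URL mapping described above can be illustrated with a tiny helper. This is hypothetical: it mirrors the behavior observed via `hf_jobs` inspect, not an official API:

```python
def sandbox_path_to_raw_url(path: str,
                            repo_id: str = "E-Rong/til-26-ae-agent",
                            revision: str = "main") -> str:
    """Mirror the observed hf_jobs behavior: drop the sandbox directory
    and point at the file's raw URL in the Hub repo."""
    filename = path.rsplit("/", 1)[-1]  # '/app/train.py' -> 'train.py'
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"
```

If the derived URL 404s, the upload step was skipped; the fix is always to push the script to the Hub repo, not to change the sandbox path.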
## Job Persistence
- Jobs run on HF infrastructure, not in your sandbox
- The sandbox can die; the job keeps running
- Check logs with `hf_jobs(operation="logs", job_id="...")`
- Job storage is ephemeral; push checkpoints to the Hub (not just locally)
## Repo File Guide
| File | What It Is |
|---|---|
| `session_state.json` | Current phase, checkpoint, mistakes log, next steps |
| `AGENTS.md` | This file: lessons and context |
| `docs/ae.md` | Full project documentation (research, design, results) |
| `phase1_final.zip` | Complete Phase 1 model |
| `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
| `phase2_final.zip` | Phase 2 complete model (when done) |
| `ae_manager.py` | Inference code for the evaluation server |
| `phase2_resume.py` | Latest HF Job script (works; uses `snapshot_download`) |
| `smoke_test.py` | 5-minute validation job; run it before any real job |
| `train_all_phases.py` | Original training script |
## Current State (Last Updated: 2026-05-14)
- Phase: 2 (in progress, crashed)
- Latest checkpoint: `phase2_ckpt_600352.zip`
- Target: 1,000,352 total steps (400,000 remaining)
- Blocker: need to resume via an HF Job with `snapshot_download` auth
- Smoke test needed: YES, before any multi-hour job
## Contact / Ownership
- User: E-Rong
- Org: E-Rong
- Billing: must specify `namespace="E-Rong"` on all HF Jobs
- I am: an ephemeral agent with no persistent memory. I depend on this file.
If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.