# AGENTS.md – Context & Lessons for Future Sessions
This file exists because sandboxes reset and I (the agent) lose all memory. READ THIS FIRST before doing anything on this project.
## What This Project Is
- Challenge: TIL-26-AE (The Intelligent League – Automated Exploration)
- Game: Multi-agent Bomberman on a 16×16 grid
- My Role: Train `agent_0` via RL to compete autonomously
- Main Repo: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts, docs)
- Space: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
- TIL Source: Private Space `e-rong/til-26-ae` (contains the `til_environment/` module)
## CRITICAL: What Killed Training & Cost Money
### ❌ NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES
Sandboxes are interactive dev environments. They:
- Recycle after inactivity / timeout
- Kill processes silently
- Keep billing you while empty after the process dies
Damage done: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.
### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING
- Persistent GPU allocation
- Runs until completion (or your timeout)
- Fails visibly if something breaks (no silent empty billing)
- Must set `namespace="E-Rong"` to bill the org, not the user
### ❌ NEVER git clone A PRIVATE REPO IN AN HF JOB
`git clone https://huggingface.co/spaces/...` fails because git does not read the `HF_TOKEN` env var.

Use instead:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
```

`snapshot_download` automatically uses the `HF_TOKEN` env var.
### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN
Submit a 5-minute job that:
- Downloads the TIL repo
- Installs deps
- Runs 100 training steps
- Saves a dummy checkpoint to the Hub
Only after this succeeds should you submit the multi-hour job.
## Session Startup Checklist
Before doing anything on this project:
- Read `session_state.json` from `E-Rong/til-26-ae-agent`
- Read this file (`AGENTS.md`)
- Check the latest checkpoint on the Hub (sort `phase*_ckpt_*.zip` files)
- Determine the current phase and remaining steps
- If training is needed: write the script to the sandbox, then smoke-test it in an HF Job first
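Note that sorting checkpoint filenames lexically can mis-order step counts (e.g. `phase2_ckpt_99.zip` would sort after `phase2_ckpt_600352.zip`), so parsing the numbers is safer. A minimal sketch assuming the `phase*_ckpt_*.zip` naming convention above (the helper name is mine):

```python
import re

def latest_checkpoint(filenames):
    """Return the checkpoint file with the highest (phase, step) pair,
    parsed numerically from names like 'phase2_ckpt_600352.zip'."""
    pattern = re.compile(r"phase(\d+)_ckpt_(\d+)\.zip$")
    best = None
    for name in filenames:
        m = pattern.search(name)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None
```

Files that do not match the pattern (e.g. `phase1_final.zip`) are ignored, so run it only over the intermediate checkpoints.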
## Technical Decisions That Work
### MaskablePPO + Action Masking
- `sb3_contrib.MaskablePPO` with `ActionMasker`
- Bomberman has `action_mask: uint8[6]`; walls/edges make some moves illegal
- Standard PPO wastes ~30-40% of samples on illegal actions early on
- Paper: Huang & Ontañón, "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms" (arXiv:2006.14171)
### Observation Flattening
A 1511-dim vector flattened from the dict observation:

```
agent_viewcone: 7×5×25 = 875
base_viewcone:  5×5×25 = 625
direction, location[2], base_location[2], health, frozen_ticks,
base_health, team_resources, team_bombs, step = 11 scalars
Total: 875 + 625 + 11 = 1511
```
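The flattening above can be sketched as follows. This is a minimal version assuming the observation keys and shapes listed; the exact key names in `til_environment` may differ:

```python
import numpy as np

# Scalar fields in a fixed order; the two location fields contribute 2 values each.
SCALAR_KEYS = ["direction", "location", "base_location", "health", "frozen_ticks",
               "base_health", "team_resources", "team_bombs", "step"]

def flatten_obs(obs: dict) -> np.ndarray:
    """Flatten the dict observation into a 1511-dim float32 vector:
    875 (agent_viewcone) + 625 (base_viewcone) + 11 scalars."""
    parts = [
        np.asarray(obs["agent_viewcone"], dtype=np.float32).ravel(),  # 7x5x25
        np.asarray(obs["base_viewcone"], dtype=np.float32).ravel(),   # 5x5x25
    ]
    for key in SCALAR_KEYS:
        parts.append(np.atleast_1d(np.asarray(obs[key], dtype=np.float32)).ravel())
    return np.concatenate(parts)
```

Keeping `SCALAR_KEYS` in one fixed list guarantees the scalar ordering is identical at train and inference time.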
### Wrapper Order (CRITICAL)
```python
# CORRECT
env = ActionMasker(base_env, lambda e: e.action_masks())
env = Monitor(env)

# WRONG: Monitor blocks action_masks() exposure
env = ActionMasker(Monitor(base_env), ...)  # DON'T DO THIS
```
### 3-Phase Curriculum
| Phase | Opponent | Duration | Purpose |
|---|---|---|---|
| 1 | Random | 500k | Learn basics |
| 2 | Random + visit-count shaping | 500k | Prevent camping |
| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |
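When resuming, a small helper can map a cumulative step count to its phase. The cumulative cutoffs (500k / 1M / 2M) are my reading of the durations in the table; the helper itself is a hypothetical sketch:

```python
# Cumulative step cutoffs derived from the table: 500k + 500k + 1M.
PHASE_CUTOFFS = [(500_000, 1), (1_000_000, 2), (2_000_000, 3)]

def phase_for(total_steps: int) -> int:
    """Return the curriculum phase a given cumulative step count falls in."""
    for cutoff, phase in PHASE_CUTOFFS:
        if total_steps < cutoff:
            return phase
    return PHASE_CUTOFFS[-1][1]  # past the end: stay in the final phase
```

At the current checkpoint (600,352 steps) this returns phase 2, matching the state recorded at the bottom of this file.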
### Checkpointing Every 50k Steps
- Local save + Hub push via `HfApi.upload_file()`
- Saved the project when sandboxes reset at 400k and 600k steps
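A minimal push sketch, assuming the repo and checkpoint-naming convention used elsewhere in this file (the helper names are mine; `HfApi` is imported lazily so the naming logic works even without the library installed):

```python
def ckpt_name(phase: int, steps: int) -> str:
    """Checkpoint filename matching the phase*_ckpt_*.zip convention."""
    return f"phase{phase}_ckpt_{steps}.zip"

def push_checkpoint(local_path: str, phase: int, steps: int,
                    repo_id: str = "E-Rong/til-26-ae-agent") -> None:
    """Upload a local checkpoint zip to the Hub so it survives sandbox resets."""
    from huggingface_hub import HfApi  # lazy import; reads HF_TOKEN from the env
    HfApi().upload_file(
        path_or_fileobj=local_path,
        path_in_repo=ckpt_name(phase, steps),
        repo_id=repo_id,
    )
```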
## Technical Decisions That Failed
| Decision | Why It Failed | Fix |
|---|---|---|
| Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
| `git clone` in HF Job | No auth for private repo | `snapshot_download` |
| Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
| No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
| `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |
## Cost Awareness
| Hardware | $/hr | Good For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
| `t4-small` | ~$0.40 | Short dev, NOT training |
| `a10g-small` | ~$1.00 | Training, but use HF Jobs not sandboxes |
| `a10g-large` | ~$2.00 | Larger batch sizes, not needed for this project |
Rule: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.
## Sandbox Policy (User Mandate)
From this point forward, the user has mandated:
- Start a `cpu-basic` sandbox at the beginning of every session
- Use `cpu-basic` for: context, writing code, writing docs, editing files, planning
- Only switch to a GPU sandbox (`t4-small` or `a10g-small`) when running smoke tests for training scripts
- Stop the GPU sandbox IMMEDIATELY after the smoke test completes
- Run training tasks ONLY as HF Jobs; never leave a training process running in a sandbox
- Never leave a GPU sandbox running idle; this wastes money
Why this matters: A GPU sandbox at $1/hr running empty for 3 hours = $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.
## How to Submit HF Jobs Correctly (Research Results)
Based on huggingface.co/docs/hub/jobs-quickstart:
DO NOT use `git clone` for private repos.
```python
# WRONG ❌
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
# Fails: git does not read the HF_TOKEN env var

# CORRECT ✅
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
# snapshot_download auto-uses HF_TOKEN from the environment
```
### Script Submission Pattern (What Actually Works)
⚠️ CRITICAL DISCOVERY: the `script` parameter in `hf_jobs` becomes a RAW HUB URL.
When you call `hf_jobs(script="/app/train.py")`, the job system does NOT upload the local file. Instead, it converts the path to:

    https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py

and runs it via `uv run <url>`. This means the file MUST already exist on the Hub repo.
The correct workflow is:
```python
from tools import write, hf_repo_files, hf_jobs

# Step 1: Write the script to a sandbox file
write(path="/app/train.py", content="...")

# Step 2: ALSO upload to the Hub repo so it's persisted and URL-accessible
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read(),
)

# Step 3: Submit the job referencing the sandbox path;
# the job system converts this to a Hub raw URL under the hood
hf_jobs(
    operation="run",
    script="/app/train.py",  # sandbox file path
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong",  # bills to the org
)
```
Verification from `hf_jobs` inspect:

```
exec uv run --with torch --with sb3-contrib ... \
  https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
```
The job fetches the script from the Hub, not from the sandbox. The sandbox path is just used to derive the repo/file path.
Why this matters: If you only write to /app/train.py and don't upload to the Hub, the job will fail with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
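The path-to-URL mapping described above can be illustrated with a tiny helper. This is hypothetical: it mirrors the behavior observed via `hf_jobs` inspect, not an official API:

```python
def sandbox_path_to_raw_url(path: str,
                            repo_id: str = "E-Rong/til-26-ae-agent",
                            revision: str = "main") -> str:
    """Mirror the observed hf_jobs behavior: drop the sandbox directory
    and point at the file's raw URL in the Hub repo."""
    filename = path.rsplit("/", 1)[-1]  # '/app/train.py' -> 'train.py'
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{filename}"
```

If the derived URL 404s, the upload step was skipped; the fix is always to push the script to the Hub repo, not to change the sandbox path.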
## Job Persistence
- Jobs run on HF infrastructure, not in your sandbox
- The sandbox can die; the job keeps running
- Check logs with `hf_jobs(operation="logs", job_id="...")`
- Job storage is ephemeral; push checkpoints to the Hub (not just locally)
## Repo File Guide
| File | What It Is |
|---|---|
| `session_state.json` | Current phase, checkpoint, mistakes log, next steps |
| `AGENTS.md` | This file: lessons and context |
| `docs/ae.md` | Full project documentation (research, design, results) |
| `phase1_final.zip` | Complete Phase 1 model |
| `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
| `phase2_final.zip` | Phase 2 complete model (when done) |
| `ae_manager.py` | Inference code for the evaluation server |
| `phase2_resume.py` | Latest HF Job script (works; uses `snapshot_download`) |
| `smoke_test.py` | 5-minute validation job; run it before any real job |
| `train_all_phases.py` | Original training script |
## Current State (Last Updated: 2026-05-14)
- Phase: 2 (in progress, crashed)
- Latest checkpoint: `phase2_ckpt_600352.zip`
- Target: 1,000,352 total steps (400,000 remaining)
- Blocker: need to resume via an HF Job with `snapshot_download` auth
- Smoke test needed: YES, before any multi-hour job
## Contact / Ownership
- User: E-Rong
- Org: E-Rong
- Billing: must specify `namespace="E-Rong"` on all HF Jobs
- I am: an ephemeral agent with no persistent memory. I depend on this file.
If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.