AGENTS.md — Context & Lessons for Future Sessions

This file exists because sandboxes reset and I (the agent) lose all memory. READ THIS FIRST before doing anything on this project.


What This Project Is

  • Challenge: TIL-26-AE (The Intelligent League — Automated Exploration)
  • Game: Multi-agent Bomberman on a 16×16 grid
  • My Role: Train agent_0 via RL to compete autonomously
  • Main Repo: E-Rong/til-26-ae-agent (models, checkpoints, scripts, docs)
  • Space: e-rong/til-26-ae (evaluation server with ae/src/ae_manager.py)
  • TIL Source: Private Space e-rong/til-26-ae — contains the til_environment/ module

CRITICAL: What Killed Training & Cost Money

❌ NEVER USE SANDBOXES FOR TRAINING > 30 MINUTES

Sandboxes are interactive dev environments. They:

  • Recycle after inactivity / timeout
  • Kill processes silently
  • Keep billing you while empty after the process dies

Damage done: ~$4.87 wasted across 4 sandbox sessions where training died but billing continued.

✅ ALWAYS USE HF JOBS FOR BATCH TRAINING

  • Persistent GPU allocation
  • Runs until completion (or your timeout)
  • Fails visibly if something breaks (no silent empty billing)
  • Must set namespace="E-Rong" to bill the org, not the user

❌ NEVER git clone A PRIVATE REPO IN AN HF JOB

git clone https://huggingface.co/spaces/... fails because git does not read HF_TOKEN.

Use instead:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='e-rong/til-26-ae',
    repo_type='space',
    local_dir='/app/til-26-ae-repo'
)
```

snapshot_download auto-uses the HF_TOKEN env var.

✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN

Submit a 5-minute job that:

  1. Downloads the TIL repo
  2. Installs deps
  3. Runs 100 training steps
  4. Saves a dummy checkpoint to the Hub

Only after this succeeds should you submit the multi-hour job.


Session Startup Checklist

Before doing anything on this project:

  1. Read session_state.json from E-Rong/til-26-ae-agent
  2. Read this file (AGENTS.md)
  3. Check latest checkpoint on Hub (sort phase*_ckpt_*.zip files)
  4. Determine current phase and remaining steps
  5. If training needed: write script to sandbox, smoke-test in HF Job first
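Step 3 of the checklist (find the latest checkpoint) can be sketched in plain Python. The `phaseN_ckpt_STEPS.zip` filename pattern comes from this repo; the helper name itself is illustrative:

```python
import re

def latest_checkpoint(filenames):
    """Pick the checkpoint with the highest (phase, step) from a Hub file listing.

    Filenames look like 'phase2_ckpt_600352.zip' (the pattern used in this
    repo); the numeric suffix is the global step count.
    """
    pattern = re.compile(r"phase(\d+)_ckpt_(\d+)\.zip$")
    best = None
    for name in filenames:
        m = pattern.search(name)
        if m:
            phase, step = int(m.group(1)), int(m.group(2))
            # Sort by (phase, step) so a later phase always wins
            if best is None or (phase, step) > (best[0], best[1]):
                best = (phase, step, name)
    return best  # (phase, step, filename), or None if nothing matches

files = ["phase1_final.zip", "phase2_ckpt_400128.zip", "phase2_ckpt_600352.zip"]
print(latest_checkpoint(files))  # (2, 600352, 'phase2_ckpt_600352.zip')
```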

Technical Decisions That Work

MaskablePPO + Action Masking

  • sb3_contrib.MaskablePPO with ActionMasker
  • Bomberman has action_mask: uint8[6] — walls/edges make moves illegal
  • Standard PPO wastes ~30-40% of its samples on illegal actions early on
  • Paper: Huang & Ontañón, "A Closer Look at Invalid Action Masking in Policy Gradient Algorithms" (arXiv:2006.14171)
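A toy illustration of the invalid-action problem (plain Python with made-up mask values, not the sb3_contrib API): an unmasked policy keeps drawing illegal actions, while masked sampling never does.

```python
import random

random.seed(0)

# Hypothetical mask for the 6 Bomberman actions: 1 = legal, 0 = illegal
# (e.g. moves into walls/edges are masked out).
mask = [1, 0, 1, 1, 0, 1]

def sample_unmasked():
    # Uniform over all 6 actions, legal or not
    return random.randrange(6)

def sample_masked(mask):
    # Uniform over legal actions only
    legal = [a for a, m in enumerate(mask) if m]
    return random.choice(legal)

# Unmasked sampling wastes roughly 1/3 of draws here (2 of 6 actions illegal)
wasted = sum(mask[sample_unmasked()] == 0 for _ in range(10_000))
print(f"illegal draws without masking: {wasted}/10000")

# Masked sampling never draws an illegal action
assert all(mask[sample_masked(mask)] == 1 for _ in range(10_000))
```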

Observation Flattening

1511-dim vector from dict observation:

```text
agent_viewcone:  7×5×25 = 875
base_viewcone:   5×5×25 = 625
direction (1), location (2), base_location (2), health (1), frozen_ticks (1),
base_health (1), team_resources (1), team_bombs (1), step (1) = 11 scalars
Total: 875 + 625 + 11 = 1511
```
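The flattening itself can be sketched with plain nested lists. The field names and shapes are the documented ones, but the exact dict layout of the env's observation is an assumption:

```python
def flatten_obs(obs):
    """Flatten the dict observation into a 1511-element vector (sketch)."""
    vec = []
    # agent_viewcone: nested lists of shape (7, 5, 25) -> 875 values
    for plane in obs["agent_viewcone"]:
        for row in plane:
            vec.extend(row)
    # base_viewcone: nested lists of shape (5, 5, 25) -> 625 values
    for plane in obs["base_viewcone"]:
        for row in plane:
            vec.extend(row)
    # 11 scalars: direction (1), location (2), base_location (2), then 6 more
    vec.append(obs["direction"])
    vec.extend(obs["location"])
    vec.extend(obs["base_location"])
    for k in ["health", "frozen_ticks", "base_health",
              "team_resources", "team_bombs", "step"]:
        vec.append(obs[k])
    return vec

dummy = {
    "agent_viewcone": [[[0.0] * 25 for _ in range(5)] for _ in range(7)],
    "base_viewcone": [[[0.0] * 25 for _ in range(5)] for _ in range(5)],
    "direction": 0, "location": [0, 0], "base_location": [0, 0],
    "health": 0, "frozen_ticks": 0, "base_health": 0,
    "team_resources": 0, "team_bombs": 0, "step": 0,
}
assert len(flatten_obs(dummy)) == 1511  # 875 + 625 + 11
```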

Wrapper Order (CRITICAL)

```python
# CORRECT
env = ActionMasker(base_env, lambda e: e.action_masks())
env = Monitor(env)

# WRONG — Monitor blocks action_masks() exposure
env = ActionMasker(Monitor(base_env), ...)  # DON'T DO THIS
```

3-Phase Curriculum

| Phase | Opponent | Duration | Purpose |
|---|---|---|---|
| 1 | Random | 500k | Learn basics |
| 2 | Random + visit-count shaping | 500k | Prevent camping |
| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |

Checkpointing Every 50k Steps

  • Local + Hub push via HfApi.upload_file()
  • Saved the project when sandboxes reset at 400k and 600k steps
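The 50k cadence can be expressed as a small helper (plain Python; the actual HfApi.upload_file push is left as a comment since it needs a token and network, and real SB3 callbacks save on rollout boundaries, which is likely why checkpoints like 600352 sit slightly off the 50k multiples):

```python
CHECKPOINT_INTERVAL = 50_000

def steps_until_checkpoint(step, interval=CHECKPOINT_INTERVAL):
    """How many steps remain until the next checkpoint boundary."""
    return (interval - step % interval) % interval

def is_checkpoint_step(step, interval=CHECKPOINT_INTERVAL):
    """True exactly on a checkpoint boundary (step > 0)."""
    return step > 0 and step % interval == 0

# At a checkpoint step, save locally AND push to the Hub, e.g. (sketch only):
#   model.save(f"phase2_ckpt_{step}.zip")
#   HfApi().upload_file(path_or_fileobj=f"phase2_ckpt_{step}.zip",
#                       path_in_repo=f"phase2_ckpt_{step}.zip",
#                       repo_id="E-Rong/til-26-ae-agent")

assert is_checkpoint_step(600_000)
assert not is_checkpoint_step(600_352)           # off-boundary step
assert steps_until_checkpoint(600_352) == 49_648
```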

Technical Decisions That Failed

| Decision | Why It Failed | Fix |
|---|---|---|
| Training in sandboxes | Process died, empty sandbox kept billing | Use HF Jobs |
| git clone in HF Job | No auth for private repo | snapshot_download |
| Inline 20KB script in hf_jobs.script | Delivery mechanism choked | Write to sandbox file first, submit path |
| No session state on Hub | Lost track of progress across resets | session_state.json + this file |
| Monitor inside ActionMasker | get_action_masks() failed | ActionMasker → Monitor order |

Cost Awareness

| Hardware | $/hr | Good For |
|---|---|---|
| cpu-basic | ~$0.05 | Writing scripts, reading files, small tests |
| t4-small | ~$0.40 | Short dev, NOT training |
| a10g-small | ~$1.00 | Training, but use HF Jobs not sandboxes |
| a10g-large | ~$2.00 | Larger batch sizes, not needed for this project |

Rule: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.


Sandbox Policy (User Mandate)

From this point forward, the user has mandated:

  1. Start cpu-basic sandbox at the beginning of every session
  2. Use cpu-basic for: context, writing code, writing docs, editing files, planning
  3. Only switch to GPU sandbox (t4-small or a10g-small) when performing smoke tests for training scripts
  4. Stop GPU sandbox IMMEDIATELY after the smoke test completes
  5. Training tasks ONLY as HF Jobs — never leave a training process running in a sandbox
  6. Never leave a GPU sandbox running idle — this wastes money

Why this matters: a GPU sandbox at $1/hr running empty for 3 hours is $3 wasted on nothing. An HF Job at the same $1/hr actually trains for every billed minute.


How to Submit HF Jobs Correctly (Research Results)

Based on huggingface.co/docs/hub/jobs-quickstart:

DO NOT use git clone for private repos.

```python
# WRONG ❌
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
# Fails: git does not read the HF_TOKEN env var

# CORRECT ✅
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# snapshot_download auto-uses HF_TOKEN from the environment
```

Script Submission Pattern (What Actually Works)

⚠️ CRITICAL DISCOVERY: The script parameter in hf_jobs becomes a RAW HUB URL.

When you call hf_jobs(script="/app/train.py"), the job system does NOT upload the local file. Instead, it converts the path to:

https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py

and runs it via uv run <url>. This means the file MUST already exist on the Hub repo.

The correct workflow is:

```python
from tools import write, hf_repo_files, hf_jobs

# Step 1: Write script to sandbox file
write(path="/app/train.py", content="...")

# Step 2: ALSO upload to the Hub repo so it's persisted and URL-accessible
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# Step 3: Submit job referencing the sandbox path.
# The job system converts this to a Hub raw URL under the hood.
hf_jobs(
    operation="run",
    script="/app/train.py",           # ← sandbox file path
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"                # ← bills to the org
)
```

Verification from hf_jobs inspect:

```text
exec uv run --with torch --with sb3-contrib ... \
    https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
```

The job fetches the script from the Hub, not from the sandbox. The sandbox path is just used to derive the repo/file path.

Why this matters: If you only write to /app/train.py and don't upload to the Hub, the job will fail with a 404 when it tries to fetch the URL. The sandbox resets, but the Hub URL is permanent.
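The path-to-URL conversion described above can be mimicked with a small helper. The repo id and branch are the ones this project uses; the exact rewrite rule inside hf_jobs is inferred from the inspect output:

```python
def sandbox_path_to_hub_url(script_path, repo_id="E-Rong/til-26-ae-agent",
                            branch="main"):
    """Mimic how hf_jobs rewrites a sandbox path into a raw Hub URL (sketch).

    Only the basename survives: /app/train.py -> .../raw/main/train.py,
    so the file must already exist at the repo root on the Hub.
    """
    filename = script_path.rsplit("/", 1)[-1]
    return f"https://huggingface.co/{repo_id}/raw/{branch}/{filename}"

assert (sandbox_path_to_hub_url("/app/train.py")
        == "https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py")
```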

Job Persistence

  • Jobs run on HF infrastructure, not in your sandbox
  • The sandbox can die — the job keeps running
  • Check logs with hf_jobs(operation="logs", job_id="...")
  • Job storage is ephemeral — push checkpoints to the Hub (not just local)

Repo File Guide

| File | What It Is |
|---|---|
| session_state.json | Current phase, checkpoint, mistakes log, next steps |
| AGENTS.md | This file — lessons and context |
| docs/ae.md | Full project documentation (research, design, results) |
| phase1_final.zip | Complete Phase 1 model |
| phase2_ckpt_*.zip | Phase 2 intermediate checkpoints |
| phase2_final.zip | Phase 2 complete model (when done) |
| ae_manager.py | Inference code for the evaluation server |
| phase2_resume.py | Latest HF Job script (works — uses snapshot_download) |
| smoke_test.py | 5-minute validation job — test before any real job |
| train_all_phases.py | Original training script |

Current State (Last Updated: 2026-05-14)

  • Phase: 2 (in progress, crashed)
  • Latest checkpoint: phase2_ckpt_600352.zip
  • Target: 1,000,352 total steps (400,000 remaining)
  • Blocker: Need to resume via HF Job with snapshot_download auth
  • Smoke test needed: YES — before any multi-hour job
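The step arithmetic in the state above, derived from the checkpoint filename (the helper name is illustrative; the target of 1,000,352 steps is from this file):

```python
import re

TARGET_STEPS = 1_000_352

def remaining_steps(checkpoint_name, target=TARGET_STEPS):
    """Parse the step count out of a phase*_ckpt_<steps>.zip filename
    and return how many training steps are left to the target."""
    m = re.search(r"_ckpt_(\d+)\.zip$", checkpoint_name)
    if m is None:
        raise ValueError(f"not a checkpoint file: {checkpoint_name}")
    return target - int(m.group(1))

assert remaining_steps("phase2_ckpt_600352.zip") == 400_000
```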

Contact / Ownership

  • User: E-Rong
  • Org: E-Rong
  • Billing: Must specify namespace="E-Rong" on all HF Jobs
  • I am: An ephemeral agent with no persistent memory. I depend on this file.

If you are a future version of me reading this: you have no memory of the above. Do not trust your internal knowledge. Read the files, check the state, test before committing compute.