AGENTS.md: Zero-Memory Survival Guide
You have no persistent memory. Read this first. Do not trust your internal knowledge.
What This Is (10 seconds)
- Challenge: TIL-26-AE, training a Bomberman agent (agent_0) via RL
- Repo: E-Rong/til-26-ae-agent (models, checkpoints, scripts)
- Env source: Private Space e-rong/til-26-ae (contains til_environment/)
- Algorithm: MaskablePPO + invalid-action masking + curriculum learning
The 6 Unbreakable Rules
| # | Rule | Violation Cost |
|---|---|---|
| 1 | NEVER train in sandboxes >30 min | ~$5/hr wasted on empty, recycled containers |
| 2 | NEVER git clone private repos in HF Jobs | Job fails instantly: git ignores HF_TOKEN |
| 3 | NEVER pass inline scripts larger than a few KB to hf_jobs.script | Delivery chokes at ~20KB |
| 4 | ALWAYS upload script to Hub BEFORE submitting job | Job fetches script from Hub URL, not sandbox |
| 5 | ALWAYS update docs BEFORE starting long jobs | Next you has zero memory; stale docs = duplicated work |
| 6 | ALWAYS smoke-test before multi-hour jobs | 5-min test saves hours of failed compute |
Session Startup (do this now)
- Read session_state.json from E-Rong/til-26-ae-agent
- Check hf_jobs ps for running jobs
- Check latest checkpoint on Hub (phase*_ckpt_*.zip)
- Determine: current phase, remaining steps, next action
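The "latest checkpoint" check above can be sketched as a small filename parser. This is only a sketch: it assumes checkpoints are named phase<N>_ckpt_<steps>.zip with a numeric step count in place of the trailing wildcard, which the original does not confirm, and `latest_checkpoint` is a hypothetical helper name.

```python
import re

# Assumed naming scheme: phase<N>_ckpt_<steps>.zip. The meaning of the
# trailing wildcard is a guess; adjust the regex to the real filenames.
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames):
    """Return (phase, steps, filename) for the most advanced checkpoint, or None."""
    found = []
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            found.append((int(match.group(1)), int(match.group(2)), name))
    # Tuples compare by phase first, then by step count.
    return max(found) if found else None


files = ["train.py", "phase1_final.zip",
         "phase2_ckpt_100000.zip", "phase2_ckpt_250000.zip"]
print(latest_checkpoint(files))
```

The file list would presumably come from a Hub listing call such as `HfApi().list_repo_files("E-Rong/til-26-ae-agent")`; the parser itself needs no network access.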
Development Workflow (follow exactly)
1. Write on cpu-basic: code, docs, scripts, planning. Never touch GPU sandboxes for editing.
2. Smoke-test on GPU sandbox (t4-small or a10g-small): run the script for 5-10 minutes to verify it loads the env, runs training steps, and can push a checkpoint. Stop the GPU sandbox immediately after pass or fail. Never leave it idle.
3. If smoke test fails: look up Hugging Face documentation (explore_hf_docs, fetch_hf_docs) or other relevant docs to diagnose the issue. Iterate based on what you learn. Go back to step 1.
4. If smoke test passes: update docs/ae.md with current project status, update AGENTS.md with anything new you learned. Push both to the Hub before proceeding.
5. Submit the real Job (a10g-small, a10g-large, etc.). Immediately check hf_jobs logs to confirm it starts successfully. Poll the job every 5 minutes until the user interrupts you. During polling downtime, work on docs or scripts for upcoming phases, but keep checking the job.
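The 5-minute polling in the final step can be sketched as a loop around a status callback. Everything here is illustrative: `get_status` is a hypothetical callable standing in for however you read the job state, and the terminal state names are assumptions, not confirmed hf_jobs values.

```python
import time


def poll_job(get_status, interval_s=300, max_polls=None, sleep=time.sleep):
    """Call get_status() every interval_s seconds until a terminal state.

    get_status is a hypothetical callable returning a status string.
    The terminal state names below are assumptions for illustration.
    Returns (last_status, number_of_polls).
    """
    terminal = {"COMPLETED", "ERROR", "CANCELED"}
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in terminal:
            return status, polls
        if max_polls is not None and polls >= max_polls:
            return status, polls
        sleep(interval_s)  # 300 s = the 5-minute cadence from the workflow
```

Injecting `sleep` keeps the loop testable without waiting, and `max_polls` gives a way out if no terminal state ever arrives.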
How to Submit an HF Job (the only way that works)
# 1. Write to sandbox
write(path="/app/train.py", content="...")
# 2. UPLOAD TO HUB (critical: job fetches from Hub URL)
hf_repo_files(
operation="upload",
repo_id="E-Rong/til-26-ae-agent",
path="train.py",
content=open("/app/train.py").read()
)
# 3. Submit job
hf_jobs(
operation="run",
script="/app/train.py", # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
"numpy", "huggingface_hub", "pygame", "omegaconf",
"mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
hardware_flavor="a10g-small",
timeout="6h",
namespace="E-Rong"
)
Why step 2 matters: hf_jobs inspect reveals the job executes:
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
If the file isn't on the Hub, the job 404s.
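A pre-flight check before submitting can be sketched as below. It builds the raw URL in the form shown above and flags scripts at or over the ~20KB inline limit. The helper names and the exact byte threshold are assumptions for illustration.

```python
# Approximation of the ~20KB delivery limit from the rules table (assumed exact value).
INLINE_LIMIT_BYTES = 20 * 1024


def hub_raw_url(repo_id, path_in_repo, revision="main"):
    """Build the raw-file URL the job is expected to fetch."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path_in_repo}"


def too_big_for_inline(script_text):
    """True if the script is at or over the assumed inline delivery limit."""
    return len(script_text.encode("utf-8")) >= INLINE_LIMIT_BYTES


print(hub_raw_url("E-Rong/til-26-ae-agent", "train.py"))
# https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

Confirming that the file is really on the Hub still needs a network call. `huggingface_hub.file_exists(repo_id, filename)` is one candidate, provided your installed version ships it.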
How to Access the Private Env in a Job
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="e-rong/til-26-ae",
repo_type="space",
local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and pip install -e .
snapshot_download auto-uses HF_TOKEN. git clone does not.
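The "walk to find pyproject.toml and pip install -e ." step can be sketched as follows. It assumes the snapshot has already been downloaded to a local directory and that pip is available to the current interpreter, which may not hold in every job image. Both helper names are hypothetical.

```python
import os
import subprocess
import sys


def find_project_root(start_dir):
    """Return the first directory under start_dir containing pyproject.toml, or None."""
    for dirpath, dirnames, filenames in os.walk(start_dir):
        dirnames.sort()  # make traversal order deterministic
        if "pyproject.toml" in filenames:
            return dirpath
    return None


def install_editable(project_dir):
    """Run `pip install -e <project_dir>` with the current interpreter.

    Assumes pip is installed in this environment.
    """
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", project_dir])


# Usage (path taken from the snapshot_download call above):
# root = find_project_root("/app/til-26-ae-repo")
# if root is None:
#     raise FileNotFoundError("pyproject.toml not found in downloaded snapshot")
# install_editable(root)
```

Failing loudly when no pyproject.toml is found keeps a broken download from surfacing later as an obscure import error.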
Docs Update Checklist (before ANY job >30 min)
- session_state.json: phase, job_id, script name, hardware, timeout, expected completion
- AGENTS.md: any new mistakes/API gotchas learned this session
- docs/ae.md: research results, completed phase metrics
- Push all three to Hub BEFORE calling hf_jobs
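As a sketch of the first checklist item, the state record could be built and serialized like this. The key names are hypothetical and only mirror the fields listed above, so match them to whatever the existing session_state.json actually uses. The sample values are placeholders.

```python
import json


def build_session_state(phase, job_id, script, hardware, timeout, expected_completion):
    """Collect the checklist fields into one dict (key names are assumptions)."""
    return {
        "phase": phase,
        "job_id": job_id,
        "script": script,
        "hardware": hardware,
        "timeout": timeout,
        "expected_completion": expected_completion,
    }


state = build_session_state(
    phase=2,
    job_id="<job id from hf_jobs>",          # placeholder, not a real ID
    script="phase2_resume.py",
    hardware="a10g-small",
    timeout="6h",
    expected_completion="<ISO timestamp>",   # placeholder
)
text = json.dumps(state, indent=2, sort_keys=True)
print(text)
```

Sorted keys and fixed indentation keep successive versions of the file easy to diff on the Hub.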
Technical Gotchas
| Gotcha | Correct | Wrong |
|---|---|---|
| Wrapper order | ActionMasker(base_env) then Monitor(env) | ActionMasker(Monitor(base_env)): masks break |
| Env install | snapshot_download + walk for pyproject.toml | git clone of private space |
| Script delivery | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| Auth | HF_TOKEN env var (auto-injected in Jobs) | Passing token manually in git URLs |
Cost Table
| Hardware | $/hr | Use For |
|---|---|---|
| cpu-basic | ~$0.05 | Writing code, docs, planning |
| t4-small | ~$0.40 | Smoke tests ONLY |
| a10g-small | ~$1.00 | Training via HF Jobs |
Stop GPU sandboxes immediately after smoke tests. An idle GPU sandbox burns up to ~$1/hr for nothing.
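To budget before you submit, the table reduces to simple arithmetic. The sketch below hard-codes the approximate rates listed above, so treat the results as estimates. `estimated_cost` is a hypothetical helper.

```python
# Approximate hourly rates copied from the cost table (USD).
HOURLY_RATE_USD = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimated_cost(hardware, hours):
    """Estimated spend in USD for running `hardware` for `hours`."""
    return HOURLY_RATE_USD[hardware] * hours


# Worst case for a job that runs to its 6h timeout on a10g-small.
print(f"${estimated_cost('a10g-small', 6):.2f}")      # $6.00
# A 10-minute smoke test on t4-small.
print(f"${estimated_cost('t4-small', 10 / 60):.2f}")  # $0.07
```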
Curriculum Summary
| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check session_state.json |
| 3 | Rule-based curriculum | 1M | Pending |
Key papers: arxiv:2407.00662 (Pommerman curriculum + adaptive annealing), arxiv:2006.14171 (invalid action masking).
File Guide
| File | Purpose |
|---|---|
| session_state.json | Current phase, checkpoints, mistakes, next steps |
| docs/ae.md | Full research, design, results |
| phase1_final.zip | Phase 1 complete checkpoint |
| phase2_ckpt_*.zip | Phase 2 intermediates |
| phase2_resume.py | Working HF Job script |
| phase3_curriculum.py | Ready-to-submit Phase 3 script |
| smoke_test.py | 5-min validation |
Contact
- User: E-Rong | Org: E-Rong
- Billing namespace: E-Rong (required on all hf_jobs)
- You are: An ephemeral agent with no memory. This file is your only brain.
Read the files. Check the state. Test before committing compute. Update docs before every job.