# AGENTS.md: Zero-Memory Survival Guide
> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**
---
## What This Is (10 seconds)
- **Challenge**: TIL-26-AE, training a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning
---
## The 6 Unbreakable Rules
| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly; git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at around 20 KB |
| 4 | **ALWAYS upload the script to the Hub BEFORE submitting the job** | The job fetches the script from its Hub URL, not from the sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | The next session starts with zero memory; stale docs mean duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | 5-min test saves hours of failed compute |
---
## Session Startup (do this now)
1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check latest checkpoint on Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action
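Steps 1 and 3 can be scripted with `huggingface_hub` instead of done by hand. This is a minimal sketch, assuming checkpoints are named `phase<N>_ckpt_<steps>.zip` with an integer step count (only the `phase*_ckpt_*.zip` glob is documented here, so the numeric suffix is an assumption). `latest_checkpoint` and `load_session` are hypothetical helpers; `hf_hub_download` and `HfApi.list_repo_files` are real `huggingface_hub` calls.

```python
import json
import re

# Assumed naming: phase<N>_ckpt_<steps>.zip, e.g. phase2_ckpt_250000.zip
CKPT_RE = re.compile(r"^phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(filenames):
    """Return (filename, phase, steps) for the newest checkpoint, or None.

    "Newest" = highest phase, then highest step count within that phase,
    compared numerically (so 250000 beats 50000). Non-matching files such
    as phase1_final.zip are ignored.
    """
    best = None
    for name in filenames:
        m = CKPT_RE.match(name)
        if not m:
            continue
        phase, steps = int(m.group(1)), int(m.group(2))
        if best is None or (phase, steps) > (best[1], best[2]):
            best = (name, phase, steps)
    return best


def load_session(repo_id="E-Rong/til-26-ae-agent"):
    """Fetch session_state.json and the newest checkpoint name from the Hub (needs network access)."""
    from huggingface_hub import HfApi, hf_hub_download

    with open(hf_hub_download(repo_id=repo_id, filename="session_state.json")) as f:
        state = json.load(f)
    files = HfApi().list_repo_files(repo_id=repo_id)
    return state, latest_checkpoint(files)
```

Calling `load_session()` at the start of a session gives you the saved state and the checkpoint to resume from in one step; compare both against `hf_jobs ps` before deciding the next action.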
---
## Development Workflow (follow exactly)
1. **Write on `cpu-basic`**: code, docs, scripts, planning. Never use GPU sandboxes for editing.
2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`): run the script for 5-10 minutes to verify that it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after a pass or a fail. Never leave it idle.
3. **If the smoke test fails**: look up the Hugging Face documentation (`explore_hf_docs`, `fetch_hf_docs`) or other relevant docs to diagnose the issue, iterate on what you learn, and go back to step 1.
4. **If the smoke test passes**: update `docs/ae.md` with the current project status and `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.
5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm it started successfully. **Poll the job every 5 minutes** until the user interrupts you. Between polls, work on docs or scripts for upcoming phases, but keep checking the job.
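The polling loop in step 5 can be sketched as below. The status source is left abstract: `get_status` is a hypothetical callable you would back with `hf_jobs inspect` or whatever status call you have, and the terminal stage names are assumptions rather than a documented list. `sleep` is injectable so the loop can be tested without waiting five real minutes.

```python
import time

POLL_SECONDS = 5 * 60  # poll every 5 minutes, per the workflow above
# Assumed terminal stage names; check what your job status actually reports.
TERMINAL_STAGES = {"COMPLETED", "ERROR", "CANCELED", "DELETED"}


def poll_job(get_status, on_tick=None, sleep=time.sleep, interval=POLL_SECONDS, max_polls=None):
    """Poll until the job reaches a terminal stage and return that stage.

    get_status: hypothetical zero-arg callable returning the current stage string.
    on_tick:    optional callback run between polls (e.g. work on next-phase docs).
    max_polls:  optional safety cap; returns the last stage seen when reached.
    """
    polls = 0
    while True:
        stage = get_status()
        polls += 1
        if stage in TERMINAL_STAGES:
            return stage
        if max_polls is not None and polls >= max_polls:
            return stage
        if on_tick is not None:
            on_tick(stage)
        sleep(interval)
```

Keeping the between-polls work in `on_tick` mirrors the rule above: useful work fills the gap, but the job is still checked on schedule.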
---
## How to Submit an HF Job (the only way that works)
```python
# 1. Write the script to the sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical: the job fetches the script from its Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read(),
)

# 3. Submit the job
hf_jobs(
    operation="run",
    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong",
)
```
**Why step 2 matters**: `hf_jobs inspect` reveals the job executes:
```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```
If the file isn't on the Hub, the job fails with a 404.
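Outside the agent tools, the same upload-then-verify step can be sketched with plain `huggingface_hub`. `HfApi.upload_file` and `HfApi.file_exists` are real `huggingface_hub` methods; `raw_script_url` and `upload_script` are hypothetical helpers that reproduce the URL pattern shown by `hf_jobs inspect`, assuming the script sits at the repo root on `main`.

```python
import os

REPO_ID = "E-Rong/til-26-ae-agent"


def raw_script_url(local_path, repo_id=REPO_ID, revision="main"):
    """Hub raw URL the job will fetch, assuming the file is uploaded to the repo root."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{os.path.basename(local_path)}"


def upload_script(local_path, repo_id=REPO_ID):
    """Upload the script to the repo root and confirm it exists before submitting a job."""
    from huggingface_hub import HfApi

    api = HfApi()  # picks up HF_TOKEN from the environment
    name = os.path.basename(local_path)
    api.upload_file(path_or_fileobj=local_path, path_in_repo=name, repo_id=repo_id)
    if not api.file_exists(repo_id=repo_id, filename=name):
        raise RuntimeError(f"{name} not found on the Hub; the job would 404")
    return raw_script_url(local_path, repo_id)
```

Checking `file_exists` right after the upload turns a silent 404 inside the job into an immediate, local error.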
---
## How to Access the Private Env in a Job
```python
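import os
import subprocess
import sys

from huggingface_hub import snapshot_download

# Download the private Space; HF_TOKEN is read from the environment automatically
repo_dir = snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo",
)
# Walk the download to find the package root (the directory holding pyproject.toml)
pkg_dir = next(root for root, _, files in os.walk(repo_dir) if "pyproject.toml" in files)
# Editable install so the env package (til_environment/) is importable
subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", pkg_dir])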
```
`snapshot_download` auto-uses `HF_TOKEN`. `git clone` does not.
---
## Docs Update Checklist (before ANY job >30 min)
- [ ] `session_state.json` β€” phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md` β€” any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md` β€” research results, completed phase metrics
- [ ] Push all three to Hub BEFORE calling `hf_jobs`
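A quick pre-submit check for the first checklist item, as a minimal sketch. The JSON key names are assumptions mapped from the checklist wording (the real `session_state.json` schema may differ), and `missing_session_fields` is a hypothetical helper. Because a `job_id` only exists after submission, the sketch lets you skip that key for the pre-submit pass.

```python
import json

# Assumed key names for: phase, job_id, script name, hardware, timeout, expected completion
REQUIRED_KEYS = ("phase", "job_id", "script", "hardware", "timeout", "expected_completion")


def missing_session_fields(state, skip=()):
    """Return checklist keys that are absent or empty in a session_state dict."""
    return [k for k in REQUIRED_KEYS if k not in skip and state.get(k) in (None, "")]


def check_session_file(path="session_state.json", skip=()):
    """Load a local session_state.json and raise if any checklist field is missing."""
    with open(path) as f:
        state = json.load(f)
    missing = missing_session_fields(state, skip)
    if missing:
        raise ValueError(f"session_state.json is missing: {', '.join(missing)}")
    return state
```

Run it with `skip=("job_id",)` before calling `hf_jobs`, then fill in the job ID, push again, and re-run it without `skip`.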
---
## Technical Gotchas
| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `Monitor(ActionMasker(base_env, mask_fn))`: mask first, then Monitor | `ActionMasker(Monitor(base_env), mask_fn)`: masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |
---
## Cost Table
| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |
**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns roughly $0.40 to $1.00/hr for nothing.
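Before committing compute, the table's rates turn into a quick cost estimate. A minimal sketch: the rates are the approximate values from the table above (check current pricing), and `estimate_cost` is a hypothetical helper.

```python
# Approximate hourly rates in USD, copied from the cost table above.
HOURLY_RATE = {
    "cpu-basic": 0.05,
    "t4-small": 0.40,
    "a10g-small": 1.00,
}


def estimate_cost(hardware, hours):
    """Rough compute cost in USD for running `hardware` for `hours` hours."""
    if hardware not in HOURLY_RATE:
        raise KeyError(f"no rate recorded for {hardware!r}")
    return round(HOURLY_RATE[hardware] * hours, 2)
```

For example, `estimate_cost("a10g-small", 6)` returns `6.0` for a job that runs to the 6h timeout used in the submit example, while a 10-minute T4 smoke test costs only a few cents.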
---
## Curriculum Summary
| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |
Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
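For the startup step "Determine: current phase, remaining steps", the step budgets in this table can be turned into a small helper. A minimal sketch, assuming step counts are tracked per phase (whether checkpoint step counts are per-phase or cumulative is not stated here); both functions are hypothetical.

```python
# Per-phase step budgets from the curriculum table above.
PHASE_BUDGET = {1: 500_000, 2: 500_000, 3: 1_000_000}


def remaining_steps(phase, steps_done):
    """Steps left in `phase`, given steps already trained in that phase (clamped at 0)."""
    return max(PHASE_BUDGET[phase] - steps_done, 0)


def steps_left_in_curriculum(phase, steps_done):
    """Steps left in the current phase plus all later phases."""
    later = sum(budget for p, budget in PHASE_BUDGET.items() if p > phase)
    return remaining_steps(phase, steps_done) + later
```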
---
## File Guide
| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediates |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation |
---
## Contact
- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs`)
- **You are**: An ephemeral agent with no memory. This file is your only brain.
*Read the files. Check the state. Test before committing compute. Update docs before every job.*