Compact AGENTS.md into zero-memory survival guide

AGENTS.md (changed, `@@ -1,311 +1,153 @@`)

# AGENTS.md

> **READ THIS FIRST** before doing anything on this project.

---

## What This Is

- **Challenge**: TIL-26-AE
- **Space**: `e-rong/til-26-ae` (evaluation server with `ae/src/ae_manager.py`)
- **TIL Source**: Private Space `e-rong/til-26-ae`, which contains the `til_environment/` module

---

### ✅ ALWAYS USE HF JOBS FOR BATCH TRAINING

- Persistent GPU allocation
- Runs until completion (or until your timeout)
- Fails visibly if something breaks (no silent billing for an empty sandbox)
- Must set `namespace="E-Rong"` to bill the org, not the user

### ❌ NEVER `git clone` A PRIVATE REPO IN AN HF JOB

`git clone https://huggingface.co/spaces/...` fails because git does not read `HF_TOKEN`.

**Use instead**:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='e-rong/til-26-ae',
    repo_type='space',
    local_dir='/app/til-26-ae-repo'
)
```

`snapshot_download` automatically uses the `HF_TOKEN` environment variable.

### ✅ ALWAYS SMOKE-TEST A JOB BEFORE THE FULL RUN

Submit a 5-minute job that:

1. Downloads the TIL repo
2. Installs deps
3. Runs 100 training steps
4. Saves a dummy checkpoint to the Hub

Only after this succeeds, submit the multi-hour job.
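The four smoke-test steps can be chained into a fail-fast runner so the job errors out at the first broken step instead of burning compute. This is a minimal sketch, assuming each step is wrapped in a zero-argument callable; `run_smoke_test`, the step names, and the 300-second budget are hypothetical and not taken from the existing `smoke_test.py`.

```python
import time
from typing import Callable, List, Tuple


def run_smoke_test(steps: List[Tuple[str, Callable[[], None]]],
                   budget_s: float = 300.0) -> List[str]:
    """Run (name, fn) steps in order; raise on the first error or if over budget.

    Returns the names of the steps that completed. (Hypothetical helper.)
    """
    start = time.monotonic()
    completed = []
    for name, fn in steps:
        try:
            fn()
        except Exception as exc:
            # Fail loudly so the job shows an error instead of billing silently.
            raise RuntimeError(f"smoke test failed at step '{name}': {exc}") from exc
        completed.append(name)
        if time.monotonic() - start > budget_s:
            raise TimeoutError(f"smoke test exceeded {budget_s:.0f}s after step '{name}'")
    return completed


if __name__ == "__main__":
    # Placeholder steps; a real job would call snapshot_download, pip install,
    # a 100-step model.learn(), and a Hub upload of a dummy checkpoint.
    steps = [
        ("download_repo", lambda: None),
        ("install_deps", lambda: None),
        ("train_100_steps", lambda: None),
        ("save_dummy_checkpoint", lambda: None),
    ]
    print(run_smoke_test(steps))
```

Checking the budget after every step keeps a hung download or install from quietly stretching a "5-minute" test into an hour.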

---

## Session Startup

3. [ ] Check latest checkpoint on Hub (sort `phase*_ckpt_*.zip` files)
4. [ ] Determine current phase and remaining steps
5. [ ] If training needed: write script to sandbox, **smoke-test in HF Job first**

---

> **This rule prevents lost context when sessions crash or reset.**

Before submitting any multi-hour HF Job or starting any long-running compute:

1. **Update `session_state.json`** with:
   - Current phase and status
   - What you are about to do (job_id if resuming, script name, hardware, timeout)
   - Why you are doing it (link to research/decisions)
   - Expected completion time
2. **Update `AGENTS.md`** if you learned anything new:
   - New mistakes or fixes
   - New technical decisions with rationale
   - Cost lessons
   - API gotchas
3. **Update `docs/ae.md`** with research findings:
   - New papers read (arXiv IDs, key insights)
   - New datasets or methods discovered
   - Results from completed phases
4. **Push all updates to the Hub** BEFORE starting the job:

   ```python
   hf_repo_files(operation="upload", repo_id="E-Rong/til-26-ae-agent",
                 path="session_state.json", content=...)
   ```

**Why this matters**: If your session resets while a job is running, the next version of you has ZERO memory. The only way to reconstruct state is from the Hub. If docs are stale, you'll waste time (and money) redoing work or making the same mistakes.

**This rule applies to**:

- Any HF Job with `timeout > 30m`
- Any smoke test (even 5-minute ones: document what you're testing)
- Any evaluation run > 100 episodes
- Any data processing that takes > 15 minutes
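The thresholds above can be encoded as one predicate so the decision does not depend on memory. A minimal sketch: `requires_docs_update`, `parse_timeout`, and the `kind` labels are hypothetical, and the timeout format (`"30m"`, `"6h"`) is assumed from the `timeout="6h"` examples in this file.

```python
import re
from datetime import timedelta
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_timeout(value: str) -> timedelta:
    """Parse strings like '30m', '6h', or '1.5h' into a timedelta (assumed format)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhd])\s*", value)
    if not match:
        raise ValueError(f"unrecognized timeout: {value!r}")
    amount, unit = float(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def requires_docs_update(kind: str, timeout: Optional[str] = None,
                         episodes: int = 0,
                         duration: Optional[timedelta] = None) -> bool:
    """Return True if the docs-before-compute rule applies (hypothetical labels)."""
    if kind == "smoke_test":
        return True  # every smoke test, even a 5-minute one
    if kind == "hf_job" and timeout is not None:
        return parse_timeout(timeout) > timedelta(minutes=30)
    if kind == "evaluation":
        return episodes > 100
    if kind == "data_processing" and duration is not None:
        return duration > timedelta(minutes=15)
    return False
```

The comparisons are strict, matching the "> 30m", "> 100 episodes", and "> 15 minutes" wording: a job with exactly `timeout="30m"` does not trigger the rule.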

```python
from sb3_contrib.common.wrappers import ActionMasker
from stable_baselines3.common.monitor import Monitor

# CORRECT: ActionMasker wraps the base env first, then Monitor wraps the result
env = ActionMasker(base_env, lambda e: e.action_masks())
env = Monitor(env)

# WRONG: Monitor blocks action_masks() exposure
env = ActionMasker(Monitor(base_env), ...)  # DON'T DO THIS
```

### 3-Phase Curriculum

| Phase | Opponent | Duration (steps) | Purpose |
|---|---|---|---|
| 1 | Random | 500k | Learn basics |
| 2 | Random + visit-count shaping | 500k | Prevent camping |
| 3 | Rule-based curriculum | 1M | Generalize to structured opponents |

### Checkpointing Every 50k Steps

- Local save plus Hub push via `HfApi.upload_file()`
- Saved the project when sandboxes reset at 400k and 600k steps
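The checkpoint-every-50k pattern can be sketched as a helper that tracks the last saved step. This assumes `save_fn` and `upload_fn` are injected: in a Stable-Baselines3 run they would typically be `model.save` and a wrapper around `HfApi().upload_file(path_or_fileobj=..., path_in_repo=..., repo_id=...)`, with `maybe_checkpoint(self.num_timesteps)` called from a callback's `_on_step()`. The `CheckpointPusher` class and the `phase{N}_ckpt_{step}.zip` naming are hypothetical.

```python
from typing import Callable, List, Optional


class CheckpointPusher:
    """Save locally and push to the Hub every `every` timesteps (hypothetical helper)."""

    def __init__(self, phase: int, save_fn: Callable[[str], None],
                 upload_fn: Callable[[str], None], every: int = 50_000,
                 out_dir: str = "/tmp/checkpoints"):
        self.phase = phase
        self.save_fn = save_fn
        self.upload_fn = upload_fn
        self.every = every
        self.out_dir = out_dir
        self.last_saved = 0
        self.pushed: List[str] = []

    def maybe_checkpoint(self, num_timesteps: int) -> Optional[str]:
        """Write at most one checkpoint per interval; return its path if one was written."""
        if num_timesteps - self.last_saved < self.every:
            return None
        # Round down to the interval boundary so names stay predictable (50000, 100000, ...).
        step = (num_timesteps // self.every) * self.every
        path = f"{self.out_dir}/phase{self.phase}_ckpt_{step}.zip"
        self.save_fn(path)    # local save (job storage is ephemeral)
        self.upload_fn(path)  # Hub push, so the checkpoint survives a reset
        self.last_saved = step
        self.pushed.append(path)
        return path
```

Rounding down matters with vectorized envs, where `num_timesteps` advances by the number of envs per step and rarely lands exactly on a multiple of 50k.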

---

## Technical Decisions That Failed

| Decision | Why It Failed | Fix |
|---|---|---|
| Training in sandboxes | Process died; the empty sandbox kept billing | Use HF Jobs |
| `git clone` in HF Job | No auth for private repo | `snapshot_download` |
| Inline 20KB script in `hf_jobs.script` | Delivery mechanism choked | Write to sandbox file first, submit path |
| No session state on Hub | Lost track of progress across resets | `session_state.json` + this file |
| `Monitor` inside `ActionMasker` | `get_action_masks()` failed | `ActionMasker` → `Monitor` order |

---

| Hardware | $/hr | Good For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing scripts, reading files, small tests |
| `t4-small` | ~$0.40 | Short dev, NOT training |
| `a10g-small` | ~$1.00 | Training, but via HF Jobs, not sandboxes |
| `a10g-large` | ~$2.00 | Larger batch sizes; not needed for this project |

**Rule**: If a task takes >30 min, it must be an HF Job. Sandboxes are for editing and quick tests only.

---

## Sandbox Policy (User Mandate)

> **From this point forward, the user has mandated:**

1. **Start a `cpu-basic` sandbox** at the beginning of every session
2. **Use `cpu-basic` for**: context, writing code, writing docs, editing files, planning
3. **Only switch to a GPU sandbox** (`t4-small` or `a10g-small`) when performing **smoke tests** for training scripts
4. **Stop the GPU sandbox IMMEDIATELY** after the smoke test completes
5. **Training tasks ONLY as HF Jobs**: never leave a training process running in a sandbox
6. **Never leave a GPU sandbox running idle**: this wastes money

**Why this matters**: A GPU sandbox at $1/hr running empty for 3 hours = $3 wasted for nothing. An HF Job at the same $1/hr actually trains for every billed minute.

---

## How to Submit HF Jobs Correctly (Research Results)

### Based on `huggingface.co/docs/hub/jobs-quickstart`:

**DO NOT use `git clone` for private repos.**

```python
# WRONG ❌
import subprocess
subprocess.run(["git", "clone", "https://huggingface.co/spaces/e-rong/til-26-ae"])
# Fails: git does not read the HF_TOKEN env var

# CORRECT ✅
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
```

```
https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

and runs it via `uv run <url>`. **This means the file MUST already exist on the Hub repo.**

```python
from tools import write, hf_repo_files, hf_jobs

# Write the script to the sandbox
write(path="/app/train.py", content="...")

# Upload it to the Hub so the job can fetch it
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# The job system will convert this to a Hub raw URL under the hood
hf_jobs(
    operation="run",
    script="/app/train.py",  # ← sandbox file path
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"  # ← bills to org
)
```

```bash
exec uv run --with torch --with sb3-contrib ... \
  https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/phase2_resume.py
```

The job fetches the script from the Hub, not from the sandbox. The sandbox path is used only to derive the repo/file path.

- Jobs run on HF infrastructure, not in your sandbox
- The sandbox can die; the job keeps running
- Check logs with `hf_jobs(operation="logs", job_id="...")`
- Job storage is ephemeral: **push checkpoints to the Hub** (not just to local disk)

---

| File | Purpose |
|---|---|
| `phase2_final.zip` | Phase 2 complete model (when done) |
| `ae_manager.py` | Inference code for the evaluation server |
| `phase2_resume.py` | Latest HF Job script (works; uses `snapshot_download`) |
| `smoke_test.py` | 5-minute validation job: test before any real job |
| `train_all_phases.py` | Original training script |

---

## Contact

- **User**: E-Rong
- **Org**: E-Rong
- **Billing**: Must specify `namespace="E-Rong"` on all HF Jobs
- **I am**: An ephemeral agent with no persistent memory. I depend on this file.

---

# AGENTS.md: Zero-Memory Survival Guide

> **You have no persistent memory. Read this first. Do not trust your internal knowledge.**

---

## What This Is (10 seconds)

- **Challenge**: TIL-26-AE, training a Bomberman agent (`agent_0`) via RL
- **Repo**: `E-Rong/til-26-ae-agent` (models, checkpoints, scripts)
- **Env source**: Private Space `e-rong/til-26-ae` (contains `til_environment/`)
- **Algorithm**: MaskablePPO + invalid-action masking + curriculum learning

---

## The 6 Unbreakable Rules

| # | Rule | Violation Cost |
|---|---|---|
| 1 | **NEVER train in sandboxes for >30 min** | ~$5/hr wasted on empty, recycled containers |
| 2 | **NEVER `git clone` private repos in HF Jobs** | Job fails instantly: git ignores `HF_TOKEN` |
| 3 | **NEVER pass inline scripts larger than a few KB to `hf_jobs.script`** | Delivery chokes at ~20KB |
| 4 | **ALWAYS upload the script to the Hub BEFORE submitting the job** | The job fetches the script from a Hub URL, not from the sandbox |
| 5 | **ALWAYS update docs BEFORE starting long jobs** | The next you has zero memory; stale docs = duplicated work |
| 6 | **ALWAYS smoke-test before multi-hour jobs** | A 5-min test saves hours of failed compute |

---

## Session Startup (do this now)

1. Read `session_state.json` from `E-Rong/til-26-ae-agent`
2. Check `hf_jobs ps` for running jobs
3. Check the latest checkpoint on the Hub (`phase*_ckpt_*.zip`)
4. Determine: current phase, remaining steps, next action
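Step 3 is easy to get wrong with a plain string sort, because `phase2_ckpt_100000.zip` sorts before `phase2_ckpt_50000.zip` lexicographically. A minimal sketch that compares phase and step numerically, assuming filenames follow `phase{N}_ckpt_{steps}.zip` with an integer step count (verify against the repo; `phase*_final.zip` files are deliberately not matched). `latest_checkpoint` is a hypothetical helper; the file list can come from `HfApi().list_repo_files("E-Rong/til-26-ae-agent")`.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed naming scheme: phase{N}_ckpt_{steps}.zip, e.g. "phase2_ckpt_450000.zip".
CKPT_RE = re.compile(r"(?:^|/)phase(\d+)_ckpt_(\d+)\.zip$")


def latest_checkpoint(files: Iterable[str]) -> Optional[Tuple[str, int, int]]:
    """Return (filename, phase, steps) for the newest checkpoint, or None if none match."""
    best = None
    for name in files:
        match = CKPT_RE.search(name)
        if not match:
            continue
        phase, steps = int(match.group(1)), int(match.group(2))
        # Compare numerically: a higher phase wins, then more steps within a phase.
        if best is None or (phase, steps) > (best[1], best[2]):
            best = (name, phase, steps)
    return best
```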

---

## How to Submit an HF Job (the only way that works)

```python
from tools import write, hf_repo_files, hf_jobs

# 1. Write to sandbox
write(path="/app/train.py", content="...")

# 2. UPLOAD TO HUB (critical: the job fetches the script from a Hub URL)
hf_repo_files(
    operation="upload",
    repo_id="E-Rong/til-26-ae-agent",
    path="train.py",
    content=open("/app/train.py").read()
)

# 3. Submit job
hf_jobs(
    operation="run",
    script="/app/train.py",  # becomes: https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
    dependencies=["torch", "sb3-contrib", "gymnasium", "pettingzoo",
                  "numpy", "huggingface_hub", "pygame", "omegaconf",
                  "mazelib", "imageio", "imageio-ffmpeg", "supersuit", "psutil"],
    hardware_flavor="a10g-small",
    timeout="6h",
    namespace="E-Rong"
)
```

**Why step 2 matters**: `hf_jobs inspect` reveals that the job executes:

```bash
uv run ... https://huggingface.co/E-Rong/til-26-ae-agent/raw/main/train.py
```

If the file isn't on the Hub, the job fails with a 404.
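A pre-flight check avoids that 404: build the raw URL the job will fetch and confirm the file exists before calling `hf_jobs`. A minimal sketch using `huggingface_hub`'s `HfApi.file_exists`; the URL pattern is inferred from the `hf_jobs inspect` output above, and `hub_raw_url` and `assert_script_on_hub` are hypothetical helpers. The `api` parameter exists only so the check can be exercised without network access.

```python
def hub_raw_url(repo_id: str, path_in_repo: str, revision: str = "main") -> str:
    """Build the raw-file URL a job fetches (pattern taken from `hf_jobs inspect` output)."""
    return f"https://huggingface.co/{repo_id}/raw/{revision}/{path_in_repo.lstrip('/')}"


def assert_script_on_hub(repo_id: str, path_in_repo: str, api=None) -> str:
    """Raise before submission if the script is missing from the Hub repo; return its URL."""
    if api is None:
        from huggingface_hub import HfApi  # picks up HF_TOKEN from the environment
        api = HfApi()
    if not api.file_exists(repo_id, path_in_repo):
        raise FileNotFoundError(
            f"{path_in_repo} is not in {repo_id}; upload it before submitting the job")
    return hub_raw_url(repo_id, path_in_repo)
```

Calling `assert_script_on_hub("E-Rong/til-26-ae-agent", "train.py")` right before `hf_jobs(operation="run", ...)` turns a failed job into an immediate local error.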

---

## How to Access the Private Env in a Job

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="e-rong/til-26-ae",
    repo_type="space",
    local_dir="/app/til-26-ae-repo"
)
# Then walk to find pyproject.toml and run `pip install -e .` there
```

`snapshot_download` automatically uses `HF_TOKEN`. `git clone` does not.
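The "walk to find `pyproject.toml`" step might look like this. A sketch assuming the Space holds one installable package somewhere under the download directory; if there are several `pyproject.toml` files, the shallowest is chosen, which may not be the right one. `find_project_root` and `install_private_env` are hypothetical names. The install goes through the current interpreter's `pip`; an environment created by `uv` may not ship `pip`, in which case `uv pip install -e` is the likely substitute.

```python
import os
import subprocess
import sys
from typing import Optional


def find_project_root(root: str) -> Optional[str]:
    """Return the shallowest directory under `root` containing pyproject.toml."""
    best = None
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if "pyproject.toml" in filenames:
            rel = os.path.relpath(dirpath, root)
            depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
            if best is None or depth < best[0]:
                best = (depth, dirpath)
    return best[1] if best else None


def install_private_env(local_dir: str = "/app/til-26-ae-repo") -> str:
    """Download the private Space (uses HF_TOKEN) and pip-install it in editable mode."""
    from huggingface_hub import snapshot_download  # only needed inside the job
    snapshot_download(repo_id="e-rong/til-26-ae", repo_type="space", local_dir=local_dir)
    project = find_project_root(local_dir)
    if project is None:
        raise FileNotFoundError(f"no pyproject.toml under {local_dir}")
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", project], check=True)
    return project
```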

---

## Docs Update Checklist (before ANY job >30 min)

- [ ] `session_state.json`: phase, job_id, script name, hardware, timeout, expected completion
- [ ] `AGENTS.md`: any new mistakes/API gotchas learned this session
- [ ] `docs/ae.md`: research results, completed phase metrics
- [ ] Push all three to the Hub BEFORE calling `hf_jobs`
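The `session_state.json` item can be built and checked in code so no field is forgotten. A minimal sketch: the field names mirror the checklist, but the real file's schema is an assumption, as are `build_job_state` and `missing_fields`. `job_id` is stored but not required, since it only exists after submission (or when resuming). The expected completion is a worst-case estimate of start time plus timeout.

```python
import json
from datetime import datetime, timedelta
from typing import List, Optional

# Checklist fields that must be filled before calling hf_jobs (assumed schema).
REQUIRED_FIELDS = ("phase", "script", "hardware", "timeout", "expected_completion")


def build_job_state(phase: int, script: str, hardware: str, timeout_hours: float,
                    started: datetime, job_id: Optional[str] = None) -> dict:
    """Assemble a session-state entry for a job about to run (hypothetical helper)."""
    return {
        "phase": phase,
        "job_id": job_id,  # filled in after submission, or set when resuming
        "script": script,
        "hardware": hardware,
        "timeout": f"{timeout_hours:g}h",
        # Worst case: the job runs until its timeout.
        "expected_completion": (started + timedelta(hours=timeout_hours)).isoformat(),
    }


def missing_fields(state: dict) -> List[str]:
    """Return required checklist fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if state.get(name) in (None, "")]
```

Once `missing_fields(state)` returns an empty list, `json.dumps(state, indent=2)` produces the content to upload alongside `AGENTS.md` and `docs/ae.md`.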

---

## Technical Gotchas

| Gotcha | Correct | Wrong |
|---|---|---|
| **Wrapper order** | `ActionMasker(base_env, mask_fn)` first, then `Monitor(env)` | `ActionMasker(Monitor(base_env), mask_fn)`: masks break |
| **Env install** | `snapshot_download` + walk for `pyproject.toml` | `git clone` of private space |
| **Script delivery** | Upload to Hub, submit sandbox path | Inline 20KB string or sandbox-only file |
| **Auth** | `HF_TOKEN` env var (auto-injected in Jobs) | Passing token manually in git URLs |

---

## Cost Table

| Hardware | $/hr | Use For |
|---|---|---|
| `cpu-basic` | ~$0.05 | Writing code, docs, planning |
| `t4-small` | ~$0.40 | Smoke tests ONLY |
| `a10g-small` | ~$1.00 | Training via HF Jobs |

**Stop GPU sandboxes immediately after smoke tests.** An idle GPU sandbox burns up to ~$1/hr for nothing.
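Budgeting is just rate × hours. A small sketch using the approximate rates from the table above (the table's rough figures, not live pricing); `estimate_cost` is a hypothetical helper.

```python
# Approximate $/hr from the cost table above; not authoritative pricing.
HOURLY_RATE = {"cpu-basic": 0.05, "t4-small": 0.40, "a10g-small": 1.00}


def estimate_cost(hardware: str, hours: float) -> float:
    """Worst-case cost of holding `hardware` for `hours` (e.g. a job's full timeout)."""
    if hardware not in HOURLY_RATE:
        raise KeyError(f"no rate recorded for {hardware!r}")
    return round(HOURLY_RATE[hardware] * hours, 2)
```

For example, a `timeout="6h"` job on `a10g-small` is capped at about $6, and an `a10g-small` sandbox left idle for 3 hours wastes about $3.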

---

## Curriculum Summary

| Phase | Opponent | Steps | Status |
|---|---|---|---|
| 1 | Random | 500k | ✅ Complete (92% win rate) |
| 2 | Random + exploration shaping | 500k | Check `session_state.json` |
| 3 | Rule-based curriculum | 1M | Pending |

Key papers: `arxiv:2407.00662` (Pommerman curriculum + adaptive annealing), `arxiv:2006.14171` (invalid action masking).
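When resuming, it helps to map a step count to its curriculum phase. This sketch assumes the phases run back to back on a cumulative counter (0 to 500k, 500k to 1M, 1M to 2M), as the table's step budgets suggest. Whether the training scripts count steps cumulatively or per phase is an assumption to verify against `session_state.json`. `phase_for_step` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Tuple

# (phase, opponent, steps) from the curriculum table; cumulative boundaries are assumed.
CURRICULUM = [
    (1, "Random", 500_000),
    (2, "Random + exploration shaping", 500_000),
    (3, "Rule-based curriculum", 1_000_000),
]


def phase_for_step(total_steps: int) -> Tuple[int, str, int]:
    """Return (phase, opponent, steps remaining in that phase) for a cumulative step count."""
    ends = []
    running = 0
    for _, _, steps in CURRICULUM:
        running += steps
        ends.append(running)  # cumulative phase ends: 500k, 1M, 2M
    idx = bisect_right(ends, total_steps)
    if idx >= len(CURRICULUM):
        phase, opponent, _ = CURRICULUM[-1]
        return phase, opponent, 0  # curriculum finished
    phase, opponent, _ = CURRICULUM[idx]
    return phase, opponent, ends[idx] - total_steps
```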

---

## File Guide

| File | Purpose |
|---|---|
| `session_state.json` | Current phase, checkpoints, mistakes, next steps |
| `docs/ae.md` | Full research, design, results |
| `phase1_final.zip` | Phase 1 complete checkpoint |
| `phase2_ckpt_*.zip` | Phase 2 intermediate checkpoints |
| `phase2_resume.py` | Working HF Job script |
| `phase3_curriculum.py` | Ready-to-submit Phase 3 script |
| `smoke_test.py` | 5-min validation job |

---

## Contact

- **User**: E-Rong | **Org**: E-Rong
- **Billing namespace**: `E-Rong` (required on all `hf_jobs` calls)
- **You are**: An ephemeral agent with no memory. This file is your only brain.

*Read the files. Check the state. Test before committing compute. Update docs before every job.*