E-Rong/til-26-ae-agent

E-Rong committed about 14 hours ago

Commit 87436ce · verified · 1 parent: 3745a2d

Add instruction: always update docs before starting long-running tasks

Files changed (1)

AGENTS.md +39 -0

@@ -73,6 +73,45 @@ Before doing **anything** on this project:
 ---
+## 📋 ALWAYS UPDATE DOCS BEFORE STARTING LONG-RUNNING TASKS
+> **This rule prevents lost context when sessions crash or reset.**
+Before submitting a long-running HF Job or starting any other long-running compute (the exact scope is listed at the end of this section):
+1. **Update `session_state.json`** with:
+   - Current phase and status
+   - What you are about to do (job_id if resuming, script name, hardware, timeout)
+   - Why you are doing it (link to research/decisions)
+   - Expected completion time
+2. **Update `AGENTS.md`** if you learned anything new:
+   - New mistakes or fixes
+   - New technical decisions with rationale
+   - Cost lessons
+   - API gotchas
+3. **Update `docs/ae.md`** with research findings:
+   - New papers read (arXiv IDs, key insights)
+   - New datasets or methods discovered
+   - Results from completed phases
+4. **Push all updates to the Hub** BEFORE starting the job:
+   ```python
+   hf_repo_files(operation="upload", repo_id="E-Rong/til-26-ae-agent",
+                 path="session_state.json", content=...)
+   ```
+**Why this matters**: If your session resets while a job is running, the next version of you has ZERO memory. The only way to reconstruct state is from the Hub. If docs are stale, you'll waste time (and money) redoing work or making the same mistakes.
+**This rule applies to**:
+- Any HF Job with `timeout > 30m`
+- Any smoke test (even 5-minute ones — document what you're testing)
+- Any evaluation run > 100 episodes
+- Any data processing that takes > 15 minutes
+---
 ## Technical Decisions That Work
 ### MaskablePPO + Action Masking
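Step 1 of the rule lists what `session_state.json` should record, but AGENTS.md does not fix a schema. The sketch below shows one possible layout. It makes two assumptions that do not come from the commit. First, the key names (`phase`, `next_action`, `expected_completion`, `updated_at`) are hypothetical. Second, the expected completion time is estimated as submission time plus the job timeout. The script name and hardware flavor in the usage block are also illustrative.

```python
import json
from datetime import datetime, timedelta, timezone


def build_session_state(phase, status, script, hardware, timeout_minutes,
                        reason, job_id=None, now=None):
    """Return a dict holding the fields step 1 asks for.

    Key names are hypothetical; AGENTS.md does not define a schema.
    """
    now = now or datetime.now(timezone.utc)
    # Assumption: the job finishes no later than start time + timeout.
    expected = now + timedelta(minutes=timeout_minutes)
    return {
        "phase": phase,
        "status": status,
        "next_action": {
            "job_id": job_id,  # only known when resuming an existing job
            "script": script,
            "hardware": hardware,
            "timeout": f"{timeout_minutes}m",
        },
        "reason": reason,  # pointer to the research notes / decision
        "expected_completion": expected.isoformat(),
        "updated_at": now.isoformat(),
    }


def write_session_state(state, path="session_state.json"):
    # Write the file locally; pushing it to the Hub is a separate step.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
        f.write("\n")


if __name__ == "__main__":
    # Illustrative values only (hypothetical script and hardware names).
    state = build_session_state(
        phase="phase-2",
        status="submitting training job",
        script="train_maskable_ppo.py",
        hardware="a10g-small",
        timeout_minutes=360,
        reason="docs/ae.md (training decision notes)",
    )
    print(json.dumps(state, indent=2))
```

In practice you would call `write_session_state(state)` and then push the file with the `hf_repo_files` upload shown in step 4, before the job is submitted. The next session then finds the job details on the Hub even if this one crashes.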
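The **This rule applies to** list in the new AGENTS.md section can be read as a simple predicate over a planned task. The sketch below encodes its four thresholds. The `Task` fields and the kind strings are hypothetical. It also adds one conservative assumption that the rule does not state: if the relevant timeout, episode count, or runtime is unknown, the docs are updated anyway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical description of a planned task; field names are illustrative."""
    kind: str  # "hf_job", "smoke_test", "evaluation", or "data_processing"
    timeout_minutes: Optional[float] = None   # HF Job timeout
    episodes: Optional[int] = None            # number of evaluation episodes
    expected_minutes: Optional[float] = None  # estimated data-processing runtime


def requires_doc_update(task: Task) -> bool:
    """Return True if docs must be updated and pushed before starting the task."""
    if task.kind == "smoke_test":
        return True  # always, even for 5-minute smoke tests
    if task.kind == "hf_job":
        # Unknown timeout is treated as in scope (conservative assumption).
        return task.timeout_minutes is None or task.timeout_minutes > 30
    if task.kind == "evaluation":
        return task.episodes is None or task.episodes > 100
    if task.kind == "data_processing":
        return task.expected_minutes is None or task.expected_minutes > 15
    # Other task kinds are not covered by the listed thresholds.
    return False
```

For example, `requires_doc_update(Task("hf_job", timeout_minutes=360))` is `True`, while an evaluation of exactly 100 episodes falls just outside the rule.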