Add development workflow section to AGENTS.md
AGENTS.md
@@ -35,6 +35,20 @@

---

## Development Workflow (follow exactly)

1. **Write on `cpu-basic`**: code, docs, scripts, and planning. Never use GPU sandboxes for editing.

2. **Smoke-test on a GPU sandbox** (`t4-small` or `a10g-small`): run the script for 5-10 minutes to verify that it loads the env, runs training steps, and can push a checkpoint. **Stop the GPU sandbox immediately** after the test passes or fails. Never leave it idle.

3. **If the smoke test fails**: diagnose the issue with the Hugging Face documentation tools (`explore_hf_docs`, `fetch_hf_docs`) or other relevant docs, iterate based on what you learn, and go back to step 1.

4. **If the smoke test passes**: update `docs/ae.md` with the current project status and `AGENTS.md` with anything new you learned. Push both to the Hub before proceeding.

5. **Submit the real Job** (`a10g-small`, `a10g-large`, etc.). Immediately check `hf_jobs logs` to confirm that it starts successfully. **Poll the job every 5 minutes** until the user interrupts you. Between polls, work on docs or scripts for upcoming phases, but keep checking the job.
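
The control flow of steps 2 and 5 above can be sketched in Python. This is a minimal, hypothetical sketch: `start_sandbox`, `run_script`, `stop_sandbox`, and `get_stage` are placeholder callables standing in for whatever tools actually create sandboxes and report job status (they are not part of any Hugging Face API), and `max_checks` exists only so the loop can be tested. The `finally` clause is what guarantees the sandbox is stopped after a pass, a fail, or a crash; the polling loop keeps checking until the user interrupts it (Ctrl-C raises `KeyboardInterrupt`).

```python
import time
from typing import Callable, List, Optional, TypeVar

S = TypeVar("S")  # handle returned by the hypothetical sandbox starter


def smoke_test(
    start_sandbox: Callable[[], S],
    run_script: Callable[[S, float], bool],
    stop_sandbox: Callable[[S], None],
    budget_s: float = 10 * 60,
) -> bool:
    """Step 2: run the script on a GPU sandbox within a time budget.

    run_script (hypothetical) should return True if the env loaded,
    training steps ran, and a checkpoint was pushed. The sandbox is
    stopped in `finally`, so it shuts down on pass, fail, or exception.
    """
    sandbox = start_sandbox()
    try:
        return run_script(sandbox, budget_s)
    finally:
        stop_sandbox(sandbox)


def poll_job(
    get_stage: Callable[[], str],
    on_idle: Callable[[], None] = lambda: None,
    interval_s: float = 5 * 60,
    sleep: Callable[[float], None] = time.sleep,
    max_checks: Optional[int] = None,
) -> List[str]:
    """Step 5: check the job every interval_s seconds until interrupted.

    on_idle is where side work (docs, next-phase scripts) goes between
    checks. max_checks is a test hook only; leave it as None in real use.
    """
    seen: List[str] = []
    try:
        while True:
            seen.append(get_stage())
            print(f"check {len(seen)}: {seen[-1]}")
            if max_checks is not None and len(seen) >= max_checks:
                break
            on_idle()
            sleep(interval_s)
    except KeyboardInterrupt:
        print("interrupted by user; stopping polling")
    return seen
```

In real use, `get_stage` would wrap whatever status check the agent relies on (for example, reading `hf_jobs logs`), and `on_idle` would do a slice of the docs or script work described in step 5.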

---

## How to Submit an HF Job (the only way that works)

```python