Update TODO with TRM findings, fix outdated info
trm_solver/TODO.md (added, +53 -0)
@@ -0,0 +1,53 @@
# ARC-AGI TRM Solver — Roadmap

> Focus: TRM (Tiny Recursive Model) + LLM agent routing
> Updated: 2026-05-03

## Current Status

- neurogolf-solver: 52/400 tasks, LB 594.84 (separate repo)
- TRM solver: research complete, implementation starting
- LLM classifier: code written (classify_tasks.py)

## Files

| File | Purpose |
|------|---------|
| TRM_RESEARCH.md | Paper findings, architecture, NeuroGolf constraints |
| classify_tasks.py | DeepSeek API classifier, runs on Kaggle |
| composition.py | Composition solvers (transform+recolor, etc.) |
| SKILLS/kilo-agent/ | Kilo CLI reference docs (skill files, not project code) |

## Phase 1: LLM Routing (current)

- [x] Write classify_tasks.py (DeepSeek API classifier)
- [x] Write composition.py (C1/C2/C3 composition solvers)
- [ ] Test classifier on Kaggle: does DeepSeek pick the correct solvers?
- [ ] Integrate routing into neurogolf-solver solve_task()
- [ ] Measure: how many new tasks does routing unlock?

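One way the routing step could plug into a solver loop is sketched below. This is only an illustration under assumptions: the registry, the solver functions, and the label names (`flip_h`, `identity`) are hypothetical and do not come from classify_tasks.py or composition.py. The idea shown is to try the classifier's ranked labels first, skip labels that match no known solver, and accept a solver only if it reproduces every training pair.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]
Transform = Callable[[Grid], Grid]
# A solver inspects the training pairs and returns a transform, or None if it does not fit.
Solver = Callable[[Sequence[Pair]], Optional[Transform]]


def _fits(fn: Transform, train: Sequence[Pair]) -> bool:
    """True if fn maps every training input to its expected output."""
    return all(fn(x) == y for x, y in train)


def flip_h_solver(train: Sequence[Pair]) -> Optional[Transform]:
    """Hypothetical stand-in solver: mirror each row left to right."""
    def fn(grid: Grid) -> Grid:
        return [row[::-1] for row in grid]
    return fn if _fits(fn, train) else None


def identity_solver(train: Sequence[Pair]) -> Optional[Transform]:
    """Hypothetical stand-in solver: output equals input."""
    def fn(grid: Grid) -> Grid:
        return [row[:] for row in grid]
    return fn if _fits(fn, train) else None


# Hypothetical label -> solver table; real labels would come from the classifier prompt.
REGISTRY: Dict[str, Solver] = {"flip_h": flip_h_solver, "identity": identity_solver}


def route(train: Sequence[Pair], ranked_labels: Sequence[str],
          registry: Dict[str, Solver] = REGISTRY) -> Optional[Tuple[str, Transform]]:
    """Try the classifier's labels in order, then every remaining solver as a fallback."""
    order = list(ranked_labels) + [k for k in registry if k not in ranked_labels]
    for label in order:
        solver = registry.get(label)
        if solver is None:  # the LLM may return a label that matches no solver
            continue
        fn = solver(train)
        if fn is not None:
            return label, fn
    return None
```

Because every candidate is verified against the training pairs, a wrong classifier guess costs time but cannot produce a wrong answer by itself; counting how often the first ranked label is the one that fits gives a direct measure of classifier quality.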
## Phase 2: Tiny TRM

- [ ] Adapt official TRM code (wtfmahe/Samsung-TRM) for hidden=64
- [ ] Change encoding: flat tokens [1,916] -> one-hot [1,10,30,30]
- [ ] Unroll recursion (replace ACT with fixed step count)
- [ ] Remove banned ops (Loop, Scan)
- [ ] Train on ARC-AGI + augmentations (single A10G)
- [ ] Validate against arc-gen
- [ ] Export to ONNX within 1.44MB limit
- [ ] Evaluate: how many of the 348 unsolved tasks does it crack?

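The target encoding in the list above can be illustrated with a small, dependency-free sketch. It assumes that the 10 channels are the ARC colors 0 to 9, that the grid sits in the top-left corner of the 30x30 canvas, and that cells outside the grid are all-zero in every channel; none of these conventions is stated here, so treat them as assumptions. Plain nested lists are used to keep the example self-contained; a real pipeline would more likely hold this as a float32 array.

```python
from typing import List

Grid = List[List[int]]
NUM_COLORS = 10  # assumption: channels correspond to ARC colors 0..9
SIZE = 30        # canvas height and width


def encode_one_hot(grid: Grid) -> List[List[List[List[float]]]]:
    """Return a nested list shaped [1][10][30][30] with one 1.0 per grid cell."""
    height, width = len(grid), len(grid[0])
    if height > SIZE or width > SIZE:
        raise ValueError("grid does not fit on the 30x30 canvas")
    planes = [[[0.0] * SIZE for _ in range(SIZE)] for _ in range(NUM_COLORS)]
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if not 0 <= color < NUM_COLORS:
                raise ValueError(f"color {color} out of range")
            planes[color][r][c] = 1.0
    return [planes]  # leading batch dimension of size 1


def decode_one_hot(tensor: List[List[List[List[float]]]], height: int, width: int) -> Grid:
    """Invert encode_one_hot by taking the argmax over the channel axis."""
    planes = tensor[0]
    return [
        [max(range(NUM_COLORS), key=lambda k: planes[k][r][c]) for c in range(width)]
        for r in range(height)
    ]
```

For example, `decode_one_hot(encode_one_hot([[0, 3], [9, 1]]), 2, 2)` returns the original grid. Note that the decoder needs the grid size passed in, because under the all-zero padding assumption a padded cell has no maximum channel.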
## Phase 3: Integration

- [ ] Combine LLM routing + tiny TRM + analytical solvers
- [ ] Full 400-task arc-gen validation
- [ ] Kaggle submission

## Key Constraints (NeuroGolf)

- Input/Output: float32 [1,10,30,30] one-hot
- Max 1.44 MB per ONNX file
- Banned ops: Loop, Scan, NonZero, Unique, Script, Function
- All 400 tasks count, none excluded
- Scoring: max(1.0, 25.0 - ln(MACs + memory + params))
- LLM agent cost does NOT count (offline during model generation)
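
The constraints above can be turned into a small pre-submission check. The sketch below is a minimal illustration, not the competition's own tooling: the function names are hypothetical, the size limit assumes 1 MB = 10^6 bytes, and the units of the three cost terms are taken as given because they are not specified here. The op check takes plain op-type strings so the example runs without any ONNX library.

```python
import math
from typing import Iterable, List

BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}
MAX_ONNX_BYTES = 1_440_000  # assumption: 1.44 MB with 1 MB = 10^6 bytes


def score(macs: int, memory: int, params: int) -> float:
    """Per-task score as written above: max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory + params
    if cost <= 0:
        raise ValueError("total cost must be positive")
    return max(1.0, 25.0 - math.log(cost))


def banned_ops_used(op_types: Iterable[str]) -> List[str]:
    """Return the banned op types present in a model, sorted and de-duplicated."""
    return sorted(set(op_types) & BANNED_OPS)


def fits_size_limit(num_bytes: int) -> bool:
    """True if a serialized model of num_bytes is within the file-size limit."""
    return num_bytes <= MAX_ONNX_BYTES
```

Because the score is logarithmic in cost, a tenfold increase in total cost subtracts ln(10) ≈ 2.30 points, and the floor of 1.0 is reached once the cost exceeds e^24 ≈ 2.65 × 10^10. Under the same size assumption, a pure float32 model has room for at most 360,000 parameters before any file overhead.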