# ARC-AGI TRM Solver: Roadmap

> Focus: TRM (Tiny Recursive Model) + LLM agent routing
> Updated: 2026-05-03

## Current Status

- neurogolf-solver: 52/400 tasks solved, leaderboard (LB) score 594.84 (separate repo)
- TRM solver: research complete, implementation starting
- LLM classifier: code written (classify_tasks.py)

## Files

| File | Purpose |
|------|---------|
| TRM_RESEARCH.md | Paper findings, architecture, NeuroGolf constraints |
| classify_tasks.py | DeepSeek API classifier, runs on Kaggle |
| composition.py | Composition solvers (transform+recolor, etc.) |
| SKILLS/kilo-agent/ | Kilo CLI reference docs (skill files, not project code) |

## Phase 1: LLM Routing (current)

- [x] Write classify_tasks.py (DeepSeek API classifier)
- [x] Write composition.py (C1/C2/C3 composition solvers)
- [ ] Test classifier on Kaggle: does DeepSeek pick the correct solvers?
- [ ] Integrate routing into neurogolf-solver solve_task()
- [ ] Measure: how many new tasks does routing unlock?

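The routing and measurement steps above could look roughly like the sketch below. Everything in it is hypothetical: `route_task`, `newly_unlocked`, the label strings, and the convention that a solver returns `None` on failure are illustrative assumptions, not the actual interfaces of classify_tasks.py or solve_task().

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical solver signature: takes a task dict, returns a result or None on failure.
Solver = Callable[[dict], Optional[object]]


def route_task(task: dict, label: str, solvers: Dict[str, Solver],
               fallback: Solver) -> Optional[object]:
    """Try the solver the classifier picked; fall back if it is unknown or fails."""
    solver = solvers.get(label)
    if solver is not None:
        result = solver(task)
        if result is not None:
            return result
    # Unknown label or failed solver: use the existing (non-routed) pipeline.
    return fallback(task)


def newly_unlocked(baseline_solved: Set[str], routed_solved: Set[str]) -> Set[str]:
    """Task ids solved with routing that the baseline did not solve."""
    return routed_solved - baseline_solved
```

Keeping the fallback in place means a wrong classifier label should never make a task worse than the baseline, so the size of `newly_unlocked(...)` is a direct measure of what routing adds.
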
## Phase 2: Tiny TRM

- [ ] Adapt official TRM code (wtfmahe/Samsung-TRM) for hidden=64
- [ ] Change encoding: flat tokens [1,916] -> one-hot [1,10,30,30]
- [ ] Unroll recursion (replace ACT with fixed step count)
- [ ] Remove banned ops (Loop, Scan)
- [ ] Train on ARC-AGI + augmentations (single A10G)
- [ ] Validate against arc-gen
- [ ] Export to ONNX within the 1.44 MB limit
- [ ] Evaluate: how many of the 348 unsolved tasks does it crack?

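As a rough illustration of the one-hot [1,10,30,30] encoding named in the list above, here is a minimal NumPy sketch. The helper names are hypothetical, and it assumes that grids smaller than 30x30 sit in the top-left corner with every channel left at zero in the padded area; that padding convention should be checked against the competition's reference encoder.

```python
import numpy as np

NUM_COLORS = 10  # ARC colors 0..9
MAX_SIZE = 30    # maximum grid height and width


def grid_to_onehot(grid):
    """Encode an H x W integer grid as a float32 tensor of shape [1, 10, 30, 30]."""
    arr = np.asarray(grid, dtype=np.int64)
    if arr.ndim != 2:
        raise ValueError("grid must be 2-D")
    h, w = arr.shape
    if h > MAX_SIZE or w > MAX_SIZE:
        raise ValueError("grid exceeds 30x30")
    if arr.min() < 0 or arr.max() >= NUM_COLORS:
        raise ValueError("colors must be in 0..9")
    out = np.zeros((1, NUM_COLORS, MAX_SIZE, MAX_SIZE), dtype=np.float32)
    rows, cols = np.indices((h, w))
    # Set channel arr[r, c] to 1 at position (r, c); padding stays all-zero (assumption).
    out[0, arr, rows, cols] = 1.0
    return out


def onehot_to_grid(tensor, height, width):
    """Decode the top-left height x width region back to integer colors."""
    return tensor[0, :, :height, :width].argmax(axis=0).tolist()
```

A round trip such as `onehot_to_grid(grid_to_onehot(g), len(g), len(g[0]))` should return `g` unchanged, which makes a cheap sanity check when swapping out the flat-token encoding.
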
## Phase 3: Integration

- [ ] Combine LLM routing + tiny TRM + analytical solvers
- [ ] Full 400-task arc-gen validation
- [ ] Kaggle submission

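Before submission, each exported model could be checked against the NeuroGolf limits with something like the sketch below. The helper names are hypothetical. It also assumes that "1.44 MB" means 1.44 * 1024 * 1024 bytes, which should be confirmed against the competition rules, and that the caller supplies the list of operator type names.

```python
import os
from typing import Iterable, List

# Banned operator names as listed in this roadmap.
BANNED_OPS = frozenset({"Loop", "Scan", "NonZero", "Unique", "Script", "Function"})

# Assumption: 1 MB = 1024 * 1024 bytes; adjust if the rules use 10**6.
MAX_BYTES = int(1.44 * 1024 * 1024)


def find_banned_ops(op_types: Iterable[str]) -> List[str]:
    """Return the sorted banned operator names that appear in op_types."""
    return sorted(set(op_types) & BANNED_OPS)


def within_size_limit(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """True if the file at path is no larger than max_bytes."""
    return os.path.getsize(path) <= max_bytes
```

With the `onnx` package installed, the operator names of the top-level graph can be collected as `[node.op_type for node in onnx.load(path).graph.node]`. That list does not descend into nested subgraphs, so it is a first-pass check only.
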
## Key Constraints (NeuroGolf)

- Input/Output: float32 [1,10,30,30] one-hot
- Max 1.44 MB per ONNX file
- Banned ops: Loop, Scan, NonZero, Unique, Script, Function
- All 400 tasks count; none are excluded
- Scoring: `max(1.0, 25.0 - ln(MACs + memory + params))`
- LLM agent cost does NOT count (the agent runs offline, during model generation)

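The scoring formula above can be written out as a small helper to see how model cost trades against points. The function name and argument names are illustrative; the three cost terms are summed as given, in whatever units the competition reports them.

```python
import math


def task_score(macs: float, memory: float, params: float) -> float:
    """Score from the roadmap formula: max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory + params
    if cost <= 0:
        raise ValueError("total cost must be positive")
    # The floor of 1.0 applies once ln(cost) reaches 24 or more.
    return max(1.0, 25.0 - math.log(cost))
```

Because the cost enters through a natural logarithm, halving the total cost adds ln 2 (about 0.69) points, so large reductions in model size matter far more than small trims.
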