rogermt/ARC-AGI


Commit 826a998 (verified) by rogermt, committed 5 days ago. Parent: 0e63a4d.

Update TODO with TRM findings, fix outdated info

Files changed (1): trm_solver/TODO.md (added, +53 -0)

# ARC-AGI TRM Solver: Roadmap

> Focus: TRM (Tiny Recursive Model) + LLM agent routing
>
> Updated: 2026-05-03

## Current Status

- neurogolf-solver: 52/400 tasks solved, leaderboard score 594.84 (separate repo)
- TRM solver: research complete, implementation starting
- LLM classifier: code written (`classify_tasks.py`)

## Files

| File | Purpose |
|------|---------|
| `TRM_RESEARCH.md` | Paper findings, architecture notes, NeuroGolf constraints |
| `classify_tasks.py` | DeepSeek API task classifier (runs on Kaggle) |
| `composition.py` | Composition solvers (transform+recolor, etc.) |
| `SKILLS/kilo-agent/` | Kilo CLI reference docs (skill files, not project code) |
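
A minimal sketch of a transform+recolor composition, as one plausible reading of such a solver (not the actual `composition.py` code): try each geometric transform, then learn a single color map that is consistent across all train pairs. Grids are assumed to be lists of lists of ints; the transform set and every function name here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]
Solution = Tuple[str, Dict[int, int]]

def identity(g: Grid) -> Grid:
    return [row[:] for row in g]

def rot90(g: Grid) -> Grid:
    # Clockwise rotation: reverse the rows, then transpose.
    return [list(row) for row in zip(*g[::-1])]

def flip_h(g: Grid) -> Grid:
    # Mirror left-right.
    return [row[::-1] for row in g]

# Hypothetical candidate set; the real composition.py may use different transforms.
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "rot90": rot90,
    "flip_h": flip_h,
}

def extend_color_map(src: Grid, dst: Grid, cmap: Dict[int, int]) -> bool:
    # Add src->dst color pairs to cmap; fail on a shape mismatch or a conflicting pair.
    if len(src) != len(dst) or any(len(a) != len(b) for a, b in zip(src, dst)):
        return False
    for row_s, row_d in zip(src, dst):
        for a, b in zip(row_s, row_d):
            if cmap.setdefault(a, b) != b:
                return False
    return True

def fit_transform_recolor(pairs: List[Pair]) -> Optional[Solution]:
    # Return the first transform whose outputs map onto every target with one shared color map.
    for name, transform in TRANSFORMS.items():
        cmap: Dict[int, int] = {}
        if all(extend_color_map(transform(x), y, cmap) for x, y in pairs):
            return name, cmap
    return None

def apply_solution(sol: Solution, g: Grid) -> Grid:
    name, cmap = sol
    # Colors never seen in training are kept unchanged (an assumption).
    return [[cmap.get(c, c) for c in row] for row in TRANSFORMS[name](g)]
```

Leaving unseen colors unchanged is an assumption. A stricter solver could reject test inputs that contain colors absent from the train pairs.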

## Phase 1: LLM Routing (current)

- [x] Write `classify_tasks.py` (DeepSeek API classifier)
- [x] Write `composition.py` (C1/C2/C3 composition solvers)
- [ ] Test the classifier on Kaggle: does DeepSeek pick the correct solvers?
- [ ] Integrate routing into neurogolf-solver's `solve_task()`
- [ ] Measure how many new tasks routing unlocks
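
The following sketch shows one way routing could plug into `solve_task()`. It makes two assumptions. First, each solver is a hypothetical "fitter" that takes the train pairs and returns a predict function (or `None`). Second, the classifier's output is a list of solver names. Any name the LLM returns that is not in the registry is dropped, and a routed solver is accepted only if it reproduces every train output. `route_and_solve`, the fitters, and the registry are all illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]
Predict = Callable[[Grid], Grid]
# Hypothetical solver interface: fit on train pairs, return a predictor or None.
Fitter = Callable[[Sequence[Pair]], Optional[Predict]]

def route_and_solve(
    train: Sequence[Pair],
    registry: Dict[str, Fitter],
    routed: Sequence[str],
    fallback: bool = False,
) -> Optional[Tuple[str, Predict]]:
    # Keep the LLM's order; drop duplicates and names the registry doesn't know.
    order = list(dict.fromkeys(n for n in routed if n in registry))
    if fallback:
        # Baseline mode: afterwards, try every other registered solver.
        order += [n for n in registry if n not in order]
    for name in order:
        predict = registry[name](train)
        # Accept a solver only if it reproduces every train output exactly.
        if predict is not None and all(predict(x) == y for x, y in train):
            return name, predict
    return None

# Toy registry for illustration only.
def fit_identity(train: Sequence[Pair]) -> Optional[Predict]:
    return lambda g: [row[:] for row in g]

def fit_flip_h(train: Sequence[Pair]) -> Optional[Predict]:
    return lambda g: [row[::-1] for row in g]

REGISTRY: Dict[str, Fitter] = {"identity": fit_identity, "flip_h": fit_flip_h}
TRAIN: List[Pair] = [([[1, 2]], [[2, 1]])]
```

With `fallback=True`, the function tries the remaining registered solvers after the routed ones. That mode can serve as a baseline when measuring what routing adds.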

## Phase 2: Tiny TRM

- [ ] Adapt the official TRM code (wtfmahe/Samsung-TRM) for hidden=64
- [ ] Change encoding: flat tokens [1,916] -> one-hot [1,10,30,30]
- [ ] Unroll the recursion: replace ACT (adaptive computation time) halting with a fixed step count
- [ ] Remove banned ops (Loop, Scan)
- [ ] Train on ARC-AGI + augmentations (single A10G)
- [ ] Validate against arc-gen
- [ ] Export to ONNX within the 1.44 MB limit
- [ ] Evaluate: how many of the 348 (400 - 52) unsolved tasks does it crack?
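
The encoding change can be sketched with NumPy. A grid of colors 0-9 (at most 30×30) becomes a float32 one-hot tensor of shape [1,10,30,30]. The sketch assumes the grid is anchored at the top-left and that padding cells are all-zero across the 10 channels; check both assumptions against the NeuroGolf I/O spec. The function names are hypothetical.

```python
import numpy as np

NUM_COLORS = 10
SIZE = 30

def grid_to_onehot(grid):
    """Encode an ARC grid (rows of ints 0-9) as float32 [1, 10, 30, 30]."""
    arr = np.asarray(grid, dtype=np.int64)
    if arr.ndim != 2:
        raise ValueError("grid must be 2-D")
    h, w = arr.shape
    if h > SIZE or w > SIZE:
        raise ValueError(f"grid {h}x{w} exceeds {SIZE}x{SIZE}")
    if arr.size and (arr.min() < 0 or arr.max() >= NUM_COLORS):
        raise ValueError("colors must be in 0..9")
    out = np.zeros((1, NUM_COLORS, SIZE, SIZE), dtype=np.float32)
    rows, cols = np.indices((h, w))
    # Channel index = cell color; cells outside the grid stay all-zero (padding).
    out[0, arr, rows, cols] = 1.0
    return out

def onehot_to_grid(x, threshold=0.5):
    """Decode [1, 10, 30, 30] back to a grid, cropping the all-zero padding."""
    planes = x[0]
    active = planes.max(axis=0) > threshold  # (30, 30) mask of non-padding cells
    used_rows = np.flatnonzero(active.any(axis=1))
    used_cols = np.flatnonzero(active.any(axis=0))
    if used_rows.size == 0:
        return []
    colors = planes.argmax(axis=0)
    return colors[: used_rows[-1] + 1, : used_cols[-1] + 1].tolist()
```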

## Phase 3: Integration

- [ ] Combine LLM routing + tiny TRM + analytical solvers
- [ ] Full 400-task arc-gen validation
- [ ] Kaggle submission
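
For the integration step, here is a sketch of one possible selection rule. It assumes each candidate (analytical, LLM-routed, or TRM) comes with a predict function and a precomputed cost equal to MACs + memory + params. The rule keeps only candidates that reproduce every verification example (for instance, train pairs plus arc-gen samples), then takes the cheapest one, since the NeuroGolf score never increases as that cost grows. The `Candidate` class and its source labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]

@dataclass
class Candidate:
    source: str                      # hypothetical label: "analytical", "llm_routed", "trm"
    predict: Callable[[Grid], Grid]
    cost: int                        # MACs + memory + params of the exported model

def passes(c: Candidate, examples: Sequence[Example]) -> bool:
    # A candidate that crashes on any example is treated as failing.
    try:
        return all(c.predict(x) == y for x, y in examples)
    except Exception:
        return False

def pick_best(cands: Sequence[Candidate], examples: Sequence[Example]) -> Optional[Candidate]:
    # Among correct candidates, the lowest cost gives the highest (or an equal) score.
    passing = [c for c in cands if passes(c, examples)]
    return min(passing, key=lambda c: c.cost, default=None)
```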

## Key Constraints (NeuroGolf)

- Input/output: float32 [1,10,30,30] one-hot
- Max 1.44 MB per ONNX file
- Banned ops: Loop, Scan, NonZero, Unique, Script, Function
- All 400 tasks count; none are excluded
- Scoring: `max(1.0, 25.0 - ln(MACs + memory + params))`
- LLM agent cost does NOT count toward the score (the agent runs offline, during model generation)
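
The scoring formula and the hard limits above can be checked with a few lines of Python. This is a sketch with two caveats. `MAX_BYTES` reads 1.44 MB as 1,440,000 bytes, a conservative assumption that should be confirmed against the competition rules. The op-type list is assumed to come from the exported graph, for example `[n.op_type for n in onnx.load(path).graph.node]`, which does not descend into subgraphs. `task_score` and `constraint_violations` are hypothetical helper names.

```python
import math
from typing import Iterable, List

# Banned ops as listed under Key Constraints.
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}
# Assumption: "1.44 MB" read conservatively as 1,440,000 bytes.
MAX_BYTES = 1_440_000

def task_score(macs: int, memory: int, params: int) -> float:
    """max(1.0, 25.0 - ln(MACs + memory + params)); the total must be > 0."""
    return max(1.0, 25.0 - math.log(macs + memory + params))

def constraint_violations(op_types: Iterable[str], file_bytes: int) -> List[str]:
    """List rule violations for one exported ONNX file (empty list = OK)."""
    problems = [f"banned op: {op}" for op in sorted(set(op_types) & BANNED_OPS)]
    if file_bytes > MAX_BYTES:
        problems.append(f"file too large: {file_bytes} bytes")
    return problems
```

For example, a model whose MACs + memory + params total 1,000 scores about 18.09 (25 - ln 1000, with ln 1000 ≈ 6.91). Any total above e^24 (about 2.6e10) hits the 1.0 floor.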