ARC-AGI TRM Solver — Roadmap

Focus: TRM (Tiny Recursive Model) + LLM agent routing
Updated: 2026-05-03

Current Status

  • neurogolf-solver: 52/400 tasks solved, leaderboard (LB) score 594.84 (separate repo)
  • TRM solver: research complete, implementation starting
  • LLM classifier: code written (classify_tasks.py)

Files

| File | Purpose |
| --- | --- |
| TRM_RESEARCH.md | Paper findings, architecture, NeuroGolf constraints |
| classify_tasks.py | DeepSeek API classifier, runs on Kaggle |
| composition.py | Composition solvers (transform+recolor, etc.) |
| SKILLS/kilo-agent/ | Kilo CLI reference docs (skill files, not project code) |

Phase 1: LLM Routing (current)

  • Write classify_tasks.py (DeepSeek API classifier)
  • Write composition.py (C1/C2/C3 composition solvers)
  • Test classifier on Kaggle — does DeepSeek pick correct solvers?
  • Integrate routing into neurogolf-solver solve_task()
  • Measure: how many new tasks does routing unlock?
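
A minimal sketch of how routing and the "unlocked tasks" measurement could fit together. The label names, the `SOLVERS` registry, and both toy solvers are hypothetical; the real labels are whatever classify_tasks.py emits, and the real solvers live in composition.py and neurogolf-solver. A routed solver is only accepted if it reproduces every training pair, so a wrong classifier pick cannot count as a solve.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Solver = Callable[[Grid], Grid]
Pair = Tuple[Grid, Grid]  # (input grid, expected output grid)


def flip_horizontal(grid: Grid) -> Grid:
    """Toy solver: mirror each row left to right."""
    return [row[::-1] for row in grid]


def transpose(grid: Grid) -> Grid:
    """Toy solver: swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical label -> solver registry (not the project's actual labels).
SOLVERS: Dict[str, Solver] = {
    "flip_h": flip_horizontal,
    "transpose": transpose,
}


def route(label: str, train_pairs: List[Pair]) -> Optional[Solver]:
    """Return the solver named by the classifier, or None if the label is
    unknown or the solver fails any training pair."""
    solver = SOLVERS.get(label)
    if solver is None or not train_pairs:
        return None
    if all(solver(inp) == out for inp, out in train_pairs):
        return solver
    return None


def count_unlocked(labels: Dict[str, str], tasks: Dict[str, List[Pair]]) -> int:
    """Count tasks for which the routed solver passes all training pairs."""
    return sum(
        1
        for task_id, pairs in tasks.items()
        if route(labels.get(task_id, ""), pairs) is not None
    )
```

Running `count_unlocked` only over tasks that neurogolf-solver does not already solve would give the number of new tasks attributable to routing.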

Phase 2: Tiny TRM

  • Adapt official TRM code (wtfmahe/Samsung-TRM) for hidden=64
  • Change encoding: flat tokens [1,916] -> one-hot [1,10,30,30]
  • Unroll recursion (replace ACT with fixed step count)
  • Remove banned ops (Loop, Scan)
  • Train on ARC-AGI + augmentations (single A10G)
  • Validate against arc-gen
  • Export to ONNX within 1.44MB limit
  • Evaluate: how many of the 348 unsolved tasks does it crack?
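
Two of the steps above, the one-hot encoding and the fixed-step unroll, can be sketched without any framework. This is an illustration under stated assumptions, not the TRM code: it uses nested lists instead of float32 tensors to stay dependency-free, it assumes padding cells outside the grid are zero in every channel, and the default of 8 steps is a placeholder rather than a tuned value.

```python
from typing import Callable, List, TypeVar

Grid = List[List[int]]
State = TypeVar("State")

NUM_COLORS = 10  # ARC colors 0-9, one channel each
SIZE = 30        # grids are padded to 30x30


def encode_grid(grid: Grid) -> list:
    """Encode a grid as a nested list shaped [1, 10, 30, 30].

    Assumption: cells outside the grid stay 0.0 in all ten channels.
    """
    height, width = len(grid), len(grid[0])
    if height > SIZE or width > SIZE:
        raise ValueError("grid exceeds 30x30")
    tensor = [[[[0.0] * SIZE for _ in range(SIZE)] for _ in range(NUM_COLORS)]]
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            tensor[0][color][r][c] = 1.0
    return tensor


def decode_grid(tensor: list, height: int, width: int) -> Grid:
    """Take the argmax over the channel axis for the top-left height x width cells."""
    planes = tensor[0]
    return [
        [max(range(NUM_COLORS), key=lambda k: planes[k][r][c]) for c in range(width)]
        for r in range(height)
    ]


def run_fixed_steps(step: Callable[[State], State], state: State, n_steps: int = 8) -> State:
    """Apply `step` a constant number of times instead of using an
    ACT-style halting condition. n_steps=8 is a placeholder."""
    for _ in range(n_steps):
        state = step(state)
    return state
```

The intent of the constant step count is that the exported graph contains the refinement step repeated `n_steps` times rather than a `Loop` node; whether a given exporter flattens the loop this way should be checked on the actual model.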

Phase 3: Integration

  • Combine LLM routing + tiny TRM + analytical solvers
  • Full 400-task arc-gen validation
  • Kaggle submission

Key Constraints (NeuroGolf)

  • Input/Output: float32 [1,10,30,30] one-hot
  • Max 1.44 MB per ONNX file
  • Banned ops: Loop, Scan, NonZero, Unique, Script, Function
  • All 400 tasks count, none excluded
  • Scoring: max(1.0, 25.0 - ln(MACs + memory + params))
  • LLM agent cost does NOT count (offline during model generation)
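
The scoring formula and two of the constraints above can be checked with a few lines of standard-library Python. This is a sketch, not the official scorer: it assumes 1.44 MB means 1.44 × 2^20 bytes, and it takes the op types as a plain list of strings (however they are extracted from the ONNX graph).

```python
import math
from typing import Iterable, List

# Values taken from the constraints listed above.
MAX_ONNX_BYTES = int(1.44 * 1024 * 1024)  # assumption: 1 MB = 2**20 bytes
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}


def task_score(macs: int, memory: int, params: int) -> float:
    """Score = max(1.0, 25.0 - ln(MACs + memory + params))."""
    cost = macs + memory + params
    if cost <= 0:
        raise ValueError("cost must be positive for the logarithm")
    return max(1.0, 25.0 - math.log(cost))


def violations(op_types: Iterable[str], file_size_bytes: int) -> List[str]:
    """List the constraint violations for one exported model."""
    problems = [f"banned op: {op}" for op in sorted(set(op_types) & BANNED_OPS)]
    if file_size_bytes > MAX_ONNX_BYTES:
        problems.append(f"file too large: {file_size_bytes} > {MAX_ONNX_BYTES} bytes")
    return problems
```

Because the cost enters through a natural logarithm, halving MACs + memory + params gains only ln 2 ≈ 0.69 points, and the score bottoms out at 1.0 once the cost exceeds e^24.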