rogermt committed · Commit 826a998 · verified · 1 Parent(s): 0e63a4d

Update TODO with TRM findings, fix outdated info

Files changed (1)
  1. trm_solver/TODO.md +53 -0
trm_solver/TODO.md ADDED
@@ -0,0 +1,53 @@
+ # ARC-AGI TRM Solver — Roadmap
+
+ > Focus: TRM (Tiny Recursive Model) + LLM agent routing
+ > Updated: 2026-05-03
+
+ ## Current Status
+
+ - neurogolf-solver: 52/400 tasks, LB 594.84 (separate repo)
+ - TRM solver: research complete, implementation starting
+ - LLM classifier: code written (classify_tasks.py)
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | TRM_RESEARCH.md | Paper findings, architecture, NeuroGolf constraints |
+ | classify_tasks.py | DeepSeek API classifier, runs on Kaggle |
+ | composition.py | Composition solvers (transform+recolor, etc.) |
+ | SKILLS/kilo-agent/ | Kilo CLI reference docs (skill files, not project code) |
+
+ ## Phase 1: LLM Routing (current)
+
+ - [x] Write classify_tasks.py (DeepSeek API classifier)
+ - [x] Write composition.py (C1/C2/C3 composition solvers)
+ - [ ] Test classifier on Kaggle — does DeepSeek pick correct solvers?
+ - [ ] Integrate routing into neurogolf-solver solve_task()
+ - [ ] Measure: how many new tasks does routing unlock?
+
+ ## Phase 2: Tiny TRM
+
+ - [ ] Adapt official TRM code (wtfmahe/Samsung-TRM) for hidden=64
+ - [ ] Change encoding: flat tokens [1,916] -> one-hot [1,10,30,30]
+ - [ ] Unroll recursion (replace ACT with fixed step count)
+ - [ ] Remove banned ops (Loop, Scan)
+ - [ ] Train on ARC-AGI + augmentations (single A10G)
+ - [ ] Validate against arc-gen
+ - [ ] Export to ONNX within 1.44 MB limit
+ - [ ] Evaluate: how many of the 348 unsolved tasks does it crack?
+
+ ## Phase 3: Integration
+
+ - [ ] Combine LLM routing + tiny TRM + analytical solvers
+ - [ ] Full 400-task arc-gen validation
+ - [ ] Kaggle submission
+
+ ## Key Constraints (NeuroGolf)
+
+ - Input/Output: float32 [1,10,30,30] one-hot
+ - Max 1.44 MB per ONNX file
+ - Banned ops: Loop, Scan, NonZero, Unique, Script, Function
+ - All 400 tasks count, none excluded
+ - Scoring: max(1.0, 25.0 - ln(MACs + memory + params))
+ - LLM agent cost does NOT count (offline during model generation)
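
The constraints listed in the TODO can be sketched as a few small helpers: the scoring formula, the one-hot `[1,10,30,30]` layout, and a banned-op filter. This is a minimal pure-Python sketch, not the competition's reference code. The function names are hypothetical, the units of `memory` are not stated in the TODO, treating 1.44 MB as 1,440,000 bytes is an assumption, and leaving cells outside the grid as all zeros is an assumed padding convention.

```python
import math

# Banned operator names as listed in the TODO.
BANNED_OPS = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}

# Assumption: "1.44 MB" read as 1,440,000 bytes; the exact definition is not given.
MAX_ONNX_BYTES = 1_440_000


def neurogolf_score(macs, memory, params):
    """Score = max(1.0, 25.0 - ln(MACs + memory + params)), as written in the TODO."""
    cost = macs + memory + params
    if cost <= 0:
        raise ValueError("cost must be positive to take its logarithm")
    return max(1.0, 25.0 - math.log(cost))


def one_hot_grid(grid, colors=10, size=30):
    """Encode a grid of color indices as a nested list shaped [1][colors][size][size].

    Assumption: cells outside the grid stay all-zero in every channel.
    """
    if len(grid) > size or any(len(row) > size for row in grid):
        raise ValueError("grid exceeds the maximum size")
    tensor = [[[[0.0] * size for _ in range(size)] for _ in range(colors)]]
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if not 0 <= color < colors:
                raise ValueError(f"color {color} out of range")
            tensor[0][color][r][c] = 1.0
    return tensor


def banned_ops_used(op_types):
    """Return the sorted banned operator names found in a list of op-type strings."""
    return sorted(set(op_types) & BANNED_OPS)


def fits_size_limit(num_bytes):
    """Check a serialized model size against the assumed byte limit."""
    return num_bytes <= MAX_ONNX_BYTES
```

With the `onnx` package, the op-type list could be gathered with `[node.op_type for node in model.graph.node]`; that only covers the top-level graph, so nested subgraphs would need a separate walk. Because the score is logarithmic in total cost, halving the cost adds only about 0.69 points, and the floor of 1.0 applies once the cost exceeds roughly e^24.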