rogermt committed on
Commit 0e63a4d · verified · 1 Parent(s): 981ef11

Add TRM research findings and architecture notes

Files changed (1): trm_solver/TRM_RESEARCH.md (ADDED, +103 -0)

# TRM Research Notes (from arxiv:2510.04871 + official code wtfmahe/Samsung-TRM)

## What TRM Is

Tiny Recursive Model = a 2-layer transformer that recurses on itself.
A single network does both the reasoning (z_L) and answer (z_H) updates.

Full TRM specs:
- hidden=512, 8 heads, SwiGLU expansion=4
- 2 layers only, recursed T=3 outer x n=4 inner: 3 x (4 + 1) = 15 network passes (4 reasoning updates + 1 answer update per cycle)
- 7M params total
- ACT (Adaptive Compute Time) for dynamic halting
- EMA (Exponential Moving Average) weight updates, decay 0.999
- RoPE position encoding
- Puzzle embedding: 16 learned tokens per task

Training:
- 100K epochs, lr=1e-4, batch=768
- 1000 augmentations per task (color permutation + dihedral + translation)
- Data: ARC-AGI training + evaluation + ConceptARC
- 3 days on 4x H100
- Loss: stablemax cross-entropy (sketched below)

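A minimal sketch of the stablemax loss, paraphrased from the stablemax paper; the exact form in the official TRM code may differ:

```python
import torch

def stablemax_cross_entropy(logits, targets):
    # stablemax swaps softmax's exp(x) for s(x) = x + 1 when x >= 0,
    # and 1 / (1 - x) when x < 0, then normalizes as usual
    s = torch.where(logits >= 0, logits + 1.0, 1.0 / (1.0 - logits))
    probs = s / s.sum(dim=-1, keepdim=True)
    return -torch.log(probs.gather(-1, targets.unsqueeze(-1))).mean()
```
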
Results:
- 45% on ARC-AGI-1 (beats DeepSeek R1 at 15.8% and o3-mini at 34.5%)
- 8% on ARC-AGI-2
- 7M params vs 671B for DeepSeek R1

## Encoding

ARC grids are encoded as flat token sequences (see the sketch after this list):
- vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
- grid flattened to 900 tokens (30x30)
- EOS marks the grid boundary
- translational augmentation: random padding offsets

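A minimal sketch of this encoding, assuming numpy grids; the helper name and the exact EOS placement are my assumptions, not lifted from build_arc_dataset.py:

```python
import numpy as np

PAD, EOS = 0, 1  # colors 0-9 become tokens 2-11

def encode_grid(grid: np.ndarray, size: int = 30) -> np.ndarray:
    """Flatten an HxW ARC grid (values 0-9) into a fixed 900-token sequence."""
    h, w = grid.shape
    tokens = np.full((size, size), PAD, dtype=np.int64)
    tokens[:h, :w] = grid + 2      # shift colors into the 2-11 token range
    if h < size:
        tokens[h, :w] = EOS        # EOS row marks the bottom edge of the grid
    if w < size:
        tokens[:h, w] = EOS        # EOS column marks the right edge
    return tokens.reshape(-1)      # 30 x 30 = 900 tokens
```

Translational augmentation then amounts to writing the grid at a random (row, col) offset instead of the top-left corner.
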
## Algorithm (simplified)

```python
import torch

def trm_forward(x, z_L, z_H, net, n=4, T=3):
    # embed, puzzle_emb, and lm_head live on the surrounding module (see trm.py)
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without grad (cheaply improve the state initialization)
    with torch.no_grad():
        for _ in range(T - 1):
            for _ in range(n):
                z_L = net(z_L, z_H + x)  # update reasoning
            z_H = net(z_H, z_L)          # update answer once per cycle
    # final outer cycle with grad (only this one is backpropagated)
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    output = lm_head(z_H)
    return output  # T * (n + 1) = 15 passes through the 2-layer net
```

## NeuroGolf Constraints

| Constraint | Value |
|------------|-------|
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |

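A quick validator for the two hard constraints, assuming the limit is exactly 1.44 MB; the function name is mine:

```python
import os
import onnx

BANNED = {"Loop", "Scan", "NonZero", "Unique", "Script", "Function"}

def validate_onnx(path: str, max_bytes: int = int(1.44 * 1024 * 1024)) -> None:
    """Fail fast on the size limit and banned-op list before submitting."""
    assert os.path.getsize(path) <= max_bytes, "over the 1.44 MB limit"
    used = {node.op_type for node in onnx.load(path).graph.node}
    assert not (used & BANNED), f"banned ops present: {used & BANNED}"
```
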
## Full TRM Cannot Fit

7M params at float32 (4 bytes each) ≈ 28 MB of ONNX weights; the limit is 1.44 MB, so roughly 20x over budget.

Loops are BANNED in NeuroGolf, so ACT and the recursion must be unrolled.
Unrolling 15 passes of a 2-layer transformer = 30 effective layers.

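A sketch of that unrolling, assuming a fixed recursion count baked in at export time; the wrapper class is hypothetical:

```python
import torch

class UnrolledTRM(torch.nn.Module):
    """Fixed-count recursion so the exported graph contains no Loop/Scan ops."""
    def __init__(self, net, lm_head, R=4):
        super().__init__()
        self.net, self.lm_head, self.R = net, lm_head, R

    def forward(self, x, z_L, z_H):
        for _ in range(self.R):  # plain Python loop: traced flat during export
            z_L = self.net(z_L, z_H + x)
            z_H = self.net(z_H, z_L)
        return self.lm_head(z_H)

# torch.onnx.export traces the loop into R repeated copies of the ops,
# giving a loop-free graph at the cost of a larger file.
```
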
## Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
|--------|--------|-----------|------------|----------------|
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB | 8 | barely fits |

(Param counts and sizes scale with recursions here, which assumes the unrolled export duplicates the shared weights.)

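A sanity check of the Est Score column against the scoring formula; the split between MACs, memory, and params below is illustrative, not measured:

```python
import math

def neurogolf_score(macs: int, memory: int, params: int) -> float:
    # score per the formula in the constraints table above
    return max(1.0, 25.0 - math.log(macs + memory + params))

# a total cost near 268K gives 25 - ln(268_000) ~= 12.5, matching the
# hidden=64, 4-recursion row
print(round(neurogolf_score(200_000, 26_000, 42_000), 1))  # 12.5
```
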
## LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE during model generation.
It does NOT go into the ONNX file.
It classifies tasks and routes each one to the correct solver.
Zero cost impact on the submitted ONNX models.

Architecture:
ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit

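A skeleton of that pipeline; the classifier, solver builders, and validator here are all placeholder stubs for illustration, not code from the repo:

```python
# hypothetical routing skeleton; every function below is a stub
def classify_task(task): return "other"    # stand-in for the DeepSeek API call
def build_symmetry_onnx(task): return b""  # analytical solver (stub)
def build_recolor_onnx(task): return b""   # analytical solver (stub)
def build_tiny_trm_onnx(task): return b""  # learned fallback solver (stub)
def validate(model): pass                  # size + banned-op checks (see above)

SOLVERS = {"symmetry": build_symmetry_onnx,
           "recolor": build_recolor_onnx,
           "other": build_tiny_trm_onnx}

def solve(task: dict) -> bytes:
    model = SOLVERS.get(classify_task(task), build_tiny_trm_onnx)(task)
    validate(model)                        # runs offline, before submission
    return model
```
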
## Official Code

- Repo: wtfmahe/Samsung-TRM on HuggingFace
- GitHub: Kilo-Org/kilocode (for the Kilo CLI)
- Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
- Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

## Next Steps

1. Build tiny TRM (hidden=64): implement from the official code, adapt the encoding
2. Train on ARC data (single A10G, a few hours)
3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
4. Export to ONNX within the NeuroGolf constraints
5. Integrate with the LLM classifier for routing