Add TRM research findings and architecture notes
trm_solver/TRM_RESEARCH.md (ADDED) +103 -0
@@ -0,0 +1,103 @@
# TRM Research Notes (from arxiv:2510.04871 + official code wtfmahe/Samsung-TRM)

## What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on itself.
A single network does both the reasoning (z_L) and answer (z_H) updates.

Full TRM specs:
- hidden=512, 8 heads, SwiGLU expansion=4
- 2 layers only, recursed T=3 outer cycles x n=4 inner steps = ~15 forward passes
- 7M params total
- ACT (Adaptive Compute Time) for dynamic halting
- EMA (Exponential Moving Average) of the weights, decay 0.999
- RoPE position encoding
- Puzzle embedding: 16 learned tokens per task

Training:
- 100K epochs, lr=1e-4, batch=768
- 1000 augmentations per task (color permute + dihedral + translate)
- Data: ARC-AGI training + evaluation + ConceptARC
- 3 days on 4xH100
- Loss: stablemax cross-entropy

Results:
- 45% on ARC-AGI-1 (beats DeepSeek R1 at 15.8% and o3-mini at 34.5%)
- 8% on ARC-AGI-2
- 7M params vs 671B for DeepSeek R1

## Encoding

ARC grids are encoded as flat token sequences (see the sketch below):
- vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
- Grid flattened to 900 tokens (30x30)
- EOS marks the grid boundary
- Translational augmentation: random padding offsets
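
A minimal sketch of this encoding, reconstructed from the bullets above (the exact EOS placement and the helper name `encode_grid` are my assumptions, not lifted from the official `dataset/build_arc_dataset.py`):

```python
import numpy as np

PAD, EOS = 0, 1
COLOR_OFFSET = 2   # colors 0-9 map to tokens 2-11
GRID_SIDE = 30     # fixed 30x30 canvas -> 900 tokens

def encode_grid(grid, row_off=0, col_off=0):
    """Flatten an ARC grid (list of rows, values 0-9) into a 900-token sequence.

    Non-zero row_off/col_off implement the random-offset translational
    augmentation; EOS is placed just past the grid's bottom-right corner
    (assumption) so the true grid size is recoverable from the sequence.
    """
    h, w = len(grid), len(grid[0])
    assert row_off + h <= GRID_SIDE and col_off + w <= GRID_SIDE
    canvas = np.full((GRID_SIDE, GRID_SIDE), PAD, dtype=np.int64)
    for r in range(h):
        for c in range(w):
            canvas[row_off + r, col_off + c] = grid[r][c] + COLOR_OFFSET
    if row_off + h < GRID_SIDE and col_off + w < GRID_SIDE:
        canvas[row_off + h, col_off + w] = EOS  # grid boundary marker
    return canvas.reshape(-1)  # shape (900,)
```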

## Algorithm (simplified)

```python
import torch

def trm_forward(x, z_L, z_H, net, n=4, T=3):
    x = embed(x) + puzzle_emb  # token embedding + per-task puzzle embedding
    # T-1 outer cycles without grad, to cheaply improve the latent initialization
    with torch.no_grad():
        for _ in range(T - 1):
            for _ in range(n):
                z_L = net(z_L, z_H + x)  # update reasoning state
            z_H = net(z_H, z_L)          # update answer state
    # final outer cycle with grad: only this cycle contributes gradients,
    # so activation memory stays constant in T
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    output = lm_head(z_H)
    return output
```

## NeuroGolf Constraints

| Constraint | Value |
|------------|-------|
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
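
A tiny helper to sanity-check that scoring formula against candidate configs (my reading of the table; whatever units the competition uses inside the MACs + memory + params sum are an assumption here):

```python
import math

def neurogolf_score(macs: float, memory: float, params: float) -> float:
    # Per-task score as given in the constraints table:
    # max(1.0, 25.0 - ln(MACs + memory + params))
    return max(1.0, 25.0 - math.log(macs + memory + params))

# A total cost of ~2.7e5 scores about 12.5; each e-fold (~2.72x) increase
# in the cost sum costs exactly one point, with a floor of 1.0.
```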

## Full TRM Cannot Fit

7M params at float32 (4 bytes each) = ~28MB of ONNX. The limit is 1.44MB, so full TRM is ~20x over.

Loops are BANNED in NeuroGolf, so ACT and the recursion must be unrolled.
Unrolling 15 passes of a 2-layer transformer = 30 effective layers (see the export sketch below).
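
A minimal sketch of the unrolled export, assuming the standard torch.onnx tracing path: run the recursion as plain Python loops inside forward, so the traced graph is fixed and loop-free (the `UnrolledTRM` wrapper and its defaults are illustrative, not from the official code):

```python
import torch
import torch.nn as nn

class UnrolledTRM(nn.Module):
    """Runs the TRM recursion as plain Python loops so that ONNX tracing
    unrolls it into a fixed graph with no Loop/Scan nodes."""

    def __init__(self, net: nn.Module, lm_head: nn.Module, n: int = 4, T: int = 3):
        super().__init__()
        self.net, self.lm_head = net, lm_head
        self.n, self.T = n, T

    def forward(self, x, z_L, z_H):
        for _ in range(self.T):              # unrolled at trace time
            for _ in range(self.n):
                z_L = self.net(z_L, z_H + x)
            z_H = self.net(z_H, z_L)
        return self.lm_head(z_H)

# torch.onnx.export(model, (x, z_L, z_H), "trm_unrolled.onnx", opset_version=17)
# traces the loops away: more recursions mean more graph nodes (MACs), and
# whether per-pass weights are tied or untied decides how file size grows.
```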

## Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
|--------|--------|-----------|------------|----------------|
| hidden=64, 4 heads, 2 layers | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers | ~340K | ~1.4MB | 8 | barely fits |

## LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE, during model generation only; it does NOT
go into the ONNX file. It classifies each task and routes it to the correct
solver, so it has zero cost impact on the submitted ONNX models.

Architecture (sketch below):
ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
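
A hedged sketch of the classification step, assuming DeepSeek's OpenAI-compatible endpoint; the solver label set and the fallback policy are hypothetical, not from this repo:

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API (base_url/model per their docs)
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

SOLVERS = ["identity", "symmetry", "color_map", "tiling", "trm"]  # hypothetical labels

def classify_task(task_json: str) -> str:
    """Ask DeepSeek which solver family an ARC task belongs to."""
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system",
             "content": "Classify the ARC task. Reply with exactly one of: "
                        + ", ".join(SOLVERS) + "."},
            {"role": "user", "content": task_json},
        ],
        temperature=0.0,
    )
    label = resp.choices[0].message.content.strip()
    return label if label in SOLVERS else "trm"  # fall back to the learned solver
```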

## Official Code

- Repo: wtfmahe/Samsung-TRM on HuggingFace
- GitHub: Kilo-Org/kilocode (for the Kilo CLI)
- Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
- Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

## Next Steps

1. Build tiny TRM (hidden=64): implement from the official code, adapt the encoding
2. Train on ARC data (single A10G, a few hours)
3. Evaluate on the unsolved tasks (the 348 that the analytical solvers can't handle)
4. Export to ONNX within the NeuroGolf constraints
5. Integrate with the LLM classifier for routing