# TRM Research Notes (from arXiv:2510.04871 + official code wtfmahe/Samsung-TRM)

## What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on its own outputs.
A single network performs both the reasoning-state (z_L) and answer-state (z_H) updates.

Full TRM specs (collected in the config sketch below):
- hidden=512, 8 heads, SwiGLU expansion=4
- 2 layers only, recursed T=3 outer cycles x (n=4 reasoning updates + 1 answer update) = 15 network passes
- 7M params total
- ACT (Adaptive Computation Time) for dynamic halting
- EMA (Exponential Moving Average) of weights, decay 0.999
- RoPE position encoding
- Puzzle embedding: 16 learned tokens per task
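
A minimal config sketch collecting the hyperparameters above. The field names are my own shorthand, not the names used in trm.py or trm.yaml:

```python
from dataclasses import dataclass

@dataclass
class TRMConfig:
    # Architecture (values from the spec list above; field names are illustrative)
    hidden_size: int = 512
    num_heads: int = 8
    expansion: int = 4        # SwiGLU expansion factor
    num_layers: int = 2
    # Recursion schedule: T outer cycles, n reasoning updates per cycle
    T: int = 3
    n: int = 4
    # Training extras
    ema_decay: float = 0.999
    puzzle_emb_len: int = 16  # learned tokens per task
```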

Training:
- 100K epochs, lr=1e-4, batch=768
- 1000 augmentations per task (color permute + dihedral + translate)
- Data: ARC-AGI training + evaluation + ConceptARC
- 3 days on 4xH100
- Loss: stablemax cross-entropy (sketched below, together with the EMA update)
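
Hedged sketches of two training details: stablemax cross-entropy, assuming the s(x) definition from the numerical-stability literature (verify against the official trm.py before trusting it), and the EMA weight update at decay 0.999 from the spec list:

```python
import torch

def stablemax_cross_entropy(logits, targets):
    """Stablemax CE. ASSUMES s(x) = x+1 for x >= 0 and 1/(1-x) for x < 0;
    verify this matches the loss in the official code."""
    s = torch.where(logits >= 0, logits + 1.0, 1.0 / (1.0 - logits))
    probs = s / s.sum(dim=-1, keepdim=True)
    return -torch.log(probs.gather(-1, targets.unsqueeze(-1))).mean()

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Exponential moving average of weights (decay 0.999, per the specs)."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```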

Results:
- 45% ARC-AGI-1 (beats DeepSeek R1 at 15.8% and o3-mini at 34.5%)
- 8% ARC-AGI-2
- 7M params vs 671B for DeepSeek R1

## Encoding

ARC grids are encoded as flat token sequences (see the sketch after this list):
- vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
- Grid padded onto a 30x30 canvas and flattened to 900 tokens
- EOS marks the grid boundary
- Translational augmentation: random padding offsets
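
A minimal sketch of this encoding, assuming row-major flattening; the exact EOS placement in build_arc_dataset.py may differ, so treat it as illustrative:

```python
import numpy as np

PAD, EOS = 0, 1
CANVAS = 30  # fixed 30x30 canvas -> 900 tokens

def encode_grid(grid: np.ndarray, row_off: int = 0, col_off: int = 0):
    """Place an ARC grid (values 0-9) on the canvas and flatten row-major.
    Colors shift by +2; EOS is written just past the bottom-right corner
    (ASSUMPTION: the official builder may place EOS differently).
    row_off/col_off implement the translational augmentation."""
    h, w = grid.shape
    canvas = np.full((CANVAS, CANVAS), PAD, dtype=np.int64)
    canvas[row_off:row_off + h, col_off:col_off + w] = grid + 2
    if row_off + h < CANVAS and col_off + w < CANVAS:
        canvas[row_off + h, col_off + w] = EOS
    return canvas.reshape(-1)  # 900-token sequence
```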

## Algorithm (simplified)

```python
import torch

def trm_forward(x, z_L, z_H, net, embed, puzzle_emb, lm_head, n=4, T=3):
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without grad (cheap refinement that improves the
    # initialization of z_L/z_H before the single backpropagated cycle)
    with torch.no_grad():
        for _ in range(T - 1):
            for _ in range(n):
                z_L = net(z_L, z_H + x)  # update reasoning state
            z_H = net(z_H, z_L)          # update answer state once per cycle
    # final outer cycle with grad (the only one backprop sees):
    # n reasoning updates + 1 answer update = 5 passes; x3 cycles = 15 total
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    return lm_head(z_H)
```
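
Hedged usage of the sketch above, with throwaway placeholder modules just to show the shapes; the zero initialization of z_L/z_H is an assumption, and the 16 puzzle tokens are collapsed to one here:

```python
import torch
import torch.nn as nn

B, S, H, V = 1, 900, 512, 12                    # batch, tokens, hidden, vocab
lin = nn.Linear(H, H)                           # stand-in for the 2-layer block
net = lambda state, ctx: torch.relu(lin(state + ctx))
embed = nn.Embedding(V, H)
lm_head = nn.Linear(H, V)
puzzle_emb = torch.zeros(1, 1, H)               # really 16 learned tokens

tokens = torch.zeros(B, S, dtype=torch.long)
z_L = torch.zeros(B, S, H)                      # ASSUMPTION: init may be learned
z_H = torch.zeros(B, S, H)
logits = trm_forward(tokens, z_L, z_H, net, embed, puzzle_emb, lm_head)
print(logits.shape)                             # torch.Size([1, 900, 12])
```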

## NeuroGolf Constraints

| Constraint | Value |
|------------|-------|
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
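
The scoring row as a function, handy for quick estimates. Assumes the three terms are summed as raw counts/bytes, which the table doesn't actually specify:

```python
import math

def neurogolf_score(macs: float, memory_bytes: float, params: float) -> float:
    """max(1.0, 25.0 - ln(MACs + memory + params)), per the table above.
    ASSUMPTION: raw counts are summed; verify units against the rules."""
    return max(1.0, 25.0 - math.log(macs + memory_bytes + params))

# Hypothetical MACs figure; roughly reproduces the ~12.5 estimate for the
# hidden=64, 4-recursion config in the table below.
print(neurogolf_score(macs=56_000, memory_bytes=170_000, params=42_000))  # ~12.5
```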

## Full TRM Cannot Fit

7M params x 4 bytes (float32) ≈ 28 MB of ONNX. The limit is 1.44 MB, so full TRM is ~20x over.

Loops are BANNED in NeuroGolf, so ACT and the recursion must be unrolled.
Unrolling 15 passes of a 2-layer transformer = 30 effective layers.
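
Tracing-based torch.onnx.export unrolls Python-level loops by itself, so writing the recursion as plain for-loops already yields a Loop-free graph. A sketch with an illustrative stand-in module (UnrolledTRM is NOT the official architecture):

```python
import torch
import torch.nn as nn

class UnrolledTRM(nn.Module):
    """Illustrative stand-in: forward() runs the recursion as plain Python
    loops, which the ONNX tracer unrolls into a static, Loop-free graph."""
    def __init__(self, hidden=64, recursions=4):
        super().__init__()
        self.recursions = recursions
        self.inp = nn.Linear(10, hidden)
        self.net = nn.Linear(hidden, hidden)  # placeholder for the 2-layer block
        self.out = nn.Linear(hidden, 10)

    def forward(self, x):                           # x: [1, 10, 30, 30] one-hot
        h = self.inp(x.flatten(2).transpose(1, 2))  # -> [1, 900, hidden]
        for _ in range(self.recursions):            # unrolled at export time
            h = torch.relu(self.net(h))
        return self.out(h).transpose(1, 2).reshape(1, 10, 30, 30)

model = UnrolledTRM().eval()
dummy = torch.zeros(1, 10, 30, 30)  # NeuroGolf I/O format
torch.onnx.export(model, dummy, "tiny_trm.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=17)
```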

## Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
|--------|--------|-----------|------------|----------------|
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB (barely fits) | 8 | n/a |

## LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE, during model generation only.
It does NOT go into the ONNX file.
It classifies tasks and routes each one to the appropriate solver,
so it has zero cost impact on the submitted ONNX models.

Architecture (sketched below):
ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
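
A minimal routing skeleton. The solver builders, labels, and prompt are placeholders of mine; only the OpenAI-compatible DeepSeek endpoint and model name are real:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

def build_symmetry_onnx(task): ...  # placeholder solver builders
def build_recolor_onnx(task): ...
def build_tiny_trm_onnx(task): ...

SOLVERS = {"symmetry": build_symmetry_onnx,
           "recolor": build_recolor_onnx,
           "other": build_tiny_trm_onnx}

def classify_task(task_json: str) -> str:
    """Ask DeepSeek for a coarse label; fall back to 'other'."""
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user",
                   "content": "Classify this ARC task as one of "
                              "symmetry / recolor / other:\n" + task_json}])
    label = resp.choices[0].message.content.strip().lower()
    return label if label in SOLVERS else "other"

def route(task_json: str):
    return SOLVERS[classify_task(task_json)](task_json)
```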

## Official Code

- Repo: wtfmahe/Samsung-TRM on HuggingFace
- GitHub: Kilo-Org/kilocode (for Kilo CLI)
- Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
- Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

## Next Steps

1. Build tiny TRM (hidden=64): implement from the official code, adapt the encoding
2. Train on ARC data (single A10G, a few hours)
3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
4. Export to ONNX within NeuroGolf constraints
5. Integrate with the LLM classifier for routing