# TRM Research Notes (from arxiv:2510.04871 + official code wtfmahe/Samsung-TRM)

## What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on its own outputs.
A single network performs both the reasoning-state (z_L) and answer-state (z_H) updates.

Full TRM specs (collected into a config sketch after this list):
- hidden=512, 8 heads, SwiGLU expansion=4
- 2 layers only, recursed T=3 outer x n=4 inner = ~15 passes
- 7M params total
- ACT (Adaptive Compute Time) for dynamic halting
- EMA (Exponential Moving Average) of the weights, decay 0.999
- RoPE position encoding
- Puzzle embedding: 16 learned tokens per task
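
A minimal config sketch collecting the numbers above. The field names are my own shorthand, not necessarily those used in config/arch/trm.yaml:

```python
# Hypothetical field names; values are the full-TRM specs listed above.
TRM_FULL = dict(
    hidden_size=512,
    num_heads=8,
    expansion=4,           # SwiGLU expansion factor
    num_layers=2,
    T=3,                   # outer cycles
    n=4,                   # inner reasoning updates per cycle (~15 net passes total)
    pos_encoding="rope",
    puzzle_emb_len=16,     # learned tokens per task
    halting="ACT",
    ema_decay=0.999,
)
```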

Training:
- 100K epochs, lr=1e-4, batch=768
- 1000 augmentations per task (color permute + dihedral + translate)
- Data: ARC-AGI training + evaluation + ConceptARC
- 3 days on 4xH100
- Loss: stablemax cross-entropy

Results:
- 45% ARC-AGI-1 (beats DeepSeek R1 at 15.8%, o3-mini at 34.5%)
- 8% ARC-AGI-2
- 7M params vs 671B for DeepSeek R1

## Encoding

ARC grids are encoded as flat sequences (see the sketch after this list):
- vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
- Grid flattened to 900 tokens (30x30)
- EOS marks grid boundary
- Translational augmentation: random padding offsets
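
A minimal sketch of this encoding, assuming the grid arrives as a NumPy array of color indices 0-9. The helper name and the exact EOS placement are assumptions, not the official dataset builder:

```python
import numpy as np

PAD, EOS = 0, 1  # colors 0-9 map to tokens 2-11

def encode_grid(grid: np.ndarray, size: int = 30) -> np.ndarray:
    """Flatten an ARC grid (H x W of color indices) into a 900-token sequence."""
    h, w = grid.shape
    canvas = np.full((size, size), PAD, dtype=np.int64)
    canvas[:h, :w] = grid + 2              # shift colors into the 2-11 range
    if h < size:
        canvas[h, :w] = EOS                # mark the grid boundary (assumed placement)
    return canvas.reshape(-1)              # 30 * 30 = 900 tokens
```

Translational augmentation would then amount to writing the grid at a random offset instead of the top-left corner.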

## Algorithm (simplified)

```python
import torch

def trm_forward(x, z_L, z_H, net, embed, puzzle_emb, lm_head, n=4, T=3):
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without gradient: cheap refinement of the z_L / z_H initialization
    with torch.no_grad():
        for _ in range(T - 1):
            for _ in range(n):
                z_L = net(z_L, z_H + x)  # update reasoning state
            z_H = net(z_H, z_L)          # update answer state
    # final outer cycle with gradient
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    return lm_head(z_H)
```

## NeuroGolf Constraints

| Constraint | Value |
|-----------|-------|
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
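
A direct transcription of the scoring formula in the table, useful for quick estimates; what exactly counts toward "memory" is an assumption here:

```python
import math

def neurogolf_score(macs: float, memory: float, params: float) -> float:
    """Per-ONNX score from the table above: higher is better, floored at 1.0."""
    return max(1.0, 25.0 - math.log(macs + memory + params))
```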

## Full TRM Cannot Fit

7M params in float32 = ~28 MB ONNX (4 bytes/param). The limit is 1.44 MB, so roughly 20x over.

Loops are BANNED in NeuroGolf, so ACT and the recursion must be unrolled (see the sketch below).
Unrolling 15 passes of the 2-layer transformer = 30 effective layers.
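
A sketch of one way to do that unrolling, assuming export by tracing with torch.onnx.export; the wrapper and its arguments are placeholders, and ACT's data-dependent halting is simply replaced by always running the full schedule:

```python
import torch

class UnrolledTRM(torch.nn.Module):
    """Inference-only wrapper: Python loops with fixed counts are unrolled
    at trace time, so no Loop/Scan ops end up in the exported graph."""
    def __init__(self, net, lm_head, n=4, T=3):
        super().__init__()
        self.net, self.lm_head = net, lm_head
        self.n, self.T = n, T

    def forward(self, x, z_L, z_H):
        for _ in range(self.T):              # fixed counts -> flat graph
            for _ in range(self.n):
                z_L = self.net(z_L, z_H + x)
            z_H = self.net(z_H, z_L)
        return self.lm_head(z_H)

# torch.onnx.export(UnrolledTRM(net, lm_head), (x, z_L, z_H), "trm.onnx")
# traces the loops into a flat sequence of plain ops (MatMul, Add, Softmax, ...).
```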

## Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
|--------|--------|-----------|------------|----------------|
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB | 8 | n/a (barely fits size cap) |

## LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE, during model generation only; it never goes into the ONNX file.
It classifies each task and routes it to the correct solver, so it has zero cost impact on the submitted ONNX models.

Architecture:
ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
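
A minimal sketch of that offline routing step; the category names, prompt, and placeholder builders below are all assumptions:

```python
def build_analytical_onnx(task_json: str) -> str:
    return "analytical.onnx"   # placeholder builder

def build_tiny_trm_onnx(task_json: str) -> str:
    return "tiny_trm.onnx"     # placeholder builder

SOLVERS = {"analytical": build_analytical_onnx, "tiny_trm": build_tiny_trm_onnx}

def route_task(task_json: str, llm_classify) -> str:
    """Offline step only: the LLM picks a solver; nothing of it is exported."""
    category = llm_classify(
        f"Classify this ARC task as one of {list(SOLVERS)}:\n{task_json}"
    ).strip()
    return SOLVERS.get(category, build_tiny_trm_onnx)(task_json)
```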

## Official Code

- Repo: wtfmahe/Samsung-TRM on HuggingFace
- GitHub: Kilo-Org/kilocode (for Kilo CLI)
- Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
- Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

## Next Steps

1. Build tiny TRM (hidden=64) - implement from official code, adapt encoding
2. Train on ARC data (single A10G, a few hours)
3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
4. Export to ONNX within NeuroGolf constraints
5. Integrate with LLM classifier for routing