
TRM Research Notes (from arXiv:2510.04871 and the official code at wtfmahe/Samsung-TRM)

What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on its own outputs. A single shared network performs both the reasoning-state (z_L) and answer-state (z_H) updates.

Full TRM specs:

  • hidden=512, 8 heads, SwiGLU expansion=4
  • 2 layers only, recursed T=3 outer x n=4 inner = 15 network passes (each outer cycle = 4 z_L updates + 1 z_H update)
  • 7M params total
  • ACT (Adaptive Compute Time) for dynamic halting
  • EMA (Exponential Moving Average) weight updates, 0.999
  • RoPE position encoding
  • Puzzle embedding: 16 learned tokens per task
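
As a sanity check on the ~7M figure, here is a back-of-envelope parameter count. It assumes bias-free projections and Llama-style SwiGLU inner-width sizing (the 2/3 reduction) — both are assumptions, not confirmed from the official code:

```python
def trm_param_estimate(d=512, layers=2, vocab=12, puzzle_tokens=16, expansion=4):
    """Rough parameter count for the full TRM transformer trunk."""
    attn = 4 * d * d                      # Q, K, V, O projections, no bias
    ff = round(expansion * d * 2 / 3)     # Llama-style SwiGLU inner width (assumption)
    swiglu = 3 * d * ff                   # gate, up, down projections
    trunk = layers * (attn + swiglu)
    embeds = vocab * d + puzzle_tokens * d + d * vocab  # token emb + puzzle emb + lm_head
    return trunk + embeds

print(trm_param_estimate())  # lands in the 6-7M range, consistent with "7M params"
```

Under these assumptions the trunk dominates; embeddings and the lm_head contribute only a few tens of thousands of parameters.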

Training:

  • 100K epochs, lr=1e-4, batch=768
  • 1000 augmentations per task (color permute + dihedral + translate)
  • Data: ARC-AGI training + evaluation + ConceptARC
  • 3 days on 4xH100
  • Loss: stablemax cross-entropy
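
The augmentation recipe (color permutation, dihedral symmetry, translation) can be sketched with NumPy. This is a hedged illustration of the three transforms, not the repo's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(grid):
    """One random ARC augmentation: color permute + dihedral + translate."""
    # color permutation: relabel the 10 colors with a random bijection
    perm = rng.permutation(10)
    g = perm[grid]
    # dihedral: one of the 8 symmetries of the square (4 rotations x optional flip)
    g = np.rot90(g, k=rng.integers(4))
    if rng.integers(2):
        g = np.fliplr(g)
    # translation: place the grid at a random offset inside a 30x30 canvas
    canvas = np.zeros((30, 30), dtype=g.dtype)  # 0 = background here (assumption)
    r = rng.integers(30 - g.shape[0] + 1)
    c = rng.integers(30 - g.shape[1] + 1)
    canvas[r:r + g.shape[0], c:c + g.shape[1]] = g
    return canvas

out = augment(np.array([[1, 2], [3, 4]]))
```

Sampling 1000 such augmentations per task gives the training-set multiplier described above.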

Results:

  • 45% ARC-AGI-1 (beats DeepSeek R1 at 15.8%, o3-mini at 34.5%)
  • 8% ARC-AGI-2
  • 7M params vs 671B for DeepSeek R1

Encoding

ARC grids encoded as flat sequences:

  • vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
  • Grid flattened to 900 tokens (30x30)
  • EOS marks grid boundary
  • Translational augmentation: random padding offsets
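
A minimal sketch of the grid-to-sequence encoding described above. The exact EOS placement convention is an assumption; the official builder is dataset/build_arc_dataset.py:

```python
import numpy as np

PAD, EOS = 0, 1  # colors 0-9 map to tokens 2-11

def encode_grid(grid, size=30):
    """Flatten an HxW ARC grid into a fixed 900-token sequence."""
    h, w = grid.shape
    canvas = np.full((size, size), PAD, dtype=np.int64)
    canvas[:h, :w] = grid + 2          # shift colors 0-9 into tokens 2-11
    # mark the grid boundary with EOS just past the last row/column (assumption)
    if h < size:
        canvas[h, :w] = EOS
    if w < size:
        canvas[:h, w] = EOS
    return canvas.reshape(-1)          # 30*30 = 900 tokens

tokens = encode_grid(np.array([[0, 3], [7, 9]]))
```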

Algorithm (simplified)

```python
def trm_forward(x, z_L, z_H, net, n=4, T=3):
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without grad (cheap warm-up of z_L, z_H)
    for _ in range(T - 1):
        for _ in range(n):
            z_L = net(z_L, z_H + x)  # update reasoning state
        z_H = net(z_H, z_L)          # update answer state
    # final outer cycle with grad (only this one is backpropagated)
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    output = lm_head(z_H)
    return output
```
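
embed, puzzle_emb, net, and lm_head above are the learned components. A runnable toy with NumPy stand-ins (a fixed random matrix, not trained weights, and tiny dimensions) shows the data flow and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq = 8, 5                       # toy sizes, not the real hidden=512 / 900 tokens
W = rng.normal(scale=0.1, size=(d, d))

def net(state, context):            # stand-in for the shared 2-layer transformer
    return np.tanh((state + context) @ W)

def trm_forward_toy(x, n=4, T=3):
    z_L = np.zeros((seq, d))
    z_H = np.zeros((seq, d))
    for _ in range(T):              # grad/no-grad split omitted in the toy
        for _ in range(n):
            z_L = net(z_L, z_H + x) # n reasoning updates per cycle
        z_H = net(z_H, z_L)         # one answer update per cycle
    return z_H

out = trm_forward_toy(rng.normal(size=(seq, d)))
```

Note that the same `net` is called for both updates — TRM has no separate high/low networks, only two recurrent states.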

NeuroGolf Constraints

| Constraint | Value |
| --- | --- |
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
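
The scoring formula rewards smaller models (higher is better, floored at 1.0). A direct transcription, with hypothetical MAC and memory counts chosen purely for illustration:

```python
import math

def neurogolf_score(macs, memory, params):
    """Per-task NeuroGolf score: higher is better, floored at 1.0."""
    return max(1.0, 25.0 - math.log(macs + memory + params))

# a ~42K-param model with comparable MAC/memory cost lands near the ~12.5 estimate
print(neurogolf_score(macs=150_000, memory=75_000, params=42_000))
```

Because the penalty is logarithmic, doubling total cost only subtracts ln(2) ≈ 0.69 from the score.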

Full TRM Cannot Fit

7M params ≈ 28 MB of float32 weights in ONNX. The limit is 1.44 MB, so the full model is roughly 20x over budget.

Loop-type ops are banned in NeuroGolf, so ACT and the recursion must be unrolled into a flat graph. Unrolling all 15 passes of the 2-layer network yields 30 effective transformer layers.

Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
| --- | --- | --- | --- | --- |
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB | 8 | barely fits |
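
The ONNX-size column is consistent with plain float32 storage (4 bytes per parameter) plus small graph overhead — a quick check, assuming no weight quantization:

```python
def onnx_size_kb(params, bytes_per_weight=4):
    """Approximate ONNX file size from parameter count (float32 weights)."""
    return params * bytes_per_weight / 1024

for p in (42_000, 85_000, 170_000, 340_000):
    print(f"{p:>7} params -> ~{onnx_size_kb(p):.0f} KB")
```

The largest config comes out around 1.3 MB of raw weights, which is why it only "barely fits" under the 1.44 MB cap once graph overhead is added.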

LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE during model generation. It does NOT go into the ONNX file. It classifies tasks and routes to the correct solver. Zero cost impact on the submitted ONNX models.

Architecture: ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
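
The routing step is plain offline dispatch — nothing LLM-related ships in the ONNX. A sketch with hypothetical category names and a stub classifier standing in for the DeepSeek API call:

```python
def classify(task):
    """Stub for the offline DeepSeek call; returns a task category (hypothetical labels)."""
    return "symmetry" if task.get("looks_symmetric") else "unknown"

SOLVERS = {                      # hypothetical solver registry
    "symmetry": lambda t: "analytical_symmetry.onnx",
    "unknown": lambda t: "tiny_trm.onnx",
}

def route(task):
    category = classify(task)
    return SOLVERS.get(category, SOLVERS["unknown"])(task)

print(route({"looks_symmetric": True}))   # analytical_symmetry.onnx
```

Only the chosen solver's ONNX file is submitted, so a misclassification costs accuracy but never score.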

Official Code

  • Repo: wtfmahe/Samsung-TRM on HuggingFace
  • GitHub: Kilo-Org/kilocode (for Kilo CLI)
  • Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
  • Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

Next Steps

  1. Build tiny TRM (hidden=64) - implement from official code, adapt encoding
  2. Train on ARC data (single A10G, ~few hours)
  3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
  4. Export to ONNX within NeuroGolf constraints
  5. Integrate with LLM classifier for routing