
TRM Research Notes (from arXiv:2510.04871 and the official code at wtfmahe/Samsung-TRM)

What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on its own outputs. A single shared network performs both the reasoning-state (z_L) and answer-state (z_H) updates.

Full TRM specs:

  • hidden=512, 8 heads, SwiGLU expansion=4
  • 2 layers only, recursed T=3 outer x n=4 inner = 15 network passes (each outer cycle = 4 z_L updates + 1 z_H update)
  • 7M params total
  • ACT (Adaptive Compute Time) for dynamic halting
  • EMA (Exponential Moving Average) weight updates, 0.999
  • RoPE position encoding
  • Puzzle embedding: 16 learned tokens per task
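
As a sanity check on the ~7M figure, here is a back-of-envelope parameter count. It assumes bias-free projections and Llama-style SwiGLU inner-width sizing (the 2/3 reduction) — both are assumptions, not confirmed from the official code:

```python
def trm_param_estimate(d=512, layers=2, vocab=12, puzzle_tokens=16, expansion=4):
    """Rough parameter count for the full TRM transformer trunk."""
    attn = 4 * d * d                      # Q, K, V, O projections, no bias
    ff = round(expansion * d * 2 / 3)     # Llama-style SwiGLU inner width (assumption)
    swiglu = 3 * d * ff                   # gate, up, down projections
    trunk = layers * (attn + swiglu)
    embeds = vocab * d + puzzle_tokens * d + d * vocab  # token emb + puzzle emb + lm_head
    return trunk + embeds

print(trm_param_estimate())  # lands in the 6-7M range, consistent with "7M params"
```

Under these assumptions the trunk dominates; embeddings and the lm_head contribute only a few tens of thousands of parameters.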

Training:

  • 100K epochs, lr=1e-4, batch=768
  • 1000 augmentations per task (color permute + dihedral + translate)
  • Data: ARC-AGI training + evaluation + ConceptARC
  • 3 days on 4xH100
  • Loss: stablemax cross-entropy
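
The augmentation recipe (color permutation, dihedral symmetry, translation) can be sketched with NumPy. This is a hedged illustration of the three transforms, not the repo's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(grid):
    """One random ARC augmentation: color permute + dihedral + translate."""
    # color permutation: relabel the 10 colors with a random bijection
    perm = rng.permutation(10)
    g = perm[grid]
    # dihedral: one of the 8 symmetries of the square (4 rotations x optional flip)
    g = np.rot90(g, k=rng.integers(4))
    if rng.integers(2):
        g = np.fliplr(g)
    # translation: place the grid at a random offset inside a 30x30 canvas
    canvas = np.zeros((30, 30), dtype=g.dtype)  # 0 = background here (assumption)
    r = rng.integers(30 - g.shape[0] + 1)
    c = rng.integers(30 - g.shape[1] + 1)
    canvas[r:r + g.shape[0], c:c + g.shape[1]] = g
    return canvas

out = augment(np.array([[1, 2], [3, 4]]))
```

Sampling 1000 such augmentations per task gives the training-set multiplier described above.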

Results:

  • 45% ARC-AGI-1 (beats DeepSeek R1 at 15.8%, o3-mini at 34.5%)
  • 8% ARC-AGI-2
  • 7M params vs 671B for DeepSeek R1

Encoding

ARC grids encoded as flat sequences:

  • vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
  • Grid flattened to 900 tokens (30x30)
  • EOS marks grid boundary
  • Translational augmentation: random padding offsets
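
A minimal sketch of the grid-to-sequence encoding described above. The exact EOS placement convention is an assumption; the official builder is dataset/build_arc_dataset.py:

```python
import numpy as np

PAD, EOS = 0, 1  # colors 0-9 map to tokens 2-11

def encode_grid(grid, size=30):
    """Flatten an HxW ARC grid into a fixed 900-token sequence."""
    h, w = grid.shape
    canvas = np.full((size, size), PAD, dtype=np.int64)
    canvas[:h, :w] = grid + 2          # shift colors 0-9 into tokens 2-11
    # mark the grid boundary with EOS just past the last row/column (assumption)
    if h < size:
        canvas[h, :w] = EOS
    if w < size:
        canvas[:h, w] = EOS
    return canvas.reshape(-1)          # 30*30 = 900 tokens

tokens = encode_grid(np.array([[0, 3], [7, 9]]))
```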

Algorithm (simplified)

```python
def trm_forward(x, z_L, z_H, net, n=4, T=3):
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without grad (cheap warm-up of z_L, z_H)
    for _ in range(T - 1):
        for _ in range(n):
            z_L = net(z_L, z_H + x)  # update reasoning state
        z_H = net(z_H, z_L)          # update answer state
    # final outer cycle with grad (only this one is backpropagated)
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    output = lm_head(z_H)
    return output
```
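
embed, puzzle_emb, net, and lm_head above are the learned components. A runnable toy with NumPy stand-ins (a fixed random matrix, not trained weights, and tiny dimensions) shows the data flow and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq = 8, 5                       # toy sizes, not the real hidden=512 / 900 tokens
W = rng.normal(scale=0.1, size=(d, d))

def net(state, context):            # stand-in for the shared 2-layer transformer
    return np.tanh((state + context) @ W)

def trm_forward_toy(x, n=4, T=3):
    z_L = np.zeros((seq, d))
    z_H = np.zeros((seq, d))
    for _ in range(T):              # grad/no-grad split omitted in the toy
        for _ in range(n):
            z_L = net(z_L, z_H + x) # n reasoning updates per cycle
        z_H = net(z_H, z_L)         # one answer update per cycle
    return z_H

out = trm_forward_toy(rng.normal(size=(seq, d)))
```

Note that the same `net` is called for both updates — TRM has no separate high/low networks, only two recurrent states.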

NeuroGolf Constraints

| Constraint | Value |
| --- | --- |
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
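
The scoring formula rewards smaller models (higher is better, floored at 1.0). A direct transcription, with hypothetical MAC and memory counts chosen purely for illustration:

```python
import math

def neurogolf_score(macs, memory, params):
    """Per-task NeuroGolf score: higher is better, floored at 1.0."""
    return max(1.0, 25.0 - math.log(macs + memory + params))

# a ~42K-param model with comparable MAC/memory cost lands near the ~12.5 estimate
print(neurogolf_score(macs=150_000, memory=75_000, params=42_000))
```

Because the penalty is logarithmic, doubling total cost only subtracts ln(2) ≈ 0.69 from the score.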

Full TRM Cannot Fit

7M params ≈ 28 MB of float32 weights in ONNX. The limit is 1.44 MB, so the full model is roughly 20x over budget.

Loop-type ops are banned in NeuroGolf, so ACT and the recursion must be unrolled into a flat graph. Unrolling all 15 passes of the 2-layer network yields 30 effective transformer layers.

Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
| --- | --- | --- | --- | --- |
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB | 8 | barely fits |
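
The ONNX-size column is consistent with plain float32 storage (4 bytes per parameter) plus small graph overhead — a quick check, assuming no weight quantization:

```python
def onnx_size_kb(params, bytes_per_weight=4):
    """Approximate ONNX file size from parameter count (float32 weights)."""
    return params * bytes_per_weight / 1024

for p in (42_000, 85_000, 170_000, 340_000):
    print(f"{p:>7} params -> ~{onnx_size_kb(p):.0f} KB")
```

The largest config comes out around 1.3 MB of raw weights, which is why it only "barely fits" under the 1.44 MB cap once graph overhead is added.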

LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE during model generation. It does NOT go into the ONNX file. It classifies tasks and routes to the correct solver. Zero cost impact on the submitted ONNX models.

Architecture: ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
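
The routing step is plain offline dispatch — nothing LLM-related ships in the ONNX. A sketch with hypothetical category names and a stub classifier standing in for the DeepSeek API call:

```python
def classify(task):
    """Stub for the offline DeepSeek call; returns a task category (hypothetical labels)."""
    return "symmetry" if task.get("looks_symmetric") else "unknown"

SOLVERS = {                      # hypothetical solver registry
    "symmetry": lambda t: "analytical_symmetry.onnx",
    "unknown": lambda t: "tiny_trm.onnx",
}

def route(task):
    category = classify(task)
    return SOLVERS.get(category, SOLVERS["unknown"])(task)

print(route({"looks_symmetric": True}))   # analytical_symmetry.onnx
```

Only the chosen solver's ONNX file is submitted, so a misclassification costs accuracy but never score.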

Official Code

  • Repo: wtfmahe/Samsung-TRM on HuggingFace
  • GitHub: Kilo-Org/kilocode (for Kilo CLI)
  • Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
  • Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

Next Steps

  1. Build tiny TRM (hidden=64) - implement from official code, adapt encoding
  2. Train on ARC data (single A10G, ~few hours)
  3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
  4. Export to ONNX within NeuroGolf constraints
  5. Integrate with LLM classifier for routing