# TRM Research Notes (from arXiv:2510.04871 + official code wtfmahe/Samsung-TRM)

## What TRM Is

Tiny Recursive Model (TRM) = a 2-layer transformer that recurses on its own outputs.
A single network performs both the reasoning-state (z_L) and answer-state (z_H) updates.

Full TRM specs (collected in the config sketch below):
- hidden=512, 8 heads, SwiGLU expansion=4
- 2 layers only, recursed T=3 outer cycles x (n=4 reasoning updates + 1 answer update) = 15 network passes
- 7M params total
- ACT (Adaptive Computation Time) for dynamic halting
- EMA (Exponential Moving Average) of weights, decay 0.999
- RoPE position encoding
- Puzzle embedding: 16 learned tokens per task
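
A minimal config sketch collecting the hyperparameters above. The field names are my own shorthand, not the names used in trm.py or trm.yaml:

```python
from dataclasses import dataclass

@dataclass
class TRMConfig:
    # Architecture (values from the spec list above; field names are illustrative)
    hidden_size: int = 512
    num_heads: int = 8
    expansion: int = 4        # SwiGLU expansion factor
    num_layers: int = 2
    # Recursion schedule: T outer cycles, n reasoning updates per cycle
    T: int = 3
    n: int = 4
    # Training extras
    ema_decay: float = 0.999
    puzzle_emb_len: int = 16  # learned tokens per task
```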

Training:
- 100K epochs, lr=1e-4, batch=768
- 1000 augmentations per task (color permute + dihedral + translate)
- Data: ARC-AGI training + evaluation + ConceptARC
- 3 days on 4xH100
- Loss: stablemax cross-entropy (sketched below, together with the EMA update)
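
Hedged sketches of two training details: stablemax cross-entropy, assuming the s(x) definition from the numerical-stability literature (verify against the official trm.py before trusting it), and the EMA weight update at decay 0.999 from the spec list:

```python
import torch

def stablemax_cross_entropy(logits, targets):
    """Stablemax CE. ASSUMES s(x) = x+1 for x >= 0 and 1/(1-x) for x < 0;
    verify this matches the loss in the official code."""
    s = torch.where(logits >= 0, logits + 1.0, 1.0 / (1.0 - logits))
    probs = s / s.sum(dim=-1, keepdim=True)
    return -torch.log(probs.gather(-1, targets.unsqueeze(-1))).mean()

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    """Exponential moving average of weights (decay 0.999, per the specs)."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```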

Results:
- 45% ARC-AGI-1 (beats DeepSeek R1 at 15.8% and o3-mini at 34.5%)
- 8% ARC-AGI-2
- 7M params vs 671B for DeepSeek R1

## Encoding

ARC grids are encoded as flat token sequences (see the sketch after this list):
- vocab_size = 12: 0=PAD, 1=EOS, 2-11=colors 0-9
- Grid padded onto a 30x30 canvas and flattened to 900 tokens
- EOS marks the grid boundary
- Translational augmentation: random padding offsets
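
A minimal sketch of this encoding, assuming row-major flattening; the exact EOS placement in build_arc_dataset.py may differ, so treat it as illustrative:

```python
import numpy as np

PAD, EOS = 0, 1
CANVAS = 30  # fixed 30x30 canvas -> 900 tokens

def encode_grid(grid: np.ndarray, row_off: int = 0, col_off: int = 0):
    """Place an ARC grid (values 0-9) on the canvas and flatten row-major.
    Colors shift by +2; EOS is written just past the bottom-right corner
    (ASSUMPTION: the official builder may place EOS differently).
    row_off/col_off implement the translational augmentation."""
    h, w = grid.shape
    canvas = np.full((CANVAS, CANVAS), PAD, dtype=np.int64)
    canvas[row_off:row_off + h, col_off:col_off + w] = grid + 2
    if row_off + h < CANVAS and col_off + w < CANVAS:
        canvas[row_off + h, col_off + w] = EOS
    return canvas.reshape(-1)  # 900-token sequence
```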

## Algorithm (simplified)

```python
import torch

def trm_forward(x, z_L, z_H, net, embed, puzzle_emb, lm_head, n=4, T=3):
    x = embed(x) + puzzle_emb
    # T-1 outer cycles without grad (cheap refinement that improves the
    # initialization of z_L/z_H before the single backpropagated cycle)
    with torch.no_grad():
        for _ in range(T - 1):
            for _ in range(n):
                z_L = net(z_L, z_H + x)  # update reasoning state
            z_H = net(z_H, z_L)          # update answer state once per cycle
    # final outer cycle with grad (the only one backprop sees):
    # n reasoning updates + 1 answer update = 5 passes; x3 cycles = 15 total
    for _ in range(n):
        z_L = net(z_L, z_H + x)
    z_H = net(z_H, z_L)
    return lm_head(z_H)
```
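
Hedged usage of the sketch above, with throwaway placeholder modules just to show the shapes; the zero initialization of z_L/z_H is an assumption, and the 16 puzzle tokens are collapsed to one here:

```python
import torch
import torch.nn as nn

B, S, H, V = 1, 900, 512, 12                    # batch, tokens, hidden, vocab
lin = nn.Linear(H, H)                           # stand-in for the 2-layer block
net = lambda state, ctx: torch.relu(lin(state + ctx))
embed = nn.Embedding(V, H)
lm_head = nn.Linear(H, V)
puzzle_emb = torch.zeros(1, 1, H)               # really 16 learned tokens

tokens = torch.zeros(B, S, dtype=torch.long)
z_L = torch.zeros(B, S, H)                      # ASSUMPTION: init may be learned
z_H = torch.zeros(B, S, H)
logits = trm_forward(tokens, z_L, z_H, net, embed, puzzle_emb, lm_head)
print(logits.shape)                             # torch.Size([1, 900, 12])
```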

## NeuroGolf Constraints

| Constraint | Value |
|------------|-------|
| Input/Output | float32 [1,10,30,30] one-hot |
| Max file size | 1.44 MB per ONNX |
| Banned ops | Loop, Scan, NonZero, Unique, Script, Function |
| Scoring | max(1.0, 25.0 - ln(MACs + memory + params)) |
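
The scoring row as a function, handy for quick estimates. Assumes the three terms are summed as raw counts/bytes, which the table doesn't actually specify:

```python
import math

def neurogolf_score(macs: float, memory_bytes: float, params: float) -> float:
    """max(1.0, 25.0 - ln(MACs + memory + params)), per the table above.
    ASSUMPTION: raw counts are summed; verify units against the rules."""
    return max(1.0, 25.0 - math.log(macs + memory_bytes + params))

# Hypothetical MACs figure; roughly reproduces the ~12.5 estimate for the
# hidden=64, 4-recursion config in the table below.
print(neurogolf_score(macs=56_000, memory_bytes=170_000, params=42_000))  # ~12.5
```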

## Full TRM Cannot Fit

7M params x 4 bytes (float32) ≈ 28 MB of ONNX. The limit is 1.44 MB, so full TRM is ~20x over.

Loops are BANNED in NeuroGolf, so ACT and the recursion must be unrolled.
Unrolling 15 passes of a 2-layer transformer = 30 effective layers.
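
Tracing-based torch.onnx.export unrolls Python-level loops by itself, so writing the recursion as plain for-loops already yields a Loop-free graph. A sketch with an illustrative stand-in module (UnrolledTRM is NOT the official architecture):

```python
import torch
import torch.nn as nn

class UnrolledTRM(nn.Module):
    """Illustrative stand-in: forward() runs the recursion as plain Python
    loops, which the ONNX tracer unrolls into a static, Loop-free graph."""
    def __init__(self, hidden=64, recursions=4):
        super().__init__()
        self.recursions = recursions
        self.inp = nn.Linear(10, hidden)
        self.net = nn.Linear(hidden, hidden)  # placeholder for the 2-layer block
        self.out = nn.Linear(hidden, 10)

    def forward(self, x):                           # x: [1, 10, 30, 30] one-hot
        h = self.inp(x.flatten(2).transpose(1, 2))  # -> [1, 900, hidden]
        for _ in range(self.recursions):            # unrolled at export time
            h = torch.relu(self.net(h))
        return self.out(h).transpose(1, 2).reshape(1, 10, 30, 30)

model = UnrolledTRM().eval()
dummy = torch.zeros(1, 10, 30, 30)  # NeuroGolf I/O format
torch.onnx.export(model, dummy, "tiny_trm.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=17)
```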

## Tiny TRM Configs That Fit

| Config | Params | ONNX Size | Recursions | Est Score/Task |
|--------|--------|-----------|------------|----------------|
| hidden=64, 4 heads, 2 layers, 4 recursions | ~42K | ~170KB | 4 | ~12.5 |
| hidden=64, 4 heads, 2 layers, 8 recursions | ~85K | ~340KB | 8 | ~11.6 |
| hidden=128, 4 heads, 2 layers, 4 recursions | ~170K | ~680KB | 4 | ~11.0 |
| hidden=128, 4 heads, 2 layers, 8 recursions | ~340K | ~1.4MB (barely fits) | 8 | n/a |

## LLM Agent Integration

The LLM (DeepSeek) is used OFFLINE, during model generation only.
It does NOT go into the ONNX file.
It classifies tasks and routes each one to the appropriate solver,
so it has zero cost impact on the submitted ONNX models.

Architecture (sketched below):
ARC task -> DeepSeek API (classify) -> route to solver -> build ONNX -> validate -> submit
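
A minimal routing skeleton. The solver builders, labels, and prompt are placeholders of mine; only the OpenAI-compatible DeepSeek endpoint and model name are real:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")

def build_symmetry_onnx(task): ...  # placeholder solver builders
def build_recolor_onnx(task): ...
def build_tiny_trm_onnx(task): ...

SOLVERS = {"symmetry": build_symmetry_onnx,
           "recolor": build_recolor_onnx,
           "other": build_tiny_trm_onnx}

def classify_task(task_json: str) -> str:
    """Ask DeepSeek for a coarse label; fall back to 'other'."""
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user",
                   "content": "Classify this ARC task as one of "
                              "symmetry / recolor / other:\n" + task_json}])
    label = resp.choices[0].message.content.strip().lower()
    return label if label in SOLVERS else "other"

def route(task_json: str):
    return SOLVERS[classify_task(task_json)](task_json)
```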

## Official Code

- Repo: wtfmahe/Samsung-TRM on HuggingFace
- GitHub: Kilo-Org/kilocode (for Kilo CLI)
- Key files: models/recursive_reasoning/trm.py, dataset/build_arc_dataset.py, config/arch/trm.yaml
- Dataset builder handles: ARC-AGI + ConceptARC, 1000 augmentations, color/dihedral/translation

## Next Steps

1. Build tiny TRM (hidden=64): implement from the official code, adapt the encoding
2. Train on ARC data (single A10G, a few hours)
3. Evaluate on unsolved tasks (the 348 that analytical solvers can't handle)
4. Export to ONNX within NeuroGolf constraints
5. Integrate with the LLM classifier for routing