Loom: A Scalable Computer Architecture for Looped Transformers

Paper | GitHub | Live Demos

Loom is a general-purpose computer implemented as a looped transformer with analytically derived weights. Programs are written in C, compiled to a 21-opcode ISA, and executed as iterated matrix multiplications through 8 fixed-weight transformer layers.

Model Description

Loom is not a trained model. Every weight is derived analytically from the ISA specification. The transformer implements a programmable computer: each forward pass executes one compiled instruction, and the model is applied in a loop until the program halts.

The architecture supports 21 opcodes (arithmetic, logic, shifts, comparisons, branches, indirect memory access, conditional moves, and multiply-accumulate) in 8 transformer layers, down from the 10 layers required for the single-instruction baseline of Giannou et al.

Key Design Choices

  • Argmax attention replaces softmax for numerically exact execution over arbitrary step counts.
  • Opcode-as-operand-routing maps all 21 operations to operand preparation for a shared subtract core, requiring only one arithmetic layer.
  • 6-threshold direct subtraction computes a-b in one layer (replacing the classical 3-layer approach).
  • STORE opcode enables indirect memory writes, reducing the Sudoku solver from 1,085 to 284 instructions.
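The argmax-attention choice in the first bullet can be sketched as hard one-hot attention: each query routes to exactly one key instead of a softmax mixture, so repeated application never accumulates rounding drift. A minimal illustrative sketch (not the actual Loom weights):

```python
import numpy as np

def argmax_attention(scores, values):
    # One-hot over keys: 1 at each row's maximum score, 0 elsewhere.
    one_hot = (scores == scores.max(axis=-1, keepdims=True)).astype(values.dtype)
    one_hot /= one_hot.sum(axis=-1, keepdims=True)  # split ties uniformly
    # Exact routing: each output row is a single value row, not a blend.
    return one_hot @ values

scores = np.array([[0.1, 5.0, 0.3],
                   [2.0, 1.0, 0.0]])
values = np.array([[10.0], [20.0], [30.0]])
out = argmax_attention(scores, values)
print(out)  # row 0 selects values[1], row 1 selects values[0]
```

Because the attention weights are exactly 0 or 1, the selected value passes through bit-exact, which is what makes execution stable over arbitrary step counts.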

Configurations

| Config   | d_model | n     | Memory    | Instructions | ONNX Size |
|----------|---------|-------|-----------|--------------|-----------|
| Compact  | 146     | 512   | 160 slots | 320 slots    | 8.3 MB    |
| Standard | 155     | 1,024 | 64 slots  | 928 slots    | 14.8 MB   |
| Large    | 164     | 2,048 | 224 slots | 1,792 slots  | 28.0 MB   |

ONNX Models

The repository includes pre-exported ONNX models in which argmax attention is expressed with GPU-native operations (ReduceMax, comparison, division). Because no TopK or OneHot operators are required, the models run fully WebGPU-accelerated.

  • argmax_146x512.onnx (8.3 MB): compact config
  • argmax_155x1024.onnx (14.8 MB): standard config
  • argmax_164x2048.onnx (28.0 MB): large config
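The TopK-free export above reduces one-hot argmax to a reduction, a comparison, and a division. A numpy sketch of that three-op decomposition (the op names in the comments refer to the corresponding ONNX operators; this is an illustration of the idea, not the exported graph):

```python
import numpy as np

def onehot_argmax_gpu_style(scores):
    m = scores.max(axis=-1, keepdims=True)          # ReduceMax
    mask = (scores == m).astype(scores.dtype)       # elementwise comparison -> {0, 1}
    return mask / mask.sum(axis=-1, keepdims=True)  # division normalizes ties

s = np.array([[1.0, 3.0, 2.0]])
oh = onehot_argmax_gpu_style(s)
print(oh)  # one-hot at the maximum: [[0., 1., 0.]]
```

All three steps are plain tensor ops with broad backend support, which is why the exported graphs avoid TopK/OneHot entirely.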

Input/Output

  • Input: state tensor of shape [d_model, n], dtype float32
  • Output: new_state tensor of shape [d_model, n], dtype float32

Each call executes one ISA instruction. Loop until PC (program counter) reaches 0.
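The loop-until-halt contract can be sketched as a generic driver. The model is stubbed with a plain function here so the sketch is self-contained; with the ONNX exports you would call `session.run` on the `[d_model, n]` float32 state instead. How the PC is laid out inside the state tensor is abstracted behind a caller-supplied `read_pc` (the Python API's `get_pc` plays this role):

```python
import numpy as np

def run_until_halt(step, state, read_pc, max_steps=10_000):
    # One call to `step` executes exactly one ISA instruction.
    for _ in range(max_steps):
        if read_pc(state) == 0:  # PC == 0 means the program has halted
            return state
        state = step(state)
    raise RuntimeError("program did not halt within max_steps")

# Toy stand-in: a "program" that decrements slot 0 until it reaches zero.
state = np.array([5.0], dtype=np.float32)
final = run_until_halt(lambda s: s - 1.0, state, read_pc=lambda s: int(s[0]))
print(int(final[0]))  # 0
```

The `max_steps` cap is a practical safeguard for non-terminating programs; the original API examples simply loop on `get_pc(X, cfg) != 0`.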

Usage

Quick Start (Python)

from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc, OP_INC, OP_HALT
import torch

cfg = LoomConfig(s=32, m=8, n=64, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, [5,0,0,0,0,0,0,0], [(OP_INC, cfg.s, 0), (OP_HALT, 0, 0)])  # mem[0]=5; program: INC mem[0], then HALT

with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)

print('mem[0] =', read_memory(X, cfg)[0])  # 6

Full C Compilation

from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc
from c_compiler import compile_c
import torch

source = """
int main() {
    int a; int b; int t; int i;
    a = 0; b = 1; i = 0;
    while (i < 10) { t = a + b; a = b; b = t; i = i + 1; }
    return a;
}
"""

cfg, mem, cmds, meta = compile_c(source, s=32, m=160, n=512, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, mem, cmds)

with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)

from subleq import signed_from_bipolar
# Decode variable `a` (bipolar-encoded across N rows of the memory band).
a = signed_from_bipolar(X[cfg.idx_memory:cfg.idx_memory+cfg.N, cfg.s + meta['variables']['a']])
print(f"fib(10) = {a}")  # 55

Validation

  • 42 opcode unit tests (all pass)
  • 19 SWAP integration tests (all pass)
  • 50 compiled C program tests including Fibonacci, GCD, sorting, LOAD/STORE round-trips (all pass)
  • FPGA hardware verification on Xilinx Alveo U200 (INC test pass)

Browser Demos

Interactive demos run entirely client-side via ONNX Runtime Web:

  • Sorting with real-time architecture visualization (3D layer activations)
  • C debugger with source highlighting and variable watch
  • Playable Snake game (84 transformer steps per tick)
  • 9x9 Sudoku solver
  • DOOM raycasting

Limitations

  • 8-bit signed integers (-128 to 127) by default
  • One instruction per forward pass (no pipelining)
  • No multiplication/division in hardware (software emulation via MULACC)
  • FIND requires unique values in the search array
  • LOAD/STORE require valid in-range pointers
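The default 8-bit word size means arithmetic wraps in two's complement rather than saturating or raising an error. A hypothetical helper (not part of the Loom API) illustrating the wraparound behavior:

```python
def wrap_i8(x: int) -> int:
    # Fold any integer into the signed 8-bit range [-128, 127],
    # two's-complement style, mirroring Loom's default word size.
    return ((x + 128) % 256) - 128

print(wrap_i8(127 + 1))   # -128  (positive overflow wraps around)
print(wrap_i8(-128 - 1))  # 127   (negative overflow wraps around)
```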

Citation

@misc{turkcan2026loomscalableanalyticalneural,
      title={Loom: A Scalable Analytical Neural Computer Architecture}, 
      author={Mehmet Kerem Turkcan},
      year={2026},
      eprint={2604.08816},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.08816}, 
}