Loom: A Scalable Computer Architecture for Looped Transformers

Paper | GitHub | Live Demos

Loom is a general-purpose computer implemented as a looped transformer with analytically derived weights. Programs are written in C, compiled to a 21-opcode ISA, and executed as iterated matrix multiplications through 8 fixed-weight transformer layers.

Model Description

Loom is not a trained model. Every weight is derived analytically from the ISA specification. The transformer implements a programmable computer: each forward pass executes one compiled instruction, and the model is applied in a loop until the program halts.

The architecture supports 21 opcodes (arithmetic, logic, shifts, comparisons, branches, indirect memory access, conditional moves, and multiply-accumulate) in 8 transformer layers, down from the 10 layers required for the single-instruction baseline of Giannou et al.

Key Design Choices

  • Argmax attention replaces softmax for numerically exact execution over arbitrary step counts.
  • Opcode-as-operand-routing maps all 21 operations to operand preparation for a shared subtract core, requiring only one arithmetic layer.
  • 6-threshold direct subtraction computes a-b in one layer (replacing the classical 3-layer approach).
  • STORE opcode enables indirect memory writes, reducing the Sudoku solver from 1,085 to 284 instructions.
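The argmax-attention choice in the first bullet can be sketched as hard one-hot attention: each query routes to exactly one key instead of a softmax mixture, so repeated application never accumulates rounding drift. A minimal illustrative sketch (not the actual Loom weights):

```python
import numpy as np

def argmax_attention(scores, values):
    # One-hot over keys: 1 at each row's maximum score, 0 elsewhere.
    one_hot = (scores == scores.max(axis=-1, keepdims=True)).astype(values.dtype)
    one_hot /= one_hot.sum(axis=-1, keepdims=True)  # split ties uniformly
    # Exact routing: each output row is a single value row, not a blend.
    return one_hot @ values

scores = np.array([[0.1, 5.0, 0.3],
                   [2.0, 1.0, 0.0]])
values = np.array([[10.0], [20.0], [30.0]])
out = argmax_attention(scores, values)
print(out)  # row 0 selects values[1], row 1 selects values[0]
```

Because the attention weights are exactly 0 or 1, the selected value passes through bit-exact, which is what makes execution stable over arbitrary step counts.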

Configurations

| Config   | d_model | n     | Memory    | Instructions | ONNX Size |
|----------|---------|-------|-----------|--------------|-----------|
| Compact  | 146     | 512   | 160 slots | 320 slots    | 8.3 MB    |
| Standard | 155     | 1,024 | 64 slots  | 928 slots    | 14.8 MB   |
| Large    | 164     | 2,048 | 224 slots | 1,792 slots  | 28.0 MB   |

ONNX Models

The repository includes pre-exported ONNX models in which argmax attention is expressed with GPU-native operations (ReduceMax, comparison, division). Because no TopK or OneHot operators are required, the models run fully WebGPU-accelerated.

  • argmax_146x512.onnx (8.3 MB): compact config
  • argmax_155x1024.onnx (14.8 MB): standard config
  • argmax_164x2048.onnx (28.0 MB): large config
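The TopK-free export above reduces one-hot argmax to a reduction, a comparison, and a division. A numpy sketch of that three-op decomposition (the op names in the comments refer to the corresponding ONNX operators; this is an illustration of the idea, not the exported graph):

```python
import numpy as np

def onehot_argmax_gpu_style(scores):
    m = scores.max(axis=-1, keepdims=True)          # ReduceMax
    mask = (scores == m).astype(scores.dtype)       # elementwise comparison -> {0, 1}
    return mask / mask.sum(axis=-1, keepdims=True)  # division normalizes ties

s = np.array([[1.0, 3.0, 2.0]])
oh = onehot_argmax_gpu_style(s)
print(oh)  # one-hot at the maximum: [[0., 1., 0.]]
```

All three steps are plain tensor ops with broad backend support, which is why the exported graphs avoid TopK/OneHot entirely.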

Input/Output

  • Input: state tensor of shape [d_model, n], dtype float32
  • Output: new_state tensor of shape [d_model, n], dtype float32

Each call executes one ISA instruction. Loop until PC (program counter) reaches 0.
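The loop-until-halt contract can be sketched as a generic driver. The model is stubbed with a plain function here so the sketch is self-contained; with the ONNX exports you would call `session.run` on the `[d_model, n]` float32 state instead. How the PC is laid out inside the state tensor is abstracted behind a caller-supplied `read_pc` (the Python API's `get_pc` plays this role):

```python
import numpy as np

def run_until_halt(step, state, read_pc, max_steps=10_000):
    # One call to `step` executes exactly one ISA instruction.
    for _ in range(max_steps):
        if read_pc(state) == 0:  # PC == 0 means the program has halted
            return state
        state = step(state)
    raise RuntimeError("program did not halt within max_steps")

# Toy stand-in: a "program" that decrements slot 0 until it reaches zero.
state = np.array([5.0], dtype=np.float32)
final = run_until_halt(lambda s: s - 1.0, state, read_pc=lambda s: int(s[0]))
print(int(final[0]))  # 0
```

The `max_steps` cap is a practical safeguard for non-terminating programs; the original API examples simply loop on `get_pc(X, cfg) != 0`.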

Usage

Quick Start (Python)

from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc, OP_INC, OP_HALT
import torch

cfg = LoomConfig(s=32, m=8, n=64, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, [5,0,0,0,0,0,0,0], [(OP_INC, cfg.s, 0), (OP_HALT, 0, 0)])  # mem[0]=5; program: INC mem[0], then HALT

with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)

print('mem[0] =', read_memory(X, cfg)[0])  # 6

Full C Compilation

from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc
from c_compiler import compile_c
import torch

source = """
int main() {
    int a; int b; int t; int i;
    a = 0; b = 1; i = 0;
    while (i < 10) { t = a + b; a = b; b = t; i = i + 1; }
    return a;
}
"""

cfg, mem, cmds, meta = compile_c(source, s=32, m=160, n=512, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, mem, cmds)

with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)

from subleq import signed_from_bipolar
# Decode variable `a` (bipolar-encoded across N rows of the memory band).
a = signed_from_bipolar(X[cfg.idx_memory:cfg.idx_memory+cfg.N, cfg.s + meta['variables']['a']])
print(f"fib(10) = {a}")  # 55

Validation

  • 42 opcode unit tests (all pass)
  • 19 SWAP integration tests (all pass)
  • 50 compiled C program tests including Fibonacci, GCD, sorting, LOAD/STORE round-trips (all pass)
  • FPGA hardware verification on Xilinx Alveo U200 (INC test pass)

Browser Demos

Interactive demos run entirely client-side via ONNX Runtime Web:

  • Sorting with real-time architecture visualization (3D layer activations)
  • C debugger with source highlighting and variable watch
  • Playable Snake game (84 transformer steps per tick)
  • 9x9 Sudoku solver
  • DOOM raycasting

Limitations

  • 8-bit signed integers (-128 to 127) by default
  • One instruction per forward pass (no pipelining)
  • No multiplication/division in hardware (software emulation via MULACC)
  • FIND requires unique values in the search array
  • LOAD/STORE require valid in-range pointers
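The default 8-bit word size means arithmetic wraps in two's complement rather than saturating or raising an error. A hypothetical helper (not part of the Loom API) illustrating the wraparound behavior:

```python
def wrap_i8(x: int) -> int:
    # Fold any integer into the signed 8-bit range [-128, 127],
    # two's-complement style, mirroring Loom's default word size.
    return ((x + 128) % 256) - 128

print(wrap_i8(127 + 1))   # -128  (positive overflow wraps around)
print(wrap_i8(-128 - 1))  # 127   (negative overflow wraps around)
```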

Citation

@misc{turkcan2026loomscalableanalyticalneural,
      title={Loom: A Scalable Analytical Neural Computer Architecture}, 
      author={Mehmet Kerem Turkcan},
      year={2026},
      eprint={2604.08816},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.08816}, 
}