arxiv:2603.06642

SR-TTT: Surprisal-Aware Residual Test-Time Training

Published on Feb 26

Authors:

Abstract

SR-TTT addresses recall failures in test-time training language models by combining fast weights with a sparse memory mechanism that selectively uses exact attention for surprising tokens while maintaining constant memory usage.

AI-generated summary

Test-Time Training (TTT) language models achieve theoretically infinite context windows with an O(1) memory footprint by replacing the standard exact-attention KV-cache with hidden state ``fast weights'' W_fast updated via self-supervised learning during inference. However, pure TTT architectures suffer catastrophic failures on exact-recall tasks (e.g., Needle-in-a-Haystack). Because the fast weights aggressively compress the context into an information bottleneck, highly surprising or unique tokens are rapidly overwritten and forgotten by subsequent token gradient updates. We introduce SR-TTT (Surprisal-Aware Residual Test-Time Training), which resolves this recall failure by augmenting the TTT backbone with a loss-gated sparse memory mechanism. By dynamically routing only incompressible, highly surprising tokens to a traditional exact-attention Residual Cache, SR-TTT preserves O(1) memory for low-entropy background context while utilizing exact attention exclusively for critical needles. Our complete implementation, training scripts, and pre-trained weights are open-source and available at: https://github.com/swamynathanvp/Surprisal-Aware-Residual-Test-Time-Training.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2603.06642

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.06642 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.06642 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.06642 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.