Pretrained LLMs that support supervised adaptation at test time by compressing few-shot examples into fast weights, so that no gradient descent is needed during adaptation.
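
To make the idea of "compressing few-shot examples into fast weights" concrete, here is a minimal sketch of one classic fast-weight scheme: an outer-product associative memory. It is an illustration only, not necessarily the mechanism intended above. The function names `compress_examples` and `apply_fast_weights` are hypothetical, and the sketch assumes each few-shot example has already been encoded as a (key, value) vector pair, for instance by the pretrained model's hidden states.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def compress_examples(keys: Sequence[Vector], values: Sequence[Vector]) -> Matrix:
    """Fold (key, value) pairs into one fast-weight matrix W = sum_i v_i k_i^T.

    Assumption: keys and values are fixed-size encodings of the few-shot
    inputs and labels. No gradients or optimizer steps are involved; the
    matrix is built in a single pass over the examples.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("need one value per key, and at least one example")
    d_out, d_in = len(values[0]), len(keys[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        for r in range(d_out):
            for c in range(d_in):
                W[r][c] += v[r] * k[c]  # rank-1 (outer-product) update
    return W


def apply_fast_weights(W: Matrix, query: Vector) -> Vector:
    """Read from the fast weights with a matrix-vector product W @ query."""
    return [sum(w * q for w, q in zip(row, query)) for row in W]


# Two toy "few-shot examples" with orthonormal keys.
example_keys = [[1.0, 0.0], [0.0, 1.0]]
example_values = [[2.0, -1.0], [0.5, 3.0]]
fast_W = compress_examples(example_keys, example_values)

# Querying with the first key retrieves the first value exactly.
print(apply_fast_weights(fast_W, [1.0, 0.0]))  # [2.0, -1.0]
```

With orthonormal keys, retrieval is exact; with correlated keys, the stored values interfere with one another, which is one reason practical systems typically learn the key and value encoders rather than using raw features.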