FAAST-Qwen2.5-3B-Instruct

FAAST-Qwen2.5-3B-Instruct extends Qwen2.5-3B-Instruct with the FAAST module. All original Qwen2.5-3B-Instruct parameters are frozen; only the FAAST readout projections are trained.

The model is designed for efficient test-time learning through fast weights. It adapts to new examples with forward passes only, without backpropagation or gradient-descent updates.
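To illustrate the general idea of learning without backpropagation, here is a minimal sketch of a classic outer-product (Hebbian) fast-weight memory. It is an assumption chosen for illustration, not FAAST's actual closed-form update. Each (key, value) pair is written into a matrix using forward quantities only, and it is retrieved with a matrix-vector product. The helper names `zeros`, `write`, and `read` are hypothetical.

```python
"""Toy fast-weight associative memory (illustrative only).

Assumption: a generic outer-product (Hebbian) fast-weight rule,
not the FAAST update itself. Vectors are plain Python lists.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    """Return an all-zero fast-weight matrix."""
    return [[0.0] * cols for _ in range(rows)]


def write(fast_w: Matrix, key: Vector, value: Vector, lr: float = 1.0) -> None:
    """Store one (key, value) pair in place: W <- W + lr * value key^T.

    Only forward quantities are used; no loss or gradient is computed.
    """
    for i, v_i in enumerate(value):
        for j, k_j in enumerate(key):
            fast_w[i][j] += lr * v_i * k_j


def read(fast_w: Matrix, query: Vector) -> Vector:
    """Retrieve with a matrix-vector product: y = W q."""
    return [sum(w_ij * q_j for w_ij, q_j in zip(row, query)) for row in fast_w]


if __name__ == "__main__":
    W = zeros(2, 3)
    # Orthonormal keys give exact recall of the stored values.
    write(W, key=[1.0, 0.0, 0.0], value=[0.5, -1.0])
    write(W, key=[0.0, 1.0, 0.0], value=[2.0, 3.0])
    print(read(W, [1.0, 0.0, 0.0]))
    print(read(W, [0.0, 1.0, 0.0]))
```

Because the keys in the example are orthonormal, each query returns exactly the value stored under it. With correlated keys, the stored pairs interfere. Closed-form solutions such as least squares are one common way to reduce that interference.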

Model Description

FAAST augments Qwen2.5-3B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
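The paper title describes FAAST's fast weights as closed-form. As a rough picture of supervised learning during inference, the sketch below computes fast weights in one step from labelled (input, target) pairs. It assumes a ridge-regression (regularized least-squares) objective. That objective, the function names `fit_fast_weights` and `predict`, and the regularizer `lam` are illustrative assumptions, not the FAAST formulation or API.

```python
"""Closed-form (ridge-regression) fast weights from labelled examples.

Illustrative assumption: fast weights W solve
    min_W  sum_i ||W x_i - y_i||^2 + lam * ||W||^2,
whose closed form is W^T = (X^T X + lam I)^{-1} X^T Y.
This is a generic technique, not the exact FAAST formulation.
"""
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B], copied so the inputs are not modified.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_fast_weights(xs: Matrix, ys: Matrix, lam: float = 1e-3) -> Matrix:
    """Return W (d_out x d_in) in a single closed-form step, no gradients."""
    x_t = transpose(xs)
    gram = matmul(x_t, xs)  # X^T X, shape d_in x d_in
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term keeps the system well-conditioned
    w_t = solve(gram, matmul(x_t, ys))  # shape d_in x d_out
    return transpose(w_t)


def predict(w: Matrix, x: List[float]) -> List[float]:
    """Apply the fast weights to a new input: y = W x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [[2.0], [3.0], [5.0]]  # targets follow y = 2*x0 + 3*x1
    w = fit_fast_weights(xs, ys)
    print(predict(w, [2.0, 1.0]))  # close to 7; the ridge term shrinks it slightly
```

The ridge term trades a small bias for numerical stability. With few examples, as in 1-shot adaptation, the Gram matrix `X^T X` is often singular without it.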

This design enables:

Test-time learning without backpropagation
Efficient adaptation with low memory overhead
Fast adaptation to downstream tasks
Improved few-shot and full-data performance

Training Details

Base model: Qwen2.5-3B-Instruct
Trainable parameters: FAAST readout projections
Frozen parameters: All Qwen2.5-3B-Instruct parameters
Pretraining corpus: OpenWebText2
Adaptation mechanism: Fast weights / FAAST readout projections
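The parameter split above (frozen backbone, trainable readout projections) can be sketched in plain Python. In PyTorch, the same split is usually done by iterating over `model.named_parameters()` and setting `requires_grad` on each tensor. This card does not give the actual module names in the FAAST checkpoint, so the `faast_readout.` prefix, the `Param` class, and the toy sizes below are hypothetical.

```python
"""Freeze-the-backbone parameter split (illustrative sketch).

Assumptions: parameter names and sizes are hypothetical, and the
"faast_readout." prefix is an invented naming convention.
"""
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical prefix identifying FAAST readout projections.
READOUT_PREFIX = "faast_readout."


@dataclass
class Param:
    name: str
    numel: int
    requires_grad: bool = True


def freeze_backbone(params: List[Param]) -> Tuple[int, int]:
    """Mark only readout projections trainable; return (trainable, frozen) counts."""
    trainable, frozen = 0, 0
    for p in params:
        p.requires_grad = p.name.startswith(READOUT_PREFIX)
        if p.requires_grad:
            trainable += p.numel
        else:
            frozen += p.numel
    return trainable, frozen


def optimizer_params(params: List[Param]) -> List[str]:
    """Names of the parameters an optimizer would receive."""
    return [p.name for p in params if p.requires_grad]


if __name__ == "__main__":
    # Toy sizes for illustration, not the real Qwen2.5-3B-Instruct shapes.
    params = [
        Param("model.layers.0.self_attn.q_proj.weight", 1000),
        Param("model.layers.0.mlp.up_proj.weight", 4000),
        Param("faast_readout.0.weight", 50),
    ]
    print(freeze_backbone(params))
    print(optimizer_params(params))
```

Only the parameters returned by `optimizer_params` would be passed to the optimizer during FAAST pretraining. Every backbone weight keeps its original value.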

Evaluation Results

Machine Translation on IWSLT2017

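BLEU scores on IWSLT2017 (higher is better) in the 1-shot and full-data settings. A dash (-) marks a setting not reported for that method. Bold scores indicate statistical significance at p < 0.05.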

Qwen2.5-3B-Instruct Backbone

Method	En-De 1-shot	En-De full	De-En 1-shot	De-En full	En-Fr 1-shot	En-Fr full	Fr-En 1-shot	Fr-En full
Qwen2.5-3B-Instruct (zero-shot)	-	23.22	-	32.92	-	30.56	-	39.24
In-Context Learning	23.03	-	32.33	-	31.85	-	38.51	-
FAAST (Ours)	23.35	25.22	33.23	36.40	31.12	35.09	39.46	42.47

Qwen2.5-7B-Instruct Backbone

Method	En-De 1-shot	En-De full	De-En 1-shot	De-En full	En-Fr 1-shot	En-Fr full	Fr-En 1-shot	Fr-En full
Qwen2.5-7B-Instruct (zero-shot)	-	25.53	-	34.69	-	34.82	-	41.40
In-Context Learning	25.39	-	35.70	-	35.45	-	40.86	-
FAAST (Ours)	26.77	27.75	35.34	37.10	35.67	37.08	42.08	43.93
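The per-pair BLEU differences in the two tables can be recomputed with a few lines of Python. The snippet below does plain arithmetic on the reported scores. It does not reproduce the significance tests behind the bold entries. The helper names `deltas` and `mean_gain` are illustrative.

```python
"""Per-pair BLEU differences recomputed from the IWSLT2017 tables."""
from typing import Dict, List

PAIRS = ["En-De", "De-En", "En-Fr", "Fr-En"]

# Scores copied from the tables, ordered as in PAIRS.
RESULTS: Dict[str, Dict[str, List[float]]] = {
    "Qwen2.5-3B-Instruct": {
        "zero_shot": [23.22, 32.92, 30.56, 39.24],
        "icl_1shot": [23.03, 32.33, 31.85, 38.51],
        "faast_1shot": [23.35, 33.23, 31.12, 39.46],
        "faast_full": [25.22, 36.40, 35.09, 42.47],
    },
    "Qwen2.5-7B-Instruct": {
        "zero_shot": [25.53, 34.69, 34.82, 41.40],
        "icl_1shot": [25.39, 35.70, 35.45, 40.86],
        "faast_1shot": [26.77, 35.34, 35.67, 42.08],
        "faast_full": [27.75, 37.10, 37.08, 43.93],
    },
}


def deltas(new: List[float], base: List[float]) -> List[float]:
    """Element-wise new - base, rounded to the tables' two decimals."""
    return [round(n - b, 2) for n, b in zip(new, base)]


def mean_gain(new: List[float], base: List[float]) -> float:
    """Average difference across the four language pairs."""
    d = deltas(new, base)
    return round(sum(d) / len(d), 3)


if __name__ == "__main__":
    for backbone, r in RESULTS.items():
        full = deltas(r["faast_full"], r["zero_shot"])
        one = deltas(r["faast_1shot"], r["icl_1shot"])
        print(backbone)
        print("  full vs zero-shot:", dict(zip(PAIRS, full)))
        print("  1-shot vs ICL:    ", dict(zip(PAIRS, one)))
```

Negative entries mark pairs where in-context learning scored higher than FAAST in the 1-shot setting. These are En-Fr on the 3B backbone and De-En on the 7B backbone.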

Key Features

Frozen backbone LLM parameters
Lightweight FAAST readout adaptation
Test-time learning capability
Efficient memory usage
Strong few-shot translation performance
Compatible with instruction-tuned LLMs

Limitations

The model inherits the limitations and biases of Qwen2.5-3B-Instruct.
Performance may vary across domains and languages not covered during evaluation.
FAAST adaptation quality depends on the distribution and quality of test-time examples.
The model is primarily intended for research purposes.

Citation

If you use this model, please cite the FAAST paper:

@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}