FAAST-Qwen2.5-7B-Instruct
FAAST-Qwen2.5-7B-Instruct extends Qwen2.5-7B-Instruct with the FAAST module. The original Qwen2.5-7B-Instruct parameters are frozen; only the FAAST readout projections are trained.
The model is designed for efficient test-time learning through fast weights, enabling adaptation without backpropagation or gradient-descent updates.
Model Description
FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
This design enables:
- Test-time learning without backpropagation
- Efficient adaptation with low memory overhead
- Fast adaptation to downstream tasks
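- Higher BLEU than the zero-shot baseline after full-data adaptation, and 1-shot results competitive with in-context learning, on IWSLT2017 (see Evaluation Results)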
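The idea of adapting without backpropagation can be illustrated with a generic closed-form associative memory. The sketch below is not the FAAST implementation: it assumes, purely for illustration, that fast weights are fit by ridge regression from key vectors to value vectors. The names `fit_fast_weights`, `keys`, `values`, and `ridge` are hypothetical.

```python
import numpy as np

def fit_fast_weights(keys, values, ridge=1e-6):
    """Closed-form ridge regression: W = (K^T K + ridge * I)^-1 K^T V.

    Illustrative assumption only; no gradients or backward pass are needed.
    """
    dim = keys.shape[1]
    gram = keys.T @ keys + ridge * np.eye(dim)
    return np.linalg.solve(gram, keys.T @ values)

rng = np.random.default_rng(0)
keys = rng.normal(size=(32, 8))       # 32 support examples with 8-dim features
true_map = rng.normal(size=(8, 4))    # unknown mapping to recover
values = keys @ true_map              # 4-dim supervised targets

# "Learning" is a single linear solve over the support examples.
fast_w = fit_fast_weights(keys, values)

# Apply the fast weights to an unseen query and measure the error.
query = rng.normal(size=(1, 8))
prediction = query @ fast_w
error = float(np.abs(prediction - query @ true_map).max())
```

Because the update is one linear solve, its cost depends on the feature dimension rather than on the size of the backbone, which is one plausible reading of the low memory overhead listed above.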
Training Details
- Base model: Qwen2.5-7B-Instruct
- Trainable parameters: FAAST readout projections
- Frozen parameters: All Qwen2.5-7B-Instruct parameters
- Pretraining corpus: OpenWebText2
- Adaptation mechanism: Fast weights / FAAST readout projections
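The frozen/trainable split listed above can be sketched in PyTorch. This is a toy stand-in under stated assumptions, not the FAAST code: `ToyAdaptedModel`, `backbone`, and `readout` are hypothetical names, and the small layers only mimic the relationship between a frozen LLM and a trainable readout projection.

```python
import torch
from torch import nn

class ToyAdaptedModel(nn.Module):
    """Hypothetical stand-in: a frozen backbone followed by a trainable readout."""

    def __init__(self, hidden=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())
        self.readout = nn.Linear(hidden, hidden, bias=False)

    def forward(self, x):
        return self.readout(self.backbone(x))

model = ToyAdaptedModel()

# Freeze every backbone parameter so no gradients are computed for it.
for param in model.backbone.parameters():
    param.requires_grad_(False)

# Only parameters that still require gradients are handed to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
trainable_names = [n for n, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-3)
```

Here `trainable_names` contains only the readout weight, mirroring the "Trainable parameters" and "Frozen parameters" entries above.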
Evaluation Results
Machine Translation on IWSLT2017
BLEU scores on IWSLT2017. Bold scores indicate statistical significance at p < 0.05.
Qwen2.5-3B-Instruct Backbone
| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
| FAAST (Ours) | 23.35 | 25.22 | 33.23 | 36.40 | 31.12 | 35.09 | 39.46 | 42.47 |
Qwen2.5-7B-Instruct Backbone
| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 |
| In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - |
| FAAST (Ours) | 26.77 | 27.75 | 35.34 | 37.10 | 35.67 | 37.08 | 42.08 | 43.93 |
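To make the full-data comparison easier to read, the short script below computes the BLEU gain of FAAST over the zero-shot baseline for each language pair, using the numbers from the Qwen2.5-7B-Instruct table above. The variable names are illustrative.

```python
# BLEU scores copied from the "full" columns of the Qwen2.5-7B-Instruct table.
zero_shot = {"En-De": 25.53, "De-En": 34.69, "En-Fr": 34.82, "Fr-En": 41.40}
faast_full = {"En-De": 27.75, "De-En": 37.10, "En-Fr": 37.08, "Fr-En": 43.93}

# Absolute BLEU gain of FAAST (full data) over the zero-shot baseline.
gains = {pair: round(faast_full[pair] - zero_shot[pair], 2) for pair in zero_shot}
mean_gain = sum(gains.values()) / len(gains)

for pair, gain in gains.items():
    print(f"{pair}: +{gain:.2f} BLEU")
print(f"mean gain: {mean_gain:.3f} BLEU")
```

The per-pair gains are 2.22 (En-De), 2.41 (De-En), 2.26 (En-Fr), and 2.53 (Fr-En) BLEU.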
Key Features
- Frozen backbone LLM parameters
- Lightweight FAAST readout adaptation
- Test-time learning capability
- Efficient memory usage
- Strong few-shot translation performance
- Compatible with instruction-tuned LLMs
Limitations
- The model inherits the limitations and biases of Qwen2.5-7B-Instruct.
- Performance may vary across domains and languages not covered during evaluation.
- FAAST adaptation quality depends on the distribution and quality of test-time examples.
- The model is primarily intended for research purposes.
Citation
If you use this model, please cite the FAAST paper:
@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}