FAAST-Qwen2.5-3B-Instruct

faast-Qwen2.5-3B-Instruct extends Qwen2.5-3B-Instruct with the FAAST module. The original Qwen2.5-3B-Instruct parameters are frozen; only the FAAST readout projections are trained.

The model is designed for efficient test-time learning through fast weights, so it can adapt to new examples without backpropagation or gradient-descent updates.

Model Description

FAAST augments Qwen2.5-3B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.

This design enables:

  • Test-time learning without backpropagation
  • Efficient adaptation with low memory overhead
  • Fast adaptation to downstream tasks
  • Improved few-shot and full-data performance over the zero-shot backbone in the reported IWSLT2017 results
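
This card does not spell out the FAAST update rule. As a rough illustration of what test-time learning without backpropagation can look like, the sketch below fits a readout matrix by closed-form ridge regression on a handful of (key, value) pairs. This is a generic technique chosen for illustration, not the FAAST implementation; the function `fit_fast_weights`, the array shapes, and the random stand-in data are all assumptions.

```python
import numpy as np


def fit_fast_weights(keys, values, ridge=1e-3):
    """Closed-form ridge solution W = (K^T K + ridge * I)^-1 K^T V.

    Hypothetical helper: no gradients are computed, only one linear solve.
    """
    dim = keys.shape[1]
    gram = keys.T @ keys + ridge * np.eye(dim)
    return np.linalg.solve(gram, keys.T @ values)


rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 4))   # hypothetical mapping to recover
keys = rng.normal(size=(32, 8))      # stand-in for frozen hidden states
values = keys @ true_map             # stand-in for supervised targets

# "Adaptation" is a single solve over the test-time examples.
fast_w = fit_fast_weights(keys, values, ridge=1e-6)

# Apply the fitted readout to an unseen query.
query = rng.normal(size=(1, 8))
prediction = query @ fast_w
error = float(np.max(np.abs(prediction - query @ true_map)))
print(f"max abs error on unseen query: {error:.2e}")
```

Because the fit is a single linear solve over stored examples, its cost depends on the readout dimension rather than on the size of the frozen backbone, which is the kind of property the list above describes.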

Training Details

  • Base model: Qwen2.5-3B-Instruct
  • Trainable parameters: FAAST readout projections
  • Frozen parameters: All Qwen2.5-3B-Instruct parameters
  • Pretraining corpus: OpenWebText2
  • Adaptation mechanism: Fast weights / FAAST readout projections

Evaluation Results

Machine Translation on IWSLT2017

BLEU scores on IWSLT2017. Bold scores indicate statistical significance at p < 0.05.

Qwen2.5-3B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
| FAAST (Ours) | 23.35 | 25.22 | 33.23 | 36.40 | 31.12 | 35.09 | 39.46 | 42.47 |

Qwen2.5-7B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 |
| In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - |
| FAAST (Ours) | 26.77 | 27.75 | 35.34 | 37.10 | 35.67 | 37.08 | 42.08 | 43.93 |
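
To make the size of the full-data improvement easier to read, the short script below computes the BLEU gain of FAAST (full) over the zero-shot backbone for each language pair. The scores are copied from the two tables above; the dictionary layout and the helper name `bleu_gains` are illustrative assumptions.

```python
# (zero-shot BLEU, FAAST full-data BLEU), copied from the tables above.
scores = {
    "Qwen2.5-3B-Instruct": {
        "En-De": (23.22, 25.22),
        "De-En": (32.92, 36.40),
        "En-Fr": (30.56, 35.09),
        "Fr-En": (39.24, 42.47),
    },
    "Qwen2.5-7B-Instruct": {
        "En-De": (25.53, 27.75),
        "De-En": (34.69, 37.10),
        "En-Fr": (34.82, 37.08),
        "Fr-En": (41.40, 43.93),
    },
}


def bleu_gains(rows):
    """Return the FAAST-full minus zero-shot BLEU difference per language pair."""
    return {pair: round(full - zero, 2) for pair, (zero, full) in rows.items()}


gains = {backbone: bleu_gains(rows) for backbone, rows in scores.items()}

for backbone, per_pair in gains.items():
    for pair, gain in per_pair.items():
        print(f"{backbone} {pair}: +{gain:.2f} BLEU")
```

In these tables the full-data gain ranges from 2.00 to 4.53 BLEU on the 3B backbone and from 2.22 to 2.53 BLEU on the 7B backbone.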

Key Features

  • Frozen backbone LLM parameters
  • Lightweight FAAST readout adaptation
  • Test-time learning capability
  • Efficient memory usage
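  • Strong few-shot translation performance: 1-shot FAAST scores higher BLEU than 1-shot in-context learning in six of the eight reported IWSLT2017 settings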
  • Compatible with instruction-tuned LLMs

Limitations

  • The model inherits the limitations and biases of Qwen2.5-3B-Instruct.
  • Performance may vary across domains and languages not covered during evaluation.
  • FAAST adaptation quality depends on the distribution and quality of test-time examples.
  • The model is primarily intended for research purposes.

Citation

If you use this model, please cite the corresponding FAAST paper or project.

@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}