metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - qwen2.5
  - test-time-learning
  - fast-weights
  - adaptation
  - multilingual
datasets:
  - OpenWebText2
  - IWSLT2017
language:
  - en
  - de
  - fr

FAAST-Qwen2.5-7B-Instruct

FAAST-Qwen2.5-7B-Instruct extends Qwen2.5-7B-Instruct with the FAAST module. The original Qwen2.5-7B-Instruct parameters are frozen; only the FAAST readout projections are trained.

The model is designed for efficient test-time learning through fast weights, which lets it adapt without backpropagation or gradient descent.

Model Description

FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.

This design enables:

  • Test-time learning without backpropagation
  • Efficient adaptation with low memory overhead
  • Fast adaptation to downstream tasks
  • Improved few-shot and full-data translation performance over the zero-shot baseline on IWSLT2017 (see Evaluation Results)
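
To make "learning without backpropagation" concrete, here is a minimal, generic sketch of fast weights as an outer-product associative memory. It is not FAAST's actual update rule, which this card does not specify; the function names `learn_fast_weights` and `read_out` and the toy key/value vectors are hypothetical and chosen only for illustration.

```python
def learn_fast_weights(keys, values):
    """Build W = sum_i outer(value_i, key_i) in a single forward pass.

    No gradients are computed: each (key, value) pair is simply
    accumulated into the weight matrix.
    """
    d_out, d_in = len(values[0]), len(keys[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        for r in range(d_out):
            for c in range(d_in):
                W[r][c] += v[r] * k[c]
    return W


def read_out(W, query):
    """Retrieve the value associated with `query` via a matrix-vector product."""
    return [sum(w * q for w, q in zip(row, query)) for row in W]


# Toy example with orthonormal keys, so retrieval is exact.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0]]

W = learn_fast_weights(keys, values)
print(read_out(W, keys[0]))  # [2.0, -1.0]
print(read_out(W, keys[1]))  # [0.5, 3.0]
```

With orthonormal keys each stored value is recovered exactly; with correlated keys a plain outer-product memory suffers from interference between stored pairs.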

Usage:

The code below requires the FAAST modules from https://github.com/baoguangsheng/faast; import them before running it.

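from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: local path or Hub ID of the FAAST checkpoint.
model_path = "path/to/FAAST-Qwen2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# Few-shot examples to learn from; replace the placeholders with real samples.
fewshot_samples = ['sample 1', 'sample 2', ...]
inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)

model.reset_projection()  # clear existing fast weights
model.learn(**inputs)     # learn new fast weights from the samples
model.generate(...)       # do the task using the learned fast weights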
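
The reset, learn, generate cycle above can be wrapped in a small helper so that fast weights from one task never leak into the next. This is a sketch under assumptions: `adapt_and_generate` is a hypothetical name, `reset_projection` and `learn` are taken from the snippet above, and `generate` and `decode` are assumed to follow the standard Transformers interface. Device placement is omitted.

```python
def adapt_and_generate(model, tokenizer, fewshot_samples, prompt, max_new_tokens=128):
    """Reset fast weights, learn from the samples, then answer one prompt.

    Hypothetical helper: assumes the model exposes `reset_projection()` and
    `learn(**inputs)` as in the usage snippet, plus a standard `generate()`.
    """
    # Tokenize the supervision examples as one padded batch.
    learn_inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)

    model.reset_projection()     # drop fast weights left over from earlier tasks
    model.learn(**learn_inputs)  # learn new fast weights from the samples

    # Generate with the freshly learned fast weights.
    prompt_inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**prompt_inputs, max_new_tokens=max_new_tokens)

    # For decoder-only models the decoded text includes the prompt itself.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling the helper once per task keeps each adaptation independent, at the cost of relearning the fast weights every time.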

Training Details

  • Base model: Qwen2.5-7B-Instruct
  • Trainable parameters: FAAST readout projections
  • Frozen parameters: All Qwen2.5-7B-Instruct parameters
  • Pretraining corpus: OpenWebText2
  • Adaptation mechanism: Fast weights / FAAST readout projections
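
The frozen/trainable split listed above can be sketched as follows for any PyTorch-style module. This is an illustration, not the authors' training code: `freeze_all_but` is a hypothetical helper, and the `"readout"` keyword is an assumed naming pattern for the FAAST readout projections that should be checked against the actual parameter names in the repository.

```python
def freeze_all_but(model, trainable_keyword="readout"):
    """Freeze every parameter whose name lacks `trainable_keyword`.

    Works with any object exposing `named_parameters()` (e.g. torch.nn.Module).
    Returns the names of the parameters left trainable.
    The default keyword is an assumption, not a confirmed FAAST name.
    """
    trainable = []
    for name, param in model.named_parameters():
        # Only parameters matching the keyword keep receiving gradients.
        param.requires_grad = trainable_keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

Printing the returned list is a quick way to confirm that only the intended projections remain trainable before pretraining starts.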

Evaluation Results

Machine Translation on IWSLT2017

BLEU scores on IWSLT2017. Bold scores indicate statistical significance at p < 0.05.

Qwen2.5-3B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
| FAAST (Ours) | 23.35 | 25.22 | 33.23 | 36.40 | 31.12 | 35.09 | 39.46 | 42.47 |

Qwen2.5-7B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 |
| In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - |
| FAAST (Ours) | 26.77 | 27.75 | 35.34 | 37.10 | 35.67 | 37.08 | 42.08 | 43.93 |
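
As a reading aid for the Qwen2.5-7B-Instruct table above, the following sketch computes the BLEU gain of FAAST with full data over the zero-shot baseline for each language pair. The scores are copied from the table; the variable names are illustrative only.

```python
# BLEU scores copied from the Qwen2.5-7B-Instruct table above.
zero_shot = {"En-De": 25.53, "De-En": 34.69, "En-Fr": 34.82, "Fr-En": 41.40}
faast_full = {"En-De": 27.75, "De-En": 37.10, "En-Fr": 37.08, "Fr-En": 43.93}

# Per-direction gain of FAAST (full data) over the zero-shot baseline.
gains = {pair: round(faast_full[pair] - zero_shot[pair], 2) for pair in zero_shot}
average_gain = sum(gains.values()) / len(gains)

for pair, gain in gains.items():
    print(f"{pair}: +{gain:.2f} BLEU")
print(f"average gain: {average_gain:.3f} BLEU")
```

The gains range from 2.22 BLEU (En-De) to 2.53 BLEU (Fr-En).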

Key Features

  • Frozen backbone LLM parameters
  • Lightweight FAAST readout adaptation
  • Test-time learning capability
  • Efficient memory usage
  • Strong few-shot translation performance
  • Compatible with instruction-tuned LLMs

Limitations

  • The model inherits the limitations and biases of Qwen2.5-7B-Instruct.
  • Performance may vary across domains and languages not covered during evaluation.
  • FAAST adaptation quality depends on the distribution and quality of test-time examples.
  • The model is primarily intended for research purposes.

Citation

If you use this model, please cite the corresponding FAAST paper or project.

@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}