license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
  - qwen2.5
  - test-time-learning
  - fast-weights
  - adaptation
  - multilingual
datasets:
  - OpenWebText2
  - IWSLT2017
language:
  - en
  - de
  - fr

FAAST-Qwen2.5-7B-Instruct

FAAST-Qwen2.5-7B-Instruct is an extension of Qwen2.5-7B-Instruct equipped with the FAAST module. The original Qwen2.5-7B-Instruct parameters are frozen, and only the FAAST readout projections are trained.

The model is designed for efficient test-time learning through fast weights, enabling adaptation at inference time without backpropagation or gradient-descent updates.

Model Description

FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.

This design enables:

Test-time learning without backpropagation
Efficient adaptation with low memory overhead
Fast adaptation to downstream tasks
Improved few-shot and full-data performance
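
To make "learning fast weights without backpropagation" more concrete, the following is a minimal NumPy sketch of a generic closed-form associative memory (ridge regression from keys to values). It illustrates the general technique only and is not FAAST's actual implementation; the function names `fit_fast_weights` and `read_out`, the regularizer `lam`, and the toy dimensions are all assumptions made for this example.

```python
import numpy as np

def fit_fast_weights(keys, values, lam=1e-3):
    """Solve min_W ||keys @ W - values||^2 + lam * ||W||^2 in closed form.

    keys:   array of shape (n, d_k)
    values: array of shape (n, d_v)
    Returns W with shape (d_k, d_v). Only a linear solve is needed,
    so no gradients or backward pass are involved.
    """
    d_k = keys.shape[1]
    gram = keys.T @ keys + lam * np.eye(d_k)      # regularized Gram matrix
    return np.linalg.solve(gram, keys.T @ values)

def read_out(W, query):
    """Retrieve the value associated with a query through the fast weights."""
    return query @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(8, 16))    # toy stand-ins for few-shot features
    values = rng.normal(size=(8, 4))   # toy stand-ins for supervised targets
    W = fit_fast_weights(keys, values)
    err = np.abs(read_out(W, keys) - values).max()
    print(f"max recall error on stored pairs: {err:.2e}")
```

Because the weights come from a single linear solve over the few-shot examples, the cost is a forward computation plus one small matrix solve, which is the kind of property the list above refers to.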

Usage:

Before running the code below, import the FAAST modules from https://github.com/baoguangsheng/faast.

from transformers import AutoModelForCausalLM, AutoTokenizer

# args.model_path: local path or Hub ID of this checkpoint
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(args.model_path, trust_remote_code=True)

# Few-shot examples for the target task (placeholders; extend the list as needed)
fewshot_samples = ["sample 1", "sample 2"]
inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)

model.reset_projection()  # clear existing fast weights
model.learn(**inputs)  # learn new fast weights from the few-shot samples
model.generate(...)  # run the task using the learned fast weights
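
When adapting to several tasks in a row, the reset, learn, generate sequence above can be wrapped in a small helper so that fast weights from one task never leak into the next. The sketch below is a convenience wrapper, not part of the FAAST repository: the name `adapt_and_generate` and its arguments are hypothetical, and it assumes only the `reset_projection`, `learn`, and `generate` methods shown above plus a standard Hugging Face tokenizer.

```python
def adapt_and_generate(model, tokenizer, fewshot_samples, prompt, **generate_kwargs):
    """Reset fast weights, learn from few-shot samples, then answer one prompt.

    Hypothetical helper: it relies only on the model methods shown in the
    usage snippet (reset_projection, learn, generate).
    """
    # Drop fast weights left over from any earlier task.
    model.reset_projection()

    # Learn new fast weights from the few-shot examples (forward-only step).
    learn_inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)
    model.learn(**learn_inputs)

    # Generate for the actual task input using the learned fast weights.
    prompt_inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**prompt_inputs, **generate_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like `adapt_and_generate(model, tokenizer, fewshot_samples, "your task input", max_new_tokens=128)`, where the prompt text and generation settings are placeholders to adjust for the task.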

Training Details

Base model: Qwen2.5-7B-Instruct
Trainable parameters: FAAST readout projections
Frozen parameters: All Qwen2.5-7B-Instruct parameters
Pretraining corpus: OpenWebText2
Adaptation mechanism: Fast weights / FAAST readout projections
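
The frozen-backbone setup listed above can be sketched as a name-based filter over the model's parameters. This is an illustration under stated assumptions, not the project's training code: the function `freeze_backbone` is hypothetical, and the keyword `"faast"` is a guessed naming convention for the readout projections that should be checked against the real parameter names.

```python
def freeze_backbone(model, trainable_keyword="faast"):
    """Mark only parameters whose name contains `trainable_keyword` as trainable.

    Works with any object exposing `named_parameters()`, such as a
    torch.nn.Module. NOTE: the default keyword "faast" is an assumption,
    not a naming convention confirmed by this model card.
    """
    trainable, frozen = [], []
    for name, param in model.named_parameters():
        # Backbone parameters get requires_grad=False; matching ones stay trainable.
        param.requires_grad = trainable_keyword in name.lower()
        (trainable if param.requires_grad else frozen).append(name)
    return trainable, frozen
```

Printing the two returned lists before training is a quick way to confirm that only the intended readout projections would receive updates.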

Evaluation Results

Machine Translation on IWSLT2017

BLEU scores on IWSLT2017. Bold scores indicate statistical significance at p < 0.05.

Qwen2.5-3B-Instruct Backbone

Method	En-De 1-shot	En-De full	De-En 1-shot	De-En full	En-Fr 1-shot	En-Fr full	Fr-En 1-shot	Fr-En full
Qwen2.5-3B-Instruct (zero-shot)	-	23.22	-	32.92	-	30.56	-	39.24
In-Context Learning	23.03	-	32.33	-	31.85	-	38.51	-
FAAST (Ours)	23.35	25.22	33.23	36.40	31.12	35.09	39.46	42.47

Qwen2.5-7B-Instruct Backbone

Method	En-De 1-shot	En-De full	De-En 1-shot	De-En full	En-Fr 1-shot	En-Fr full	Fr-En 1-shot	Fr-En full
Qwen2.5-7B-Instruct (zero-shot)	-	25.53	-	34.69	-	34.82	-	41.40
In-Context Learning	25.39	-	35.70	-	35.45	-	40.86	-
FAAST (Ours)	26.77	27.75	35.34	37.10	35.67	37.08	42.08	43.93
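
To read the tables at a glance, the short script below computes the BLEU gain of full-data FAAST over the zero-shot backbone for each translation direction, using only the scores reported above. The helper names `bleu_gains` and `mean_gain` are introduced here for illustration.

```python
# BLEU scores copied from the tables above: zero-shot backbone vs. FAAST (full data).
ZERO_SHOT = {
    "3B": {"En-De": 23.22, "De-En": 32.92, "En-Fr": 30.56, "Fr-En": 39.24},
    "7B": {"En-De": 25.53, "De-En": 34.69, "En-Fr": 34.82, "Fr-En": 41.40},
}
FAAST_FULL = {
    "3B": {"En-De": 25.22, "De-En": 36.40, "En-Fr": 35.09, "Fr-En": 42.47},
    "7B": {"En-De": 27.75, "De-En": 37.10, "En-Fr": 37.08, "Fr-En": 43.93},
}

def bleu_gains(backbone):
    """Per-direction BLEU gain of full-data FAAST over the zero-shot backbone."""
    return {
        pair: round(FAAST_FULL[backbone][pair] - ZERO_SHOT[backbone][pair], 2)
        for pair in ZERO_SHOT[backbone]
    }

def mean_gain(backbone):
    """Average BLEU gain across the four translation directions."""
    gains = bleu_gains(backbone)
    return round(sum(gains.values()) / len(gains), 3)

if __name__ == "__main__":
    for backbone in ("3B", "7B"):
        print(backbone, bleu_gains(backbone), "mean:", mean_gain(backbone))
```

On these numbers the full-data gains range from 2.00 to 4.53 BLEU for the 3B backbone and from 2.22 to 2.53 BLEU for the 7B backbone.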

Key Features

Frozen backbone LLM parameters
Lightweight FAAST readout adaptation
Test-time learning capability
Efficient memory usage
Strong few-shot translation performance
Compatible with instruction-tuned LLMs

Limitations

The model inherits the limitations and biases of Qwen2.5-7B-Instruct.
Performance may vary across domains and languages not covered during evaluation.
FAAST adaptation quality depends on the distribution and quality of test-time examples.
The model is primarily intended for research purposes.

Citation

If you use this model, please cite the FAAST paper:

@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}