
	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-3B-Instruct
	tags:
	- qwen2.5
	- test-time-learning
	- fast-weights
	- adaptation
	- multilingual
	datasets:
	- OpenWebText2
	- IWSLT2017
	language:
	- en
	- de
	- fr
	---

	# FAAST-Qwen2.5-3B-Instruct

	`faast-Qwen2.5-3B-Instruct` extends `Qwen2.5-3B-Instruct` with the FAAST module. The original Qwen2.5-3B-Instruct parameters are frozen; only the FAAST readout projections are trained.

	The model is designed for efficient test-time learning through fast weights, so it can adapt to new examples without backpropagation or gradient descent.

	## Model Description

	FAAST augments Qwen2.5-3B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.

	This design enables:

	- Test-time learning without backpropagation
	- Efficient adaptation with low memory overhead
	- Fast adaptation to downstream tasks
	- Improved few-shot and full-data performance over the frozen backbone (see Evaluation Results)
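To make "test-time learning without backpropagation" concrete, here is a minimal NumPy sketch of closed-form fast weights computed as a ridge-regression solution. It is a generic illustration only, not the FAAST implementation: the function name `fit_fast_weights`, the `ridge` parameter, the feature sizes, and the synthetic data are all assumptions, and the actual FAAST formulation may differ (see the paper cited below).

```python
import numpy as np

def fit_fast_weights(keys, values, ridge=1e-3):
    """Closed-form ridge solution W minimizing ||keys @ W - values||^2 + ridge * ||W||^2."""
    dim = keys.shape[1]
    gram = keys.T @ keys + ridge * np.eye(dim)
    # Solve the normal equations directly; no gradients or optimizer steps are involved.
    return np.linalg.solve(gram, keys.T @ values)

rng = np.random.default_rng(0)
keys = rng.normal(size=(32, 8))     # hypothetical frozen-backbone features for 32 support examples
true_map = rng.normal(size=(8, 4))  # hypothetical mapping the fast weights should recover
values = keys @ true_map            # hypothetical supervised targets

fast_weights = fit_fast_weights(keys, values)  # single forward-only "learning" step

query = rng.normal(size=(1, 8))
prediction = query @ fast_weights
error = np.abs(prediction - query @ true_map).max()
print(f"max abs error on a held-out query: {error:.2e}")
```

Because the weights come from one linear solve over the support examples, adapting to a new task only requires recomputing that solve, which is the property the bullets above describe.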

	## Usage

	Before running the following code, install or import the FAAST modules from https://github.com/baoguangsheng/faast.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# args.model_path: local path or Hub ID of this checkpoint (supplied by the caller).
	tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(args.model_path, trust_remote_code=True)

	# Supervised examples to adapt on at test time.
	fewshot_samples = ['sample 1', 'sample 2', ...]
	inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)

	model.reset_projection()  # clear existing fast weights
	model.learn(**inputs)     # learn new fast weights from the few-shot samples
	model.generate(...)       # do the task using the learned fast weights
	```

	## Training Details

	- Base model: Qwen2.5-3B-Instruct
	- Trainable parameters: FAAST readout projections
	- Frozen parameters: All Qwen2.5-3B-Instruct parameters
	- Pretraining corpus: OpenWebText2
	- Adaptation mechanism: Fast weights / FAAST readout projections
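The frozen-backbone setup listed above can be sketched in PyTorch as follows. This is a toy stand-in, not the FAAST training code: `ToyAdaptedModel`, its `backbone` and `readout` attributes, the layer sizes, and the dummy loss are hypothetical names and values chosen only to show how gradients are restricted to the readout parameters.

```python
import torch
from torch import nn

class ToyAdaptedModel(nn.Module):
    """Hypothetical stand-in: a frozen 'backbone' followed by a trainable 'readout'."""

    def __init__(self, hidden=16, out=4):
        super().__init__()
        self.backbone = nn.Linear(hidden, hidden)  # stands in for the pretrained LLM
        self.readout = nn.Linear(hidden, out)      # stands in for a readout projection

    def forward(self, x):
        return self.readout(torch.tanh(self.backbone(x)))

model = ToyAdaptedModel()

# Freeze every backbone parameter so that only the readout receives gradients.
for param in model.backbone.parameters():
    param.requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# One dummy optimization step on random inputs.
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
optimizer.step()

print(trainable)
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps optimizer state small, which matches the low-memory motivation stated in this card.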


	## Evaluation Results

	### Machine Translation on IWSLT2017

	BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.

	#### Qwen2.5-3B-Instruct Backbone

	| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
	|---|---:|---:|---:|---:|---:|---:|---:|---:|
	| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
	| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
	| FAAST (Ours) | 23.35 | 25.22 | 33.23 | 36.40 | 31.12 | 35.09 | 39.46 | 42.47 |

	#### Qwen2.5-7B-Instruct Backbone

	| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
	|---|---:|---:|---:|---:|---:|---:|---:|---:|
	| Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 |
	| In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - |
	| FAAST (Ours) | 26.77 | 27.75 | 35.34 | 37.10 | 35.67 | 37.08 | 42.08 | 43.93 |
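As a reading aid for the two tables above, the short script below computes the BLEU gain of FAAST with full data over the zero-shot backbone for each translation direction. The scores are copied from the tables; the dictionary layout and the helper name `bleu_gains` are illustrative choices, not part of the FAAST codebase.

```python
# (zero-shot BLEU, FAAST full-data BLEU) per direction, copied from the tables above.
scores = {
    "Qwen2.5-3B-Instruct": {
        "En-De": (23.22, 25.22),
        "De-En": (32.92, 36.40),
        "En-Fr": (30.56, 35.09),
        "Fr-En": (39.24, 42.47),
    },
    "Qwen2.5-7B-Instruct": {
        "En-De": (25.53, 27.75),
        "De-En": (34.69, 37.10),
        "En-Fr": (34.82, 37.08),
        "Fr-En": (41.40, 43.93),
    },
}

def bleu_gains(pairs):
    """Return FAAST-full minus zero-shot BLEU for each direction, rounded to 2 decimals."""
    return {direction: round(faast - zero, 2) for direction, (zero, faast) in pairs.items()}

gains = {backbone: bleu_gains(pairs) for backbone, pairs in scores.items()}
mean_gain = {
    backbone: round(sum(g.values()) / len(g), 3) for backbone, g in gains.items()
}

for backbone, per_direction in gains.items():
    print(backbone, per_direction, "mean:", mean_gain[backbone])
```

On these numbers the full-data gain is positive in every direction for both backbones, averaging roughly 3.3 BLEU for the 3B model and roughly 2.4 BLEU for the 7B model.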

	## Key Features

	- Frozen backbone LLM parameters
	- Lightweight FAAST readout adaptation
	- Test-time learning capability
	- Efficient memory usage
	- Strong few-shot translation performance
	- Compatible with instruction-tuned LLMs

	## Limitations

	- The model inherits the limitations and biases of Qwen2.5-3B-Instruct.
	- Performance may vary across domains and languages not covered during evaluation.
	- FAAST adaptation quality depends on the distribution and quality of test-time examples.
	- The model is primarily intended for research purposes.

	## Citation

	If you use this model, please cite the corresponding [FAAST paper](https://arxiv.org/pdf/2605.04651) or [project](https://github.com/baoguangsheng/faast).

	```bibtex
	@article{bao2026faast,
	title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
	author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
	journal={arXiv preprint arXiv:2605.04651},
	year={2026}
	}
	```