| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen2.5-7B-Instruct |
| tags: |
| - qwen2.5 |
| - test-time-learning |
| - fast-weights |
| - adaptation |
| - multilingual |
| datasets: |
| - OpenWebText2 |
| - IWSLT2017 |
| language: |
| - en |
| - de |
| - fr |
| --- |
| |
| # FAAST-Qwen2.5-7B-Instruct |
|
|
| `faast-Qwen2.5-7B-Instruct` extends `Qwen2.5-7B-Instruct` with the FAAST module. All original Qwen2.5-7B-Instruct parameters are frozen; only the FAAST readout projections are trained. |
|
|
| The model is designed for efficient test-time learning through fast weights, which lets it adapt to new examples without backpropagation or gradient-descent updates. |
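
To make "learning without backpropagation" concrete, the toy sketch below builds a classic outer-product associative memory in a single pass over key/value pairs. It is a generic illustration of the fast-weight idea only. It is not the FAAST algorithm, and the names `outer_product_memory` and `read` are hypothetical and do not come from the FAAST codebase.

```python
def outer_product_memory(keys, values):
    """Build fast weights W = sum_i v_i k_i^T in one pass, with no gradients."""
    dim_v, dim_k = len(values[0]), len(keys[0])
    W = [[0.0] * dim_k for _ in range(dim_v)]
    for k, v in zip(keys, values):
        for r in range(dim_v):
            for c in range(dim_k):
                W[r][c] += v[r] * k[c]
    return W


def read(W, query):
    """Retrieve the value associated with a query key: W @ query."""
    return [sum(w * q for w, q in zip(row, query)) for row in W]


# Toy data (assumption): orthonormal keys, so recall is exact.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0]]

W = outer_product_memory(keys, values)
recalled = read(W, keys[1])  # recovers values[1]
```

With orthonormal keys the stored values are recalled exactly. With correlated keys, plain outer-product storage suffers from interference between stored pairs.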
|
|
| ## Model Description |
|
|
| FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized. |
|
|
| This design enables: |
|
|
| - Test-time learning without backpropagation |
| - Efficient adaptation with low memory overhead |
| - Fast adaptation to downstream tasks |
| - Improved 1-shot and full-data translation performance over the zero-shot backbone on IWSLT2017 (see Evaluation Results) |
|
|
| ## Usage |
|
|
| Before running the code below, import the FAAST modules from https://github.com/baoguangsheng/faast. |
|
|
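| ```python |
| # Requires the FAAST modules from the repository linked above. |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model_path = "path/to/faast-Qwen2.5-7B-Instruct"  # placeholder: local path or Hub ID of this checkpoint |
| tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True) |
| |
| fewshot_samples = ["sample 1", "sample 2"]  # replace with your own few-shot examples |
| inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True) |
| |
| model.reset_projection() # clear existing fast weights |
| model.learn(**inputs) # learn new fast weights |
| model.generate(...) # do the task using the learned fast weights |
| ``` |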
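
The three calls above can be wrapped in a small helper so that each task starts from clean fast weights. The sketch below is a hypothetical convenience function, not part of the FAAST repository. It assumes that `model.learn` accepts tokenizer output as in the snippet above, and that `model.generate` takes the standard Transformers keyword arguments and returns a batch of token ID sequences.

```python
def adapt_then_generate(model, tokenizer, fewshot_samples, prompt, **generate_kwargs):
    """Reset fast weights, learn from few-shot samples, then answer one prompt.

    `adapt_then_generate` is a hypothetical helper; only `reset_projection`,
    `learn`, and `generate` are taken from the usage snippet in this card.
    """
    # Start from clean fast weights so earlier tasks cannot leak into this one.
    model.reset_projection()

    # Learn fast weights from the few-shot samples in a forward-only pass.
    fewshot_inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)
    model.learn(**fewshot_inputs)

    # Generate with the adapted model (assumed standard Transformers signature).
    prompt_inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**prompt_inputs, **generate_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call might look like `adapt_then_generate(model, tokenizer, fewshot_samples, "your prompt", max_new_tokens=128)`. Under the standard Transformers convention for decoder-only models, the decoded text would include the prompt itself.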
|
|
| ## Training Details |
|
|
| - **Base model:** Qwen2.5-7B-Instruct |
| - **Trainable parameters:** FAAST readout projections |
| - **Frozen parameters:** All Qwen2.5-7B-Instruct parameters |
| - **Pretraining corpus:** OpenWebText2 |
| - **Adaptation mechanism:** Fast weights / FAAST readout projections |
|
|
|
|
| ## Evaluation Results |
|
|
| ### Machine Translation on IWSLT2017 |
|
|
| BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`. |
|
|
| #### Qwen2.5-3B-Instruct Backbone |
|
|
| | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full | |
| |---|---:|---:|---:|---:|---:|---:|---:|---:| |
| | Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 | |
| | In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - | |
| | **FAAST (Ours)** | 23.35 | **25.22** | **33.23** | **36.40** | 31.12 | **35.09** | **39.46** | **42.47** | |
|
|
| #### Qwen2.5-7B-Instruct Backbone |
|
|
| | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full | |
| |---|---:|---:|---:|---:|---:|---:|---:|---:| |
| | Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 | |
| | In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - | |
| | **FAAST (Ours)** | **26.77** | **27.75** | 35.34 | **37.10** | 35.67 | **37.08** | **42.08** | **43.93** | |
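
To read the two tables at a glance, the short script below computes the BLEU gain of FAAST (full) over the zero-shot backbone for each language pair. The numbers are copied from the "full" columns above. The dictionary layout and the function name `bleu_gains` are illustrative choices, not part of the FAAST codebase.

```python
# BLEU scores copied from the "full" columns of the two tables above.
ZERO_SHOT = {
    "3B": {"En-De": 23.22, "De-En": 32.92, "En-Fr": 30.56, "Fr-En": 39.24},
    "7B": {"En-De": 25.53, "De-En": 34.69, "En-Fr": 34.82, "Fr-En": 41.40},
}
FAAST_FULL = {
    "3B": {"En-De": 25.22, "De-En": 36.40, "En-Fr": 35.09, "Fr-En": 42.47},
    "7B": {"En-De": 27.75, "De-En": 37.10, "En-Fr": 37.08, "Fr-En": 43.93},
}


def bleu_gains(backbone):
    """Return FAAST full-data BLEU minus zero-shot BLEU per language pair."""
    return {
        pair: round(FAAST_FULL[backbone][pair] - ZERO_SHOT[backbone][pair], 2)
        for pair in ZERO_SHOT[backbone]
    }


for backbone in ("3B", "7B"):
    print(backbone, bleu_gains(backbone))
```

On these numbers the full-data gains range from 2.00 to 4.53 BLEU for the 3B backbone and from 2.22 to 2.53 BLEU for the 7B backbone.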
|
|
| ## Key Features |
|
|
| - Frozen backbone LLM parameters |
| - Lightweight FAAST readout adaptation |
| - Test-time learning capability |
| - Efficient memory usage |
| - Strong few-shot translation performance |
| - Compatible with instruction-tuned LLMs |
|
|
| ## Limitations |
|
|
| - The model inherits the limitations and biases of Qwen2.5-7B-Instruct. |
| - Performance may vary across domains and languages not covered during evaluation. |
| - FAAST adaptation quality depends on the distribution and quality of test-time examples. |
| - The model is primarily intended for research purposes. |
|
|
| ## Citation |
|
|
| If you use this model, please cite the corresponding [FAAST paper](https://arxiv.org/pdf/2605.04651) or [project](https://github.com/baoguangsheng/faast). |
|
|
| ```bibtex |
| @article{bao2026faast, |
| title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation}, |
| author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue}, |
| journal={arXiv preprint arXiv:2605.04651}, |
| year={2026} |
| } |
| ``` |