---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- qwen2.5
- test-time-learning
- fast-weights
- adaptation
- multilingual
datasets:
- OpenWebText2
- IWSLT2017
language:
- en
- de
- fr
---

# FAAST-Qwen2.5-7B-Instruct

`faast-Qwen2.5-7B-Instruct` extends `Qwen2.5-7B-Instruct` with the FAAST module. The original Qwen2.5-7B-Instruct parameters are frozen; only the FAAST readout projections are trained.

The model is designed for efficient test-time learning through fast weights, which lets it adapt without backpropagation or gradient-descent updates.

## Model Description

FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.

This design enables:

- Test-time learning without backpropagation
- Efficient adaptation with low memory overhead
- Fast adaptation to downstream tasks
- Improved few-shot and full-data performance
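
To make the idea of learning without backpropagation concrete, here is a minimal sketch of a generic fast-weight associative memory. It uses the classic outer-product (Hebbian) rule, which is an assumption chosen for illustration and is not FAAST's actual update; the real closed-form rule is defined in the paper and repository. The function names `learn_fast_weights` and `recall` are hypothetical.

```python
# Generic outer-product associative memory (illustrative only, NOT the FAAST update rule).
# The weights are written in a single forward pass over the examples; no gradients are computed.

def learn_fast_weights(keys, values):
    """Accumulate W = sum_i outer(value_i, key_i)."""
    d_out, d_in = len(values[0]), len(keys[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for k, v in zip(keys, values):
        for r in range(d_out):
            for c in range(d_in):
                W[r][c] += v[r] * k[c]
    return W

def recall(W, key):
    """Read out the stored value with a matrix-vector product W @ key."""
    return [sum(w * x for w, x in zip(row, key)) for row in W]

# Two orthonormal keys, each paired with a target value.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0]]

W = learn_fast_weights(keys, values)
print(recall(W, keys[0]))  # [2.0, -1.0]
print(recall(W, keys[1]))  # [0.5, 3.0]
```

Recall is exact here only because the keys are orthonormal; with correlated keys the simple outer-product rule suffers from interference between stored pairs.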

## Usage

Before running the following code, make the FAAST modules from https://github.com/baoguangsheng/faast available for import.

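```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: set this to the local path or Hub ID of the FAAST checkpoint.
model_path = "path/to/faast-Qwen2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# Few-shot examples to adapt on (replace the placeholders with real samples).
fewshot_samples = ['sample 1', 'sample 2', ...]
inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)

model.reset_projection()  # clear existing fast weights
model.learn(**inputs)  # learn new fast weights
model.generate(...)  # do the task using the learned fast weights
```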

## Training Details

- **Base model:** Qwen2.5-7B-Instruct
- **Trainable parameters:** FAAST readout projections
- **Frozen parameters:** All Qwen2.5-7B-Instruct parameters
- **Pretraining corpus:** OpenWebText2
- **Adaptation mechanism:** Fast weights / FAAST readout projections
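
The frozen-backbone setup listed above can be sketched as follows. This is a minimal illustration, assuming parameters follow the PyTorch convention (`named_parameters()`, `requires_grad`, `numel()`) and that the trainable projections can be picked out by a substring in their names. The keyword `"readout"`, the parameter names, and the helper `freeze_except` are hypothetical, not taken from the FAAST code. Stub classes stand in for a real model so the sketch runs without PyTorch.

```python
class StubParam:
    """Stand-in for a PyTorch parameter: has requires_grad and numel()."""
    def __init__(self, n):
        self._n = n
        self.requires_grad = True

    def numel(self):
        return self._n

class StubModel:
    """Stand-in for a PyTorch module: exposes named_parameters()."""
    def __init__(self, sizes):
        self._params = {name: StubParam(n) for name, n in sizes.items()}

    def named_parameters(self):
        return list(self._params.items())

def freeze_except(model, keyword):
    """Freeze every parameter whose name lacks `keyword`.

    Returns (trainable_count, frozen_count) in number of elements.
    """
    trainable = frozen = 0
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable += param.numel()
        else:
            frozen += param.numel()
    return trainable, frozen

# Hypothetical parameter names and sizes, for illustration only.
model = StubModel({
    "layers.0.attn.weight": 100,
    "layers.0.mlp.weight": 200,
    "layers.0.readout.weight": 10,
})
counts = freeze_except(model, "readout")
print(counts)  # (10, 300)
```

With a real PyTorch model, the same helper would be called on the loaded module after checking which parameter names the FAAST code actually uses.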


## Evaluation Results

### Machine Translation on IWSLT2017

BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.

#### Qwen2.5-3B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
| **FAAST (Ours)** | 23.35 | **25.22** | **33.23** | **36.40** | 31.12 | **35.09** | **39.46** | **42.47** |

#### Qwen2.5-7B-Instruct Backbone

| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Qwen2.5-7B-Instruct (zero-shot) | - | 25.53 | - | 34.69 | - | 34.82 | - | 41.40 |
| In-Context Learning | 25.39 | - | 35.70 | - | 35.45 | - | 40.86 | - |
| **FAAST (Ours)** | **26.77** | **27.75** | 35.34 | **37.10** | 35.67 | **37.08** | **42.08** | **43.93** |
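
As a quick reading aid, the snippet below computes the absolute BLEU gain of FAAST (full) over the zero-shot baseline for the 7B backbone. The numbers are copied from the table above; the variable names are illustrative.

```python
# BLEU scores for the Qwen2.5-7B-Instruct backbone, copied from the table above.
zero_shot = {"En-De": 25.53, "De-En": 34.69, "En-Fr": 34.82, "Fr-En": 41.40}
faast_full = {"En-De": 27.75, "De-En": 37.10, "En-Fr": 37.08, "Fr-En": 43.93}

# Absolute BLEU gain per language pair, rounded to two decimals.
gains = {pair: round(faast_full[pair] - zero_shot[pair], 2) for pair in zero_shot}

for pair, gain in gains.items():
    print(f"{pair}: +{gain:.2f} BLEU")
```

Each of the four directions gains between 2.2 and 2.6 BLEU in the full-data setting.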

## Key Features

- Frozen backbone LLM parameters
- Lightweight FAAST readout adaptation
- Test-time learning capability
- Efficient memory usage
- Strong few-shot translation performance
- Compatible with instruction-tuned LLMs

## Limitations

- The model inherits the limitations and biases of Qwen2.5-7B-Instruct.
- Performance may vary across domains and languages not covered during evaluation.
- FAAST adaptation quality depends on the distribution and quality of test-time examples.
- The model is primarily intended for research purposes.

## Citation

If you use this model, please cite the corresponding [FAAST paper](https://arxiv.org/pdf/2605.04651) or [project](https://github.com/baoguangsheng/faast).

```bibtex
@article{bao2026faast,
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
  journal={arXiv preprint arXiv:2605.04651},
  year={2026}
}
```