gshbao/faast-Qwen2.5-7B-Instruct

@@ -1,12 +1,5 @@
 ---
-license: apache-2.0
 base_model: Qwen/Qwen2.5-7B-Instruct
-tags:
-- qwen2.5
-- test-time-learning
-- fast-weights
-- adaptation
-- multilingual
 datasets:
 - OpenWebText2
 - IWSLT2017
@@ -14,6 +7,15 @@ language:
 - en
 - de
 - fr
 ---
 # FAAST-Qwen2.5-7B-Instruct
@@ -22,6 +24,9 @@ language:
 The model is designed for efficient test-time learning through fast weights, enabling adaptation without backpropagation (gradient descent).
 ## Model Description
 FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
@@ -33,6 +38,26 @@ This design enables:
 - Fast adaptation to downstream tasks
 - Improved few-shot and full-data performance
 ## Training Details
 - **Base model:** Qwen2.5-7B-Instruct
@@ -48,14 +73,6 @@ This design enables:
 BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.
-#### Qwen2.5-3B-Instruct Backbone
-| Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
-|---|---:|---:|---:|---:|---:|---:|---:|---:|
-| Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
-| In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
-| **FAAST (Ours)** | 23.35 | **25.22** | **33.23** | **36.40** | 31.12 | **35.09** | **39.46** | **42.47** |
 #### Qwen2.5-7B-Instruct Backbone
 | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
@@ -82,12 +99,12 @@ BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p <
 ## Citation
-If you use this model, please cite the corresponding [FAAST paper](https://arxiv.org/pdf/2605.04651) or [project](https://github.com/baoguangsheng/faast).
 ```bibtex
 @article{bao2026faast,
   title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
-  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
   journal={arXiv preprint arXiv:2605.04651},
   year={2026}
 }
 ```

 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
 datasets:
 - OpenWebText2
 - IWSLT2017
 - en
 - de
 - fr
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- qwen2.5
+- test-time-learning
+- fast-weights
+- adaptation
+- multilingual
 ---
 # FAAST-Qwen2.5-7B-Instruct
 The model is designed for efficient test-time learning through fast weights, enabling adaptation without backpropagation (gradient descent).
+- **Paper:** [FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation](https://huggingface.co/papers/2605.04651)
+- **Repository:** [https://github.com/baoguangsheng/faast](https://github.com/baoguangsheng/faast)
 ## Model Description
 FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
 - Fast adaptation to downstream tasks
 - Improved few-shot and full-data performance
+## Sample Usage
+This model requires `trust_remote_code=True` to load the custom FAAST architecture.
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_path = "gshbao/faast-Qwen2.5-7B-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
+# Examples to learn at test time
+fewshot_samples = ['sample 1', 'sample 2']
+inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)
+model.reset_projection()  # clear existing fast weights
+model.learn(**inputs)  # learn new fast weights analytically in a single pass
+# model.generate(...)  # perform the task using the learned fast weights
+```
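The comment "learn new fast weights analytically in a single pass" may be easier to picture with a toy example. The sketch below is a generic closed-form ridge regression over key/value pairs, written in plain Python. It is not taken from the FAAST code and may differ from the paper's formulation; `fit_fast_weights`, `read_out`, and the `ridge` parameter are hypothetical names used only for illustration. It shows how a weight matrix can be obtained from examples by solving a linear system, with no backpropagation or iterative updates.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_fast_weights(keys, values, ridge=1e-3):
    """Hypothetical helper: W = (K^T K + ridge * I)^-1 K^T V, in one pass."""
    kt = transpose(keys)
    gram = matmul(kt, keys)
    for i in range(len(gram)):
        gram[i][i] += ridge  # regularization keeps the system well-conditioned
    return solve(gram, matmul(kt, values))


def read_out(weights, key):
    """Hypothetical helper: apply the fitted weights to a single key vector."""
    return matmul([key], weights)[0]


# Toy data generated by the linear map value = 2 * k0 + 3 * k1.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0], [3.0], [5.0]]
weights = fit_fast_weights(keys, values, ridge=1e-6)
print(read_out(weights, [2.0, 1.0]))  # approximately [7.0]
```

In this toy setting, fitting again on a different set of examples simply produces a new weight matrix, which is loosely analogous to calling `model.reset_projection()` and then `model.learn(...)` in the snippet above.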
 ## Training Details
 - **Base model:** Qwen2.5-7B-Instruct
 BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.
 #### Qwen2.5-7B-Instruct Backbone
 | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
 ## Citation
+If you use this model, please cite the corresponding paper:
 ```bibtex
 @article{bao2026faast,
   title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
+  author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han North and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
   journal={arXiv preprint arXiv:2605.04651},
   year={2026}
 }
 ```