Update model card: add library_name, pipeline_tag, and sample usage

#1, opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +34 -17
README.md CHANGED
@@ -1,12 +1,5 @@
1
  ---
2
- license: apache-2.0
3
  base_model: Qwen/Qwen2.5-7B-Instruct
4
- tags:
5
- - qwen2.5
6
- - test-time-learning
7
- - fast-weights
8
- - adaptation
9
- - multilingual
10
  datasets:
11
  - OpenWebText2
12
  - IWSLT2017
@@ -14,6 +7,15 @@ language:
14
  - en
15
  - de
16
  - fr
17
  ---
18
 
19
  # FAAST-Qwen2.5-7B-Instruct
@@ -22,6 +24,9 @@ language:
22
 
23
  The model is designed for efficient test-time learning through fast weights, enabling adaptation without backpropagation (gradient descent).
24
 
25
  ## Model Description
26
 
27
  FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
@@ -33,6 +38,26 @@ This design enables:
33
  - Fast adaptation to downstream tasks
34
  - Improved few-shot and full-data performance
35
 
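Note on the "fast weights ... without backpropagation" wording in the card: the sketch below is a generic illustration of how an associative map can be fitted in closed form with no gradient steps. It is not the FAAST implementation. The ridge-regression formulation, the helper names (`solve`, `fit_fast_weights`, `read`), and the toy keys and values are all assumptions introduced here for illustration only.

```python
# Generic closed-form associative memory (ridge regression), pure Python.
# Assumption: this is NOT the FAAST code, only an illustration of fitting
# "fast weights" analytically instead of by gradient descent.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_fast_weights(keys, values, ridge=1e-6):
    """Return W minimising ||K W - V||^2 + ridge * ||W||^2 in one shot."""
    d_k, d_v = len(keys[0]), len(values[0])
    # Regularised Gram matrix: K^T K + ridge * I
    gram = [[sum(k[i] * k[j] for k in keys) + (ridge if i == j else 0.0)
             for j in range(d_k)] for i in range(d_k)]
    # Cross-correlation: K^T V
    cross = [[sum(k[i] * v[j] for k, v in zip(keys, values))
              for j in range(d_v)] for i in range(d_k)]
    return solve(gram, cross)


def read(weights, key):
    """Query the memory: key @ W."""
    return [sum(key[i] * weights[i][j] for i in range(len(key)))
            for j in range(len(weights[0]))]


# Hypothetical toy data: two key/value pairs to memorise.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0]]

fast_weights = fit_fast_weights(keys, values)
recalled = read(fast_weights, keys[0])  # close to values[0]
```

Fitting here is a single linear solve over the stored pairs, so no backward pass is involved. How FAAST itself computes and applies its fast weights is described in the linked paper and repository, not in this sketch.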
36
  ## Training Details
37
 
38
  - **Base model:** Qwen2.5-7B-Instruct
@@ -48,14 +73,6 @@ This design enables:
48
 
49
  BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.
50
 
51
- #### Qwen2.5-3B-Instruct Backbone
52
-
53
- | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
54
- |---|---:|---:|---:|---:|---:|---:|---:|---:|
55
- | Qwen2.5-3B-Instruct (zero-shot) | - | 23.22 | - | 32.92 | - | 30.56 | - | 39.24 |
56
- | In-Context Learning | 23.03 | - | 32.33 | - | 31.85 | - | 38.51 | - |
57
- | **FAAST (Ours)** | 23.35 | **25.22** | **33.23** | **36.40** | 31.12 | **35.09** | **39.46** | **42.47** |
58
-
59
  #### Qwen2.5-7B-Instruct Backbone
60
 
61
  | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
@@ -82,12 +99,12 @@ BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p <
82
 
83
  ## Citation
84
 
85
- If you use this model, please cite the corresponding [FAAST paper](https://arxiv.org/pdf/2605.04651) or [project](https://github.com/baoguangsheng/faast).
86
 
87
  ```bibtex
88
  @article{bao2026faast,
89
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
90
- author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
91
  journal={arXiv preprint arXiv:2605.04651},
92
  year={2026}
93
  }
 
1
  ---
 
2
  base_model: Qwen/Qwen2.5-7B-Instruct
3
  datasets:
4
  - OpenWebText2
5
  - IWSLT2017
 
7
  - en
8
  - de
9
  - fr
10
+ license: apache-2.0
11
+ library_name: transformers
12
+ pipeline_tag: text-generation
13
+ tags:
14
+ - qwen2.5
15
+ - test-time-learning
16
+ - fast-weights
17
+ - adaptation
18
+ - multilingual
19
  ---
20
 
21
  # FAAST-Qwen2.5-7B-Instruct
 
24
 
25
  The model is designed for efficient test-time learning through fast weights, enabling adaptation without backpropagation (gradient descent).
26
 
27
+ - **Paper:** [FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation](https://huggingface.co/papers/2605.04651)
28
+ - **Repository:** [https://github.com/baoguangsheng/faast](https://github.com/baoguangsheng/faast)
29
+
30
  ## Model Description
31
 
32
  FAAST augments Qwen2.5-7B-Instruct with fast-weight adaptation modules that support supervised learning during inference. During FAAST pretraining, all backbone LLM parameters remain frozen, and only lightweight FAAST readout projections are optimized.
 
38
  - Fast adaptation to downstream tasks
39
  - Improved few-shot and full-data performance
40
 
41
+ ## Sample Usage
42
+
43
+ This model requires `trust_remote_code=True` to load the custom FAAST architecture.
44
+
45
+ ```python
46
+ from transformers import AutoTokenizer, AutoModelForCausalLM
47
+
48
+ model_path = "gshbao/faast-Qwen2.5-7B-Instruct"
49
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
50
+ model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
51
+
52
+ # Examples to learn at test time
53
+ fewshot_samples = ['sample 1', 'sample 2']
54
+ inputs = tokenizer(fewshot_samples, return_tensors="pt", padding=True)
55
+
56
+ model.reset_projection() # clear existing fast weights
57
+ model.learn(**inputs) # learn new fast weights analytically in a single pass
58
+ # model.generate(...) # perform the task using the learned fast weights
59
+ ```
60
+
61
  ## Training Details
62
 
63
  - **Base model:** Qwen2.5-7B-Instruct
 
73
 
74
  BLEU scores on IWSLT2017. Bold scores indicate statistical significance at `p < 0.05`.
75
 
76
  #### Qwen2.5-7B-Instruct Backbone
77
 
78
  | Method | En-De 1-shot | En-De full | De-En 1-shot | De-En full | En-Fr 1-shot | En-Fr full | Fr-En 1-shot | Fr-En full |
 
99
 
100
  ## Citation
101
 
102
+ If you use this model, please cite the corresponding paper:
103
 
104
  ```bibtex
105
  @article{bao2026faast,
106
  title={FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation},
107
+ author={Bao, Guangsheng and Zhang, Hongbo and Cui, Han North and Sun, Ke and Zhao, Yanbin and He, Juncai and Zhang, Yue},
108
  journal={arXiv preprint arXiv:2605.04651},
109
  year={2026}
110
  }