Add library_name metadata and improve model card

#1 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +17 -6
README.md CHANGED
@@ -1,23 +1,30 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
 ---
 
 # DECO-1.2B
-This is the 1.2B DECO checkpoint introduced by the paper *DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices*. DECO is an improved version of our previous [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture, with dense-comparable performance given the same budget of total parameters.
 
-Links: [[Paper](https://arxiv.org/pdf/2605.10933)] [[Code](https://github.com/thunlp/DECO)]
+This is the 1.2B DECO checkpoint introduced by the paper [DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices](https://huggingface.co/papers/2605.10933).
+
+DECO (Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices) is a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training tokens. It is an improved version of the [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture.
+
+- **Authors:** Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu
+- **Paper:** [arXiv:2605.10933](https://huggingface.co/papers/2605.10933)
+- **Code:** [https://github.com/thunlp/DECO](https://github.com/thunlp/DECO)
 
 ### Quick start
 
-You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`.
+You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`. Since the model uses a custom architecture, `trust_remote_code=True` is required.
 
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
+
 model_id = "SparseLLM/DECO-1.2B"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
@@ -25,16 +32,20 @@ model = AutoModelForCausalLM.from_pretrained(
     torch_dtype=torch.bfloat16,
     trust_remote_code=True,
 ).to("cuda").eval()
+
 prompt = "Mixture-of-Experts models are useful because"
 inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+
 with torch.no_grad():
     output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
+
 print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
 
 ### Citation
 If you find our work useful for your research, please kindly cite our paper as follows:
-```
+
+```bibtex
 @article{song2026deco,
   title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
   author={Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu},
@@ -42,4 +53,4 @@ If you find our work useful for your research, please kindly cite our paper as follows:
   year={2026},
   url={https://arxiv.org/pdf/2605.10933},
 }
-```
+```
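
Two quick checks related to this change. First, once the PR is merged, the new YAML front matter can be verified programmatically; a minimal sketch using `huggingface_hub` (attribute names per its `ModelCardData` class, repo id taken from the card):

```python
from huggingface_hub import ModelCard

# Fetch the README's YAML front matter from the Hub and check the
# fields this PR touches.
card = ModelCard.load("SparseLLM/DECO-1.2B")
print(card.data.library_name)   # "transformers" once this PR is merged
print(card.data.license)        # "apache-2.0"
print(card.data.pipeline_tag)   # "text-generation"
```

Second, `library_name: transformers` is what tells Hub tooling which loader to use, so the high-level `pipeline` API should also work. This is an untested sketch that assumes the repository's custom code is pipeline-compatible; the `AutoModelForCausalLM` quick start in the card remains the documented path:

```python
import torch
from transformers import pipeline

# High-level alternative to the AutoModel quick start in the card above.
# Assumes (not verified) that the repo's remote code works with pipeline().
generator = pipeline(
    "text-generation",
    model="SparseLLM/DECO-1.2B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
out = generator(
    "Mixture-of-Experts models are useful because",
    max_new_tokens=64,
    do_sample=False,
)
print(out[0]["generated_text"])
```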