---

# DECO-0.5B

This is the 0.5B DECO checkpoint introduced by the paper [DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices](https://huggingface.co/papers/2605.10933).

DECO (Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices) is a sparse MoE architecture designed to match the performance of dense Transformers under identical total-parameter budgets and training tokens. It is an improved version of the [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture.

- **Authors:** Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu
- **Paper:** [arXiv:2605.10933](https://huggingface.co/papers/2605.10933)
- **Code:** [https://github.com/thunlp/DECO](https://github.com/thunlp/DECO)

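The core idea behind sparse MoE architectures such as DECO is that a router activates only a small subset of expert sub-networks per token, so per-token compute stays low even as the total parameter count grows. As a generic, framework-free illustration of top-k routing (a sketch of the general technique only, not DECO's actual routing mechanism, which is defined in the paper and repository):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Select the k experts with the largest router logits for one token,
    then renormalize their scores into mixture weights that sum to 1.
    Illustrative only: real MoE routers operate on batched tensors."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

# Toy router output for one token over 4 experts: only experts 1 and 3
# are selected, so the other two cost no compute for this token.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

With top-2 routing over four equally sized experts, only half of the expert parameters run for any given token; the point of DECO, per the paper's title, is making this kind of sparsity match dense-model quality at the same total parameter budget.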
### Quick start

You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`. Since the model uses a custom architecture, `trust_remote_code=True` is required when loading both the tokenizer and the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Citation

If you find our work useful for your research, please kindly cite our paper as follows:

```bibtex
@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},