---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
---

# DECO-0.2B
This is the 0.2B DECO checkpoint introduced in the paper *DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices*. DECO improves on our previous [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture and targets dense-comparable performance under the same total-parameter budget.

Links: [[Paper](https://arxiv.org/pdf/2605.10933)] [[Code](https://github.com/thunlp/DECO)]

### Quick start

You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.2B"

# trust_remote_code=True lets transformers run the model code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (do_sample=False) of up to 64 new tokens.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
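
The snippet above assumes a CUDA GPU with bfloat16 support. On other hardware, one possible approach is to choose the device and dtype at runtime. The sketch below is an illustration only: the helper names `pick_device_and_dtype` and `detect` are hypothetical, and the float16 and float32 fallbacks are assumptions that have not been validated for this checkpoint.

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool, bf16_supported: bool) -> Tuple[str, str]:
    """Return a (device, dtype name) pair for loading the checkpoint."""
    if cuda_available and bf16_supported:
        return "cuda", "bfloat16"  # same setting as the quick start
    if cuda_available:
        return "cuda", "float16"   # assumption: fp16 fallback on GPUs without bf16
    return "cpu", "float32"        # assumption: full precision on CPU


def detect() -> Tuple[str, str]:
    """Query PyTorch for the local hardware (torch is imported lazily)."""
    import torch

    has_cuda = torch.cuda.is_available()
    has_bf16 = has_cuda and torch.cuda.is_bf16_supported()
    return pick_device_and_dtype(has_cuda, has_bf16)
```

To use it with the quick start, call `device, dtype_name = detect()`, pass `torch_dtype=getattr(torch, dtype_name)` to `from_pretrained`, and replace both `"cuda"` strings with `device`.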

### Citation
If you find our work useful for your research, please cite our paper:
```
@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu},
  journal={arXiv preprint arXiv:2605.10933},
  year={2026},
  url={https://arxiv.org/pdf/2605.10933},
}
```