---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# DECO-0.5B

This is the 0.5B DECO checkpoint introduced by the paper [DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices](https://huggingface.co/papers/2605.10933).

DECO (Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices) is a sparse MoE architecture designed to match the performance of dense Transformers under identical total parameter budgets and training-token counts. It is an improved version of the [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture.

- **Authors:** Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu
- **Paper:** [arXiv:2605.10933](https://huggingface.co/papers/2605.10933)
- **Code:** [https://github.com/thunlp/DECO](https://github.com/thunlp/DECO)

### Quick start

You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`. Because the model uses a custom architecture, `trust_remote_code=True` is required.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding; set do_sample=True for sampled continuations.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
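To see tokens as they are produced rather than waiting for the full completion, you can pass a `TextStreamer` to `generate`. This is a minimal sketch that reuses the `model` and `tokenizer` objects from the snippet above; the prompt is illustrative.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the input prompt back out.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("Sparse activation helps on-device inference because", return_tensors="pt").to("cuda")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64, do_sample=False, streamer=streamer)
```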
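Since DECO targets end-side devices, you may also want to run without a GPU. Below is a sketch of the same quick-start flow on CPU; it assumes only that the checkpoint loads in full precision (bfloat16 support on CPU varies by hardware).

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # full precision is the safest default on CPU
    trust_remote_code=True,
).eval()

inputs = tokenizer("Mixture-of-Experts models are useful because", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```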
### Citation

If you find our work useful for your research, please cite our paper as follows:

```bibtex
@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu},
  journal={arXiv preprint arXiv:2605.10933},
  year={2026},
  url={https://arxiv.org/pdf/2605.10933},
}
```