---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# DECO-0.1B

This is the 0.1B DECO checkpoint introduced in the paper *DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices*. DECO improves on our previous [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture and achieves performance comparable to dense models with the same total parameter budget.

Links: [[Paper](https://arxiv.org/pdf/2605.10933)] [[Code](https://github.com/thunlp/DECO)]

### Quick start

You can load and use this model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.1B"

# trust_remote_code=True lets transformers run the custom modeling code shipped in the Hub repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (do_sample=False) of up to 64 new tokens.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Citation

If you find our work useful for your research, please cite our paper:

```bibtex
@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu},
  journal={arXiv preprint arXiv:2605.10933},
  year={2026},
  url={https://arxiv.org/pdf/2605.10933},
}
```