metadata
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
DECO-1.2B
This is the 1.2B DECO checkpoint introduced in the paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices". DECO improves on our previous BlockFFN architecture and achieves performance comparable to dense models under the same total-parameter budget.
Quick start
You can load and use this model with AutoTokenizer and AutoModelForCausalLM from transformers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-1.2B"

# trust_remote_code=True lets transformers run the custom model code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding (do_sample=False) of up to 64 new tokens, without tracking gradients.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
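Since the checkpoint targets end-side devices, it can help to estimate how much memory the weights alone occupy before loading them. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes roughly 1.2e9 total parameters (inferred from the model name) and counts only the weights, ignoring activations, the KV cache, and framework overhead. The helper name `weight_memory_gib` is hypothetical. For a sparse Mixture-of-Experts model, all expert weights normally have to be resident in memory unless they are offloaded, so the total parameter count is the relevant figure here.

```python
# Rough weight-memory estimate for a checkpoint, by storage dtype.
# Counts weights only: activations, KV cache, and runtime overhead are excluded.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: about 1.2e9 total parameters, taken from the "1.2B" in the model name.
NUM_PARAMS = 1.2e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

Under these assumptions, loading in bfloat16 as in the quick start needs a little over 2 GiB for the weights, about half of what float32 would take.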
Citation
If you find our work useful for your research, please cite our paper:
@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu},
  journal={arXiv preprint arXiv:2605.10933},
  year={2026},
  url={https://arxiv.org/pdf/2605.10933},
}