EMO: Pretraining Mixture of Experts for Emergent Modularity

This page is an index for the model checkpoints released alongside EMO: Pretraining Mixture of Experts for Emergent Modularity. The repository at allenai/EMO does not host model weights; pick the checkpoint you want from the tables below.

Released models

Main release

Model | Description
allenai/Emo_1b14b_1T | EMO: 1B-active / 14B-total MoE pretrained on 1T tokens plus a 50B-token midtraining anneal. The main model from the paper.

Ablation: EMO at smaller scale

Model | Description
allenai/Emo_1b14b_130B | EMO trained on 130B tokens (Table 1 / Figure 11 ablation). Not midtrained.

Architecture-matched standard MoE baselines

These share architecture and data with the EMO models above; only the training objective differs (no document-level expert pool constraint).

Model | Description
allenai/StdMoE_1b14b_1T | Standard MoE, "Reg. MoE" at 1T tokens in the paper. Same setup as Emo_1b14b_1T.
allenai/StdMoE_1b14b_130B | Standard MoE, "Reg. MoE" at 130B tokens. Same setup as Emo_1b14b_130B.

Memory-matched baselines (Figure 1)

Smaller models trained from scratch at fixed memory budgets, used as comparison points for EMO expert subsets.

Model | Description
allenai/Dense_1b_130B | "Dense @ 8": 1B dense decoder-only Transformer trained on 130B tokens. Active-parameter-matched with 8-expert subsets of the larger EMO/StdMoE models.
allenai/StdMoE_1b4b_130B | "Reg. MoE @ 32": 1B-active / 4B-total standard MoE (32 routed experts) trained from scratch on 130B tokens. Memory-matched with 32-expert subsets.

EMO-anneal ablation (Appendix B.4)

Tests whether modularity can be induced after pretraining by annealing a standard MoE under the EMO objective.

Model | Description
allenai/StdMoE_1b14b_1T_Preanneal | Standard MoE pretrained on 1T tokens, no annealing. The starting point for the EMO-anneal experiment.
allenai/StdMoE_1b14b_1T_EmoAnnealed | EMO-anneal: StdMoE_1b14b_1T_Preanneal annealed for 50B tokens under the EMO document-level expert pool objective.

Quick start

All checkpoints require trust_remote_code=True, since they use custom modeling code from the ryanyxw/transformers fork. Replace model_id with the checkpoint you want from the tables above.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Emo_1b14b_1T"  # main EMO release

# trust_remote_code=True pulls in the custom MoE modeling code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Tokenize a prompt and sample a 100-token continuation.
inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
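
For faster inference on a GPU, the same checkpoints can be loaded in half precision. The snippet below is a minimal sketch, assuming a CUDA device and the accelerate package are available; the torch_dtype and device_map arguments are standard transformers loading options, not anything specific to these checkpoints.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Emo_1b14b_1T"  # or any other checkpoint from the tables above

# device_map="auto" requires the accelerate package (pip install accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])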

Citation

@article{wang2026emo,
  title  = {EMO: Pretraining Mixture of Experts for Emergent Modularity},
  author = {Wang, Ryan and Bhagia, Akshita and Min, Sewon},
  year   = {2026},
  url    = {https://arxiv.org/abs/2605.06663}
}
