---
license: other
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- moe
- mixture-of-experts
- modularity
datasets:
- allenai/OLMoE-mix-0924
---

# Emo_1b14b_1T

The main release of **EMO** from [EMO: Pretraining Mixture of Experts for Emergent Modularity](https://arxiv.org/abs/2605.06663), referred to as **EMO** (1T tokens, midtrained) in the paper.

A 1B-active / 14B-total-parameter Mixture-of-Experts model (128 experts: 127 routed + 1 shared, k=8 active per token), pretrained on 1T tokens of the OLMoE pretraining mix and then annealed under the EMO objective for an additional 50B tokens. During training, tokens within the same document are constrained to route through a shared pool of experts, which yields expert subsets that can be deployed in isolation for specific domains with minimal performance degradation.

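As a rough illustration of that constraint, the sketch below masks router logits so that every token in a document can only select experts from that document's shared pool before top-k routing. How the pool itself is assigned is part of the EMO objective and not reproduced here; `constrained_topk_routing` and `doc_pool_mask` are illustrative names, not part of this repo's code.

```python
import torch
import torch.nn.functional as F

def constrained_topk_routing(router_logits, doc_pool_mask, k=8):
    """Top-k expert selection restricted to a per-document expert pool.

    router_logits: (num_tokens, num_experts) raw router scores
    doc_pool_mask: (num_tokens, num_experts) bool; True where the token's
                   document is allowed to use that expert (illustrative)
    """
    # Experts outside the document's shared pool can never be selected.
    masked = router_logits.masked_fill(~doc_pool_mask, float("-inf"))
    topk_vals, topk_idx = masked.topk(k, dim=-1)
    # Normalize only over the k selected experts, as in standard top-k MoE.
    weights = F.softmax(topk_vals, dim=-1)
    return weights, topk_idx

# Toy example: 4 tokens from one document, 16 experts, a pool of 10.
logits = torch.randn(4, 16)
pool = torch.zeros(4, 16, dtype=torch.bool)
pool[:, :10] = True  # this document's shared pool is experts 0-9
weights, idx = constrained_topk_routing(logits, pool, k=8)
assert idx.max() < 10  # no token routes outside the pool
```
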
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Emo_1b14b_1T"
# trust_remote_code=True loads the custom EMO modeling code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```
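
To make the "deployed in isolation" idea concrete, the sketch below counts which experts a domain corpus actually routes to, so that rarely used experts can be dropped when serving that domain. It assumes the checkpoint's forward pass accepts `output_router_logits=True` and returns per-layer router logits the way transformers' built-in MoE models do; that flag and the `profile_expert_usage` helper are assumptions for illustration, not a documented API of this model.

```python
import torch

def profile_expert_usage(model, tokenizer, texts, num_experts=128, k=8):
    """Count how often each expert is selected on a domain corpus.

    Assumes the forward pass accepts output_router_logits=True and returns
    a tuple of per-layer (num_tokens, num_experts) router logits, as
    transformers' built-in MoE models do. Purely illustrative.
    """
    counts = torch.zeros(num_experts, dtype=torch.long)
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)
        with torch.no_grad():
            out = model(**inputs, output_router_logits=True)
        for layer_logits in out.router_logits:
            selected = layer_logits.topk(k, dim=-1).indices.flatten()
            counts += torch.bincount(selected, minlength=num_experts)
    # Experts with nonzero counts approximate the domain's shared pool.
    return counts
```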

## Citation

```bibtex
@article{wang2026emo,
  title  = {EMO: Pretraining Mixture of Experts for Emergent Modularity},
  author = {Wang, Ryan and Bhagia, Akshita and Min, Sewon},
  year   = {2026},
  url    = {https://arxiv.org/abs/2605.06663}
}
```

## Links

- Paper: https://arxiv.org/abs/2605.06663
- Code: https://github.com/allenai/EMO