Harley-ml committed d0fdd1a (verified)
1 parent: de5743e

Update README.md

Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -34,8 +34,15 @@ Dillion is a 1.2M parameter language model trained on ~9B tokens of FineWeb-edu.
 Our goal was to make one of the best sub-1.5M parameter LMs through depth (12 layers) and huge overtraining (about 8900 tokens per parameter).
 Dillion beats or ties with models much larger than itself such as [SupraMini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M) and [Tenete-8M](https://huggingface.co/Harley-ml/Tenete-8M).
 
+### Why "Dillion"?
+
+I was scrolling through Hugging Face and saw GPT-2, the smallest variant. I looked at its download count and saw 16 million. My brain, for some random reason, hallucinated “Dillion.” So I decided to call my next model, no matter the task or size, Dillion.
+
+I decided to dig a bit deeper, and after a quick Google Search, I found that “Dillion” is an alternate spelling of the Irish name Dillon, which translates to “loyal” or “faithful.” But let me tell you, this model ain’t loyal or faithful; actually, it probably doesn’t even know what those words mean.
+
 ## Architecture
 
+
 Dillion-1.2M uses the Qwen3.5 architecture.
 
 | Parameter | Value |