Harley-ml/Dillion-1.2M

Text Generation

Harley-ml committed 4 days ago

Commit d0fdd1a · verified · 1 Parent(s): de5743e

Update README.md

Files changed (1)

README.md +7 -0

@@ -34,8 +34,15 @@ Dillion is a 1.2M parameter language model trained on ~9B tokens of FineWeb-edu.
 Our goal was to make one of the best sub-1.5M parameter LMs through depth (12 layers) and huge overtraining (about 8900 tokens per parameter).
 Dillion beats or ties with models much larger than itself such as [SupraMini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M) and [Tenete-8M](https://huggingface.co/Harley-ml/Tenete-8M).
+### Why "Dillion"?
+I was scrolling through Hugging Face and saw GPT-2, the smallest variant. I looked at its download count and saw 16 million. My brain, for some random reason, hallucinated “Dillion.” So I decided to call my next model, no matter the task or size, Dillion.
+I decided to dig a bit deeper, and after a quick Google search, I found that “Dillion” is an alternate spelling of the Irish name Dillon, which translates to “loyal” or “faithful.” But let me tell you, this model ain’t loyal or faithful; actually, it probably doesn’t even know what those words mean.
 ## Architecture
 Dillion-1.2M uses the Qwen3.5 architecture.
 | Parameter                 | Value            |