Update README.md
README.md
CHANGED
@@ -15,10 +15,22 @@ tags:
 ---

 # CinnabarLM
+CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!

-
+# Why?
+Because it's a good idea to make tiny LLMs. Some people already did with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but not myself!

 # Model Configurations
+| Parameter | Value |
+|---|---|
+| Tokenizer | Custom BPE tokenizer |
+| Vocabulary Size | 4096 tokens |
+| Batch Size | 64 |
+| Context Window | 256 tokens |
+| `n_embed` | 192 |
+| `n_head` | 8 |
+| `n_layer` | 6 |
+| Dropout | 0.1 |

 # Training Configurations
 | Hyperparameter | Value |
@@ -33,4 +45,9 @@ CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on C

 # Limitations
 * **Not Instruction-Tuned:** It's only a base model, so it only completes text.
-* **English-Only:** It's trained on English data (FineWeb), it's NOT multilingual.
+* **English-Only:** It's trained on English data (FineWeb), it's NOT multilingual.
+* **Not a Standard Model:** It's NOT a Qwen/Llama/GPT model. Standard Transformers can't recognize this!
+
+# Some other details
+* It's trained on 80 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
+* The name "CinnabarLM" that I picked was made by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) + "LM" (Language Model)
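As a rough sanity check on the Model Configurations table added in this diff, the parameter count can be estimated from the listed values. This is only a sketch: the README does not describe the layer layout, so the code assumes a GPT-2-style decoder (learned positional embeddings, a 4x-wide MLP, biases on the linear layers, two LayerNorms per block). `ModelConfig` and `estimate_params` are hypothetical names made up for this illustration, not part of the CinnabarLM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the Model Configurations table in the diff.
    vocab_size: int = 4096
    context_window: int = 256
    n_embed: int = 192
    n_head: int = 8
    n_layer: int = 6
    batch_size: int = 64


def estimate_params(cfg: ModelConfig, tie_weights: bool) -> int:
    """Estimate parameters for an ASSUMED GPT-2-style decoder.

    The real CinnabarLM layout may differ; treat the result as a ballpark.
    """
    d = cfg.n_embed
    token_emb = cfg.vocab_size * d          # token embedding matrix
    pos_emb = cfg.context_window * d        # learned positions (assumption)
    layer_norm = 2 * d                      # scale + shift
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # 4x expansion (assumption)
    block = 2 * layer_norm + attn + mlp
    # An untied output head adds a second vocab_size x n_embed matrix.
    lm_head = 0 if tie_weights else cfg.vocab_size * d
    return token_emb + pos_emb + cfg.n_layer * block + layer_norm + lm_head


cfg = ModelConfig()

# n_embed must split evenly across the attention heads.
assert cfg.n_embed % cfg.n_head == 0
head_dim = cfg.n_embed // cfg.n_head

# Tokens in one batch, if every sequence fills the context window.
tokens_per_batch = cfg.batch_size * cfg.context_window

for tie in (True, False):
    n = estimate_params(cfg, tie)
    print(f"tied={tie}: {n:,} params, {n * 4 / 1e6:.1f} MB in fp32")
print(f"head_dim={head_dim}, tokens_per_batch={tokens_per_batch:,}")
```

Under these assumptions the estimate is about 3.5M parameters with a tied output head and about 4.3M with an untied one, which at 4 bytes per weight lands in the same neighborhood as the README's "4M-parameter" and "16 MB" figures.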