Update README.md
README.md
CHANGED
|
@@ -12,4 +12,25 @@ tags:
|
|
| 12 |
- tiny-lm
|
| 13 |
- tinylm
|
| 14 |
- tinyllm
|
| 15 |
-
---
| 12 |
- tiny-lm
|
| 13 |
- tinylm
|
| 14 |
- tinyllm
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
# CinnabarLM
|
| 18 |
+
|
| 19 |
+
CinnabarLM is a tiny 4M-parameter language model, trained for about 33 minutes on a T4 GPU in Google Colab!
|
| 20 |
+
|
| 21 |
+
# Model Configurations
|
| 22 |
+
|
| 23 |
+
# Training Configurations
|
| 24 |
+
| Hyperparameter | Value |
|
| 25 |
+
|---|---|
|
| 26 |
+
| `max_iters` | 10000 |
|
| 27 |
+
| `eval_interval` | 500 |
|
| 28 |
+
| `learning_rate` | 6e-4 |
|
| 29 |
+
| `min_lr` | 6e-5 |
|
| 30 |
+
| `warmup_iters` | 500 |
|
| 31 |
+
| `weight_decay` | 0.1 |
|
| 32 |
+
| `beta1, beta2` | 0.9, 0.95 |
|
| 33 |
+
|
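The table above lists `learning_rate`, `min_lr`, and `warmup_iters` but does not say how they combine. One common reading is a linear warmup followed by a cosine decay down to `min_lr`; the sketch below assumes exactly that and is not confirmed as the schedule actually used for CinnabarLM. The function name `get_lr` and the constant `LR_DECAY_ITERS` (assumed equal to `max_iters`) are hypothetical and do not come from the README.

```python
import math

# Values copied from the Training Configurations table.
MAX_ITERS = 10000
LEARNING_RATE = 6e-4
MIN_LR = 6e-5
WARMUP_ITERS = 500

# Assumption: the decay spans the whole run. The README does not state this.
LR_DECAY_ITERS = MAX_ITERS


def get_lr(it: int) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to MIN_LR."""
    # Linear warmup: ramp from LEARNING_RATE / WARMUP_ITERS up to LEARNING_RATE.
    if it < WARMUP_ITERS:
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS
    # Past the decay window, hold the floor value.
    if it >= LR_DECAY_ITERS:
        return MIN_LR
    # Cosine decay: coeff goes from 1 to 0 as progress goes from 0 to 1.
    progress = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


if __name__ == "__main__":
    for step in (0, 499, 500, 5250, 10000):
        print(f"iter {step:>5}: lr = {get_lr(step):.2e}")
```

Under these assumptions the rate peaks at 6e-4 when warmup ends, passes 3.3e-4 halfway through the decay window (iteration 5250), and settles at 6e-5 by iteration 10000.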
| 34 |
+
# Limitations
|
| 35 |
+
* **Not Instruction-Tuned:** It's a base model, so it only continues the text it is given; it does not follow instructions or chat.
|
| 36 |
+
* **English-Only:** It's trained on English data (FineWeb), so it's NOT multilingual.
|