---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
---
# CinnabarLM 1.4M
What happens if you take the CinnabarLM idea and push it a little further? You get this!
CinnabarLM 1.4M is a tiny, 1.4M-parameter LLM trained for ~26.75 minutes on a Colab T4 GPU! It's only 6 MB in size, and it's now Llama-based!
# Why?
Because making tiny LLMs is a good idea. Others have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't yet!
# Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 8 per device x 4 gradient accumulation steps = 32 |
| Context Window | Nominally 2048 tokens (per `max_position_embeddings`) |
| `hidden_size` | 128 |
| `intermediate_size` | 128 |
| `num_hidden_layers` | 4 |
| `num_attention_heads` | 4 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | `1e-5` |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
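
As a sanity check on the table above, here is a rough parameter count in plain Python. It is only a sketch: it assumes a standard Llama-style layout with no bias terms and with `num_key_value_heads` equal to `num_attention_heads`, neither of which is stated in this card. Under those assumptions it gives roughly 1.5 million weights and about 6 MB in float32, in line with the size quoted above; the small gap to the 1.4M in the name may come down to rounding or to config details not listed here.

```python
# Rough parameter count for a Llama-style decoder using the table's values.
# Assumptions (not stated in the model card): no bias terms, and
# num_key_value_heads == num_attention_heads (no grouped-query attention).

vocab_size = 4096
hidden_size = 128
intermediate_size = 128
num_hidden_layers = 4
tie_word_embeddings = False


def count_llama_params() -> int:
    embed = vocab_size * hidden_size               # token embedding matrix
    # Untied output head has its own vocab_size x hidden_size matrix.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    attention = 4 * hidden_size * hidden_size      # q, k, v, o projections
    mlp = 3 * hidden_size * intermediate_size      # gate, up, down projections
    norms = 2 * hidden_size                        # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    final_norm = hidden_size                       # RMSNorm before the output head
    return embed + lm_head + num_hidden_layers * per_layer + final_norm


total_params = count_llama_params()
fp32_megabytes = total_params * 4 / 1e6  # 4 bytes per float32 weight

print(f"parameters: {total_params:,}")
print(f"fp32 size: {fp32_megabytes:.2f} MB")
```

Note that the two embedding-sized matrices (input embeddings and the untied output head) account for about two thirds of the total, which is typical for models this small.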
# Training Configurations
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
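
To make the schedule settings above more concrete, here is a small plain-Python sketch of a linear warmup followed by a cosine decay to zero, which is the shape usually meant by a "cosine" scheduler with warmup. The function `lr_at` is a hypothetical helper written for illustration, not code from the training run, and it assumes a single half-cosine cycle ending at `max_steps`.

```python
import math

# Values copied from the training table above.
learning_rate = 6e-4
warmup_steps = 500
max_steps = 10000
per_device_train_batch_size = 8
gradient_accumulation_steps = 4


def lr_at(step: int) -> float:
    """Linear warmup from 0, then one half-cosine decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences consumed per optimizer step on a single GPU.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print("effective batch size:", effective_batch_size)

# Print the learning rate at a few points along the run.
for step in (0, 250, 500, 5250, 10000):
    print(step, f"{lr_at(step):.2e}")
```

With these numbers the learning rate peaks at `6e-4` at step 500, falls to half of that midway through the decay (step 5250), and reaches zero at step 10000.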
# Limitations
* **Not Instruction-Tuned:** It's only a base model, so it only completes text.
* **English-Only:** It's trained on English data (FineWeb), so it's NOT multilingual.
# Some other details
* It's trained on 50 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
* I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).