---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
new_version: MihaiPopa-1/CinnabarLM-4M-Base
---
CinnabarLM
CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!
Why?
Because making tiny LLMs is a good idea! Other people have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!
Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| n_embed | 192 |
| n_head | 8 |
| n_layer | 6 |
| Dropout | 0.1 |
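The card doesn't describe the block layout, so the sketch below is only a rough cross-check of the "4M parameters" and "16 MB" figures. It assumes a GPT-2-style decoder (config names similar to nanoGPT's): learned positional embeddings, biased linear layers and LayerNorms, a 4x-wide MLP, and an output head that is not tied to the token embedding. `CinnabarConfig` and `estimate_params` are hypothetical names, not part of the released code.

```python
from dataclasses import dataclass


@dataclass
class CinnabarConfig:
    # Values copied from the "Model Configurations" table.
    vocab_size: int = 4096
    block_size: int = 256  # context window
    n_embed: int = 192
    n_head: int = 8
    n_layer: int = 6
    dropout: float = 0.1  # no effect on the parameter count

    def __post_init__(self):
        # Each attention head gets n_embed // n_head = 24 dimensions.
        assert self.n_embed % self.n_head == 0


def estimate_params(cfg: CinnabarConfig, tied_head: bool = False) -> int:
    """Parameter count for an ASSUMED GPT-2-style decoder-only Transformer."""
    d = cfg.n_embed
    tok_emb = cfg.vocab_size * d
    pos_emb = cfg.block_size * d
    layer_norm = 2 * d  # weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)  # fused q/k/v projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand to 4*d, project back to d
    per_layer = 2 * layer_norm + attn + mlp
    total = tok_emb + pos_emb + cfg.n_layer * per_layer + layer_norm  # + final LayerNorm
    if not tied_head:
        total += d * cfg.vocab_size  # separate LM head without bias
    return total


if __name__ == "__main__":
    n = estimate_params(CinnabarConfig())
    print(n, f"{n * 4 / 2**20:.1f} MiB in fp32")  # prints "4291584 16.4 MiB in fp32"
```

Under these assumptions the count is about 4.29M (about 3.51M if the output head shares weights with the token embedding), and at 4 bytes per fp32 weight that is roughly 16.4 MiB, in line with the figures above. The real model may be laid out differently.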
Training Configurations
| Hyperparameter | Value |
|---|---|
| max_iters | 10000 |
| eval_interval | 500 |
| learning_rate | 6e-4 |
| min_lr | 6e-5 |
| warmup_iters | 500 |
| weight_decay | 0.1 |
| beta1, beta2 | 0.9, 0.95 |
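The `warmup_iters` and `min_lr` rows point to a warmup-then-decay learning-rate schedule, but the card doesn't say which shape was used. A common choice (nanoGPT uses it) is linear warmup followed by cosine decay; the sketch below assumes that, with the decay ending exactly at `max_iters`. `get_lr` is a hypothetical helper. The optimizer isn't named either; if it were PyTorch's AdamW, the remaining rows would map to `torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95), weight_decay=0.1)`.

```python
import math

# Values copied from the "Training Configurations" table.
MAX_ITERS = 10_000
WARMUP_ITERS = 500
LEARNING_RATE = 6e-4
MIN_LR = 6e-5


def get_lr(it: int) -> float:
    """ASSUMED schedule: linear warmup, then cosine decay down to MIN_LR."""
    if it < WARMUP_ITERS:
        # Ramp linearly from LEARNING_RATE / WARMUP_ITERS up to LEARNING_RATE.
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS
    if it >= MAX_ITERS:
        return MIN_LR
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)  # goes 0 -> 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


if __name__ == "__main__":
    for it in (0, 499, 500, 5250, 10_000):
        print(it, f"{get_lr(it):.2e}")
```

With these numbers the rate peaks at 6e-4 after 500 iterations, passes 3.3e-4 at the midpoint of the decay (iteration 5250), and settles at 6e-5 at iteration 10000.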
Limitations
- Not Instruction-Tuned: It's only a base model, so it only completes text.
- English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.
- Not a Standard Model: It's NOT a Qwen/Llama/GPT model, so the standard Hugging Face Transformers library can't load it!
- Preview: This is a preview version, so it often generates gibberish. CinnabarLM 1 will address this by using a Llama architecture.
Some other details
- It's trained on 80 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
- The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).
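The 80M-token figure can be cross-checked against the two configuration tables. The sketch below assumes each iteration processes exactly one batch of 64 sequences of 256 tokens (the card doesn't mention gradient accumulation) and ignores time spent on evaluation, so the throughput number is only a rough estimate. All constant names are hypothetical.

```python
BATCH_SIZE = 64  # sequences per iteration ("Batch Size")
CONTEXT = 256  # tokens per sequence ("Context Window")
MAX_ITERS = 10_000  # "max_iters"
DATASET_TOKENS = 80_000_000  # FineWeb tokens used for training
TRAIN_SECONDS = 33 * 60  # "~33 minutes" on a T4

tokens_per_iter = BATCH_SIZE * CONTEXT  # 16,384 tokens
tokens_seen = tokens_per_iter * MAX_ITERS  # 163,840,000 tokens
epochs = tokens_seen / DATASET_TOKENS  # about 2.05 passes over the data
throughput = tokens_seen / TRAIN_SECONDS  # about 82.7k tokens per second

print(f"{tokens_seen:,} tokens, {epochs:.2f} epochs, {throughput:,.0f} tokens/s")
```

If batches are sampled as random windows (as in nanoGPT), "epochs" here is an equivalent amount of data rather than literal ordered passes over the dataset.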