| --- |
| license: apache-2.0 |
| datasets: |
| - HuggingFaceFW/fineweb |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - tiny-model |
| - cinnabarlm |
| - tiny-llm |
| - tiny-lm |
| - tinylm |
| - tinyllm |
| new_version: MihaiPopa-1/CinnabarLM-4M-Base |
| --- |
| |
| # CinnabarLM |
| CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size! |
|
|
| # Why? |
| Because making tiny LLMs is a good idea. Others have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't until now! |
|
|
| # Model Configurations |
| | Parameter | Value | |
| |---|---| |
| | Tokenizer | Custom BPE tokenizer | |
| | Vocabulary Size | 4096 tokens | |
| | Batch Size | 64 | |
| | Context Window | 256 tokens | |
| | `n_embed` | 192 | |
| | `n_head` | 8 | |
| | `n_layer` | 6 | |
| | Dropout | 0.1 | |
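
As a rough sanity check on the "4M" and "16 MB" figures, the table above can be plugged into a back-of-the-envelope parameter count. This is only a sketch: it assumes a generic decoder-only transformer block (learned position embeddings, a fused QKV projection, a 4x-wide MLP, LayerNorm with biases), which this card does not confirm. The function name `count_params` and the `tied_embeddings` / `bias` switches are hypothetical and exist only for illustration.

```python
# Back-of-the-envelope parameter count from the configuration table.
# ASSUMPTION: a generic decoder-only transformer layout; the card does not
# state the real architecture, so treat the result as an estimate.
vocab_size = 4096   # "Vocabulary Size"
block_size = 256    # "Context Window"
n_embed = 192
n_layer = 6


def count_params(tied_embeddings: bool, bias: bool = True) -> int:
    d = n_embed
    tok_emb = vocab_size * d          # token embedding table
    pos_emb = block_size * d          # learned position embeddings (assumed)
    attn = 3 * d * d + d * d          # fused QKV projection + output projection
    mlp = 4 * d * d + 4 * d * d       # d -> 4d -> d feed-forward (assumed 4x)
    per_layer = attn + mlp
    if bias:
        per_layer += 3 * d + d + 4 * d + d   # biases of the four linear layers
        per_layer += 2 * 2 * d               # two LayerNorms (weight + bias)
        final_ln = 2 * d
    else:
        per_layer += 2 * d                   # two LayerNorms (weight only)
        final_ln = d
    # An untied output head adds a second vocab_size x d matrix.
    lm_head = 0 if tied_embeddings else vocab_size * d
    return tok_emb + pos_emb + n_layer * per_layer + final_ln + lm_head


for tied in (True, False):
    n = count_params(tied_embeddings=tied)
    # 4 bytes per parameter assumes the weights are stored in float32.
    print(f"tied={tied}: {n:,} params, ~{n * 4 / 1e6:.1f} MB in float32")
```

Under these assumptions the estimate is about 3.5M parameters with tied input/output embeddings and about 4.3M without, which is in the same range as the stated 4M parameters and 16 MB file size.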
|
|
| # Training Configurations |
| | Hyperparameter | Value | |
| |---|---| |
| | `max_iters` | 10000 | |
| | `eval_interval` | 500 | |
| | `learning_rate` | 6e-4 | |
| | `min_lr` | 6e-5 | |
| | `warmup_iters` | 500 | |
| | `weight_decay` | 0.1 | |
| | `beta1, beta2` | 0.9, 0.95 | |
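
The `learning_rate`, `min_lr`, and `warmup_iters` values above suggest a warmup-then-decay schedule. A minimal sketch of one common choice follows: linear warmup, then cosine decay down to `min_lr`. The cosine shape and the decay ending exactly at `max_iters` are assumptions; the card does not say which schedule was actually used, and `get_lr` is a hypothetical helper name.

```python
import math

# Values from the training configuration table.
learning_rate = 6e-4
min_lr = 6e-5
warmup_iters = 500
max_iters = 10000  # ASSUMPTION: the decay ends at the last training step


def get_lr(it: int) -> float:
    """Learning rate at iteration `it` (linear warmup, then cosine decay)."""
    if it < warmup_iters:
        # Linear ramp that reaches the peak rate at the end of warmup.
        return learning_rate * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)


for step in (0, 250, 500, 5250, 10000):
    print(step, f"{get_lr(step):.2e}")
```

In a training loop, a function like this would typically be called once per iteration and its result written into the optimizer's parameter groups before the step.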
|
|
| # Limitations |
| * **Not Instruction-Tuned:** It's only a base model, so it only completes text. |
| * **English-Only:** It's trained on English data (FineWeb), so it's NOT multilingual. |
| * **Not a Standard Model:** It's NOT a Qwen/Llama/GPT model, so the standard Transformers library can't recognize or load it! |
| * **Preview:** This is a preview version, and it often generates gibberish. CinnabarLM 1 is meant to solve this with Llama. |
|
|
| # Some other details |
| * It's trained on 80 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025. |
| * I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model). |
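
The numbers on this card can also be combined into a rough token budget. The sketch below assumes that each iteration processes exactly one batch of 64 sequences of 256 tokens (no gradient accumulation), that "80 million tokens" is the size of the training set, and that the ~33 minutes were spent entirely on training. None of these are confirmed by the card, so the results are estimates only.

```python
# Rough token budget from the figures on this card.
batch_size = 64               # "Batch Size"
block_size = 256              # "Context Window"
max_iters = 10_000
dataset_tokens = 80_000_000   # ASSUMPTION: size of the training set
train_minutes = 33            # approximate wall-clock time on a T4

# ASSUMPTION: one batch per iteration, no gradient accumulation.
tokens_per_iter = batch_size * block_size
tokens_seen = tokens_per_iter * max_iters
passes_over_data = tokens_seen / dataset_tokens
tokens_per_second = tokens_seen / (train_minutes * 60)

print(f"tokens per iteration: {tokens_per_iter:,}")
print(f"tokens seen in training: {tokens_seen:,}")
print(f"approx. passes over the data: {passes_over_data:.2f}")
print(f"approx. throughput: {tokens_per_second:,.0f} tokens/s")
```

If those assumptions hold, training would have seen about 164 million tokens, or roughly two passes over the 80 million token set.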