---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
new_version: MihaiPopa-1/CinnabarLM-4M-Base
---
# CinnabarLM
CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a single T4 GPU in Colab. It's only 16 MB in size!
# Why?
Because it's a good idea to make tiny LLMs. Some people have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't made one myself until now!
# Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| `n_embed` | 192 |
| `n_head` | 8 |
| `n_layer` | 6 |
| Dropout | 0.1 |
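
As a rough sanity check on the "4M" figure, the values in the table can be plugged into a GPT-2-style parameter count. This is only a sketch: the card doesn't spell out the exact architecture, so the layer layout (learned position embeddings, 4x MLP expansion, no bias terms) and the `estimate_params` helper are assumptions, not CinnabarLM's actual code.

```python
def estimate_params(vocab_size=4096, n_embed=192, n_layer=6,
                    block_size=256, tie_weights=False):
    """Approximate parameter count for an ASSUMED GPT-2-style decoder (no biases)."""
    tok_emb = vocab_size * n_embed      # token embedding table
    pos_emb = block_size * n_embed      # learned position embeddings (assumed)
    attn = 4 * n_embed * n_embed        # Q, K, V and output projections
    mlp = 8 * n_embed * n_embed         # two linear layers with 4x expansion (assumed)
    norms = 2 * n_embed                 # two LayerNorm gain vectors per block
    blocks = n_layer * (attn + mlp + norms)
    final_norm = n_embed                # final LayerNorm gain vector
    # If the output head shares weights with the token embedding, it adds nothing.
    lm_head = 0 if tie_weights else vocab_size * n_embed
    return tok_emb + pos_emb + blocks + final_norm + lm_head


untied = estimate_params(tie_weights=False)
tied = estimate_params(tie_weights=True)
print(f"untied head: {untied:,} params, ~{untied * 4 / 1e6:.1f} MB in fp32")
print(f"tied head:   {tied:,} params, ~{tied * 4 / 1e6:.1f} MB in fp32")
```

Under these assumptions the estimate is about 3.5M parameters with a tied output head and about 4.3M with an untied one, which is in the same ballpark as the stated 4M parameters and 16 MB file size. Note that `n_head` doesn't change the count, since it only splits the same projection matrices into heads.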
# Training Configurations
| Hyperparameter | Value |
|---|---|
| `max_iters` | 10000 |
| `eval_interval` | 500 |
| `learning_rate` | 6e-4 |
| `min_lr` | 6e-5 |
| `warmup_iters` | 500 |
| `weight_decay` | 0.1 |
| `beta1, beta2` | 0.9, 0.95 |
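
The `learning_rate`, `min_lr`, `warmup_iters` and `max_iters` values suggest a warmup-then-decay schedule. The card doesn't state the shape of that schedule, so the sketch below assumes linear warmup followed by cosine decay down to `min_lr`, a common choice in small GPT training scripts. `get_lr` is a hypothetical helper, not the actual training code.

```python
import math


def get_lr(it, learning_rate=6e-4, min_lr=6e-5,
           warmup_iters=500, max_iters=10000):
    """Learning rate at iteration `it` under an ASSUMED warmup + cosine schedule."""
    # Linear warmup from near zero up to the peak learning rate.
    if it < warmup_iters:
        return learning_rate * (it + 1) / warmup_iters
    # After the last iteration, stay at the floor.
    if it >= max_iters:
        return min_lr
    # Cosine decay from learning_rate down to min_lr.
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (learning_rate - min_lr)


for step in (0, 499, 500, 5250, 10000):
    print(step, f"{get_lr(step):.2e}")
```

With these defaults the rate peaks at 6e-4 when warmup ends, passes through the midpoint of 3.3e-4 halfway through the decay, and ends at 6e-5.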
# Limitations
* **Not Instruction-Tuned:** It's only a base model, so it only completes text.
* **English-Only:** It's trained on English data (FineWeb), so it's NOT multilingual.
* **Not a Standard Model:** It's NOT a Qwen/Llama/GPT model. The standard Transformers library won't recognize it!
* **Preview:** This is a preview version, and it often generates gibberish. CinnabarLM 1 will solve this with Llama.
# Some other details
* It was trained on 80 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and its knowledge cutoff is June 2025.
* The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).
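
To put the 80 million token figure in context, here is a back-of-the-envelope estimate of how many tokens the training run processed. It assumes every iteration consumes one full batch of 64 sequences of 256 tokens, with no gradient accumulation, that all 10000 iterations ran, and that the ~33 minutes covers training only. None of these are confirmed by the card.

```python
max_iters = 10_000
batch_size = 64
context_window = 256
dataset_tokens = 80_000_000
train_minutes = 33  # approximate figure from the card

tokens_per_iter = batch_size * context_window   # tokens in one batch
tokens_seen = max_iters * tokens_per_iter       # tokens processed over the run
passes = tokens_seen / dataset_tokens           # approximate passes over the data
tokens_per_sec = tokens_seen / (train_minutes * 60)

print(f"tokens per iteration: {tokens_per_iter:,}")
print(f"tokens processed:     {tokens_seen:,}")
print(f"passes over dataset:  {passes:.2f}")
print(f"rough throughput:     {tokens_per_sec:,.0f} tokens/s")
```

If those assumptions hold, the run processed roughly 164 million tokens, which is about two passes over the 80 million token dataset.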