---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
pipeline_tag: text-generation
tags:
- tiny-model
- cinnabarlm
- tiny-llm
- tiny-lm
- tinylm
- tinyllm
new_version: MihaiPopa-1/CinnabarLM-4M-Base
---

# CinnabarLM

CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!

# Why?

Because it's a good idea to make tiny LLMs. Some people have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't yet!

# Model Configurations

| Parameter | Value |
|---|---|
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| `n_embed` | 192 |
| `n_head` | 8 |
| `n_layer` | 6 |
| Dropout | 0.1 |

# Training Configurations

| Hyperparameter | Value |
|---|---|
| `max_iters` | 10000 |
| `eval_interval` | 500 |
| `learning_rate` | 6e-4 |
| `min_lr` | 6e-5 |
| `warmup_iters` | 500 |
| `weight_decay` | 0.1 |
| `beta1, beta2` | 0.9, 0.95 |

# Limitations

* **Not Instruction-Tuned:** It's only a base model, so it only completes text.
* **English-Only:** It's trained on English data (FineWeb), so it's NOT multilingual.
* **Not a Standard Model:** It's NOT a Qwen/Llama/GPT model, so standard Transformers can't recognize it!
* **Preview:** This is a preview version, and it often generates gibberish. CinnabarLM 1 will solve this with Llama.

# Some other details

* It's trained on 80 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
* I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).
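
As a sanity check on the numbers in the Model Configurations table, here is a rough back-of-the-envelope sketch in Python. It assumes a GPT-2-style decoder block (4x MLP expansion, biases, learned position embeddings), float32 weights, and no gradient accumulation. None of that is stated in this card, and `estimate_params` is a hypothetical helper, not part of the training code.

```python
# Values copied from the tables in this card.
vocab_size = 4096
block_size = 256      # context window
batch_size = 64
n_embed = 192
n_layer = 6
max_iters = 10000


def estimate_params(vocab_size, block_size, n_embed, n_layer, tied=True):
    """Approximate parameter count, ASSUMING a GPT-2-style architecture."""
    tok_emb = vocab_size * n_embed            # token embedding table
    pos_emb = block_size * n_embed            # learned position embeddings
    attn = 4 * n_embed * n_embed + 4 * n_embed    # q, k, v, out projections + biases
    mlp = 8 * n_embed * n_embed + 5 * n_embed     # two linear layers (4x wide) + biases
    norms = 4 * n_embed                       # two LayerNorms (weight + bias) per block
    final_norm = 2 * n_embed
    # With tied weights the output head reuses the token embedding table.
    lm_head = 0 if tied else vocab_size * n_embed
    return tok_emb + pos_emb + n_layer * (attn + mlp + norms) + final_norm + lm_head


tied = estimate_params(vocab_size, block_size, n_embed, n_layer, tied=True)
untied = estimate_params(vocab_size, block_size, n_embed, n_layer, tied=False)
print(f"tied embeddings:   {tied:,} params, ~{tied * 4 / 1e6:.1f} MB in float32")
print(f"untied embeddings: {untied:,} params, ~{untied * 4 / 1e6:.1f} MB in float32")

# Tokens processed during training, ASSUMING one batch per iteration.
tokens_seen = batch_size * block_size * max_iters
print(f"tokens seen: {tokens_seen:,} (~{tokens_seen / 80e6:.2f} passes over 80M tokens)")
```

Under these assumptions the estimate lands at about 3.5M parameters with tied embeddings and about 4.3M without, which brackets the stated 4M. At 4 bytes per parameter, 4M parameters is 16 MB, which matches the file size above.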
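
The Training Configurations table lists `learning_rate`, `min_lr`, `warmup_iters` and `max_iters`, but not the shape of the schedule. A common choice with these four values is linear warmup followed by cosine decay, so here is a minimal sketch of that. The cosine shape and the `get_lr` helper are assumptions for illustration, not confirmed details of how CinnabarLM was trained.

```python
import math

# Values copied from the Training Configurations table.
learning_rate = 6e-4
min_lr = 6e-5
warmup_iters = 500
max_iters = 10000


def get_lr(it):
    """Learning rate at iteration `it` (ASSUMED: linear warmup + cosine decay)."""
    # 1) Linear warmup from near zero up to learning_rate.
    if it < warmup_iters:
        return learning_rate * (it + 1) / (warmup_iters + 1)
    # 2) After the last iteration, stay at the floor.
    if it > max_iters:
        return min_lr
    # 3) In between, cosine decay from learning_rate down to min_lr.
    decay_ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)


for it in (0, 250, 500, 5000, 10000):
    print(f"iter {it:>5}: lr = {get_lr(it):.2e}")
```

With this shape, the rate peaks at `learning_rate` right when warmup ends (iteration 500) and reaches `min_lr` at `max_iters`.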