license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb
language:
  - en
pipeline_tag: text-generation
tags:
  - tiny-model
  - cinnabarlm
  - tiny-llm
  - tiny-lm
  - tinylm
  - tinyllm
new_version: MihaiPopa-1/CinnabarLM-4M-Base

CinnabarLM

CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!

Why?

Because it's a good idea to make tiny LLMs. Some people already have, with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!

Model Configurations

| Parameter | Value |
| --- | --- |
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| `n_embed` | 192 |
| `n_head` | 8 |
| `n_layer` | 6 |
| Dropout | 0.1 |
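
As a rough sanity check on the "4M" figure, the values above can be plugged into a back-of-the-envelope parameter count. This is only a sketch: it assumes a GPT-2-style decoder (learned position embeddings, a 4x-wide MLP, two layer norms per block, no bias terms), which this card does not confirm, and the function name `estimate_params` and its `tie_embeddings` flag are made up for illustration.

```python
def estimate_params(
    vocab_size: int = 4096,
    block_size: int = 256,   # context window
    n_embed: int = 192,
    n_layer: int = 6,
    tie_embeddings: bool = True,
) -> int:
    """Approximate weight count for an ASSUMED GPT-2-style decoder (biases ignored)."""
    tok_emb = vocab_size * n_embed      # token embedding table
    pos_emb = block_size * n_embed      # learned position embeddings (assumption)
    attn = 4 * n_embed * n_embed        # Q, K, V and output projections
    mlp = 8 * n_embed * n_embed         # two linear layers with a 4x hidden width
    norms = 2 * n_embed                 # two layer-norm weight vectors per block
    per_block = attn + mlp + norms
    final_norm = n_embed
    # An untied output head adds a second vocab_size x n_embed matrix.
    lm_head = 0 if tie_embeddings else vocab_size * n_embed
    return tok_emb + pos_emb + n_layer * per_block + final_norm + lm_head


tied = estimate_params(tie_embeddings=True)
untied = estimate_params(tie_embeddings=False)
print(f"tied head:   {tied:,} params")    # 3,492,288
print(f"untied head: {untied:,} params")  # 4,278,720
print(f"fp32 size (untied): {untied * 4 / 2**20:.1f} MiB")
```

Under these assumptions both variants land near 4M parameters. Note that `n_head` does not change the count; it only splits the 192-dimensional embedding into 8 heads of 24 dimensions each.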

Training Configurations

| Hyperparameter | Value |
| --- | --- |
| `max_iters` | 10000 |
| `eval_interval` | 500 |
| `learning_rate` | 6e-4 |
| `min_lr` | 6e-5 |
| `warmup_iters` | 500 |
| `weight_decay` | 0.1 |
| `beta1, beta2` | 0.9, 0.95 |
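
The `learning_rate`, `min_lr`, `warmup_iters` and `max_iters` values suggest a warmup-then-decay schedule. The sketch below shows one common way to wire them together: linear warmup followed by cosine decay down to `min_lr`. The cosine shape, the choice to decay over the full `max_iters`, and the function name `get_lr` are assumptions, not something this card states.

```python
import math

LEARNING_RATE = 6e-4   # peak learning rate
MIN_LR = 6e-5          # floor reached at the end of training
WARMUP_ITERS = 500
MAX_ITERS = 10_000     # assumed to also be the decay horizon


def get_lr(it: int) -> float:
    """Learning rate at iteration `it` (linear warmup, then ASSUMED cosine decay)."""
    if it < WARMUP_ITERS:
        # Ramp linearly from near zero up to the peak.
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS
    if it >= MAX_ITERS:
        return MIN_LR
    # Fraction of the decay phase completed, in [0, 1).
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


for step in (0, 499, 500, 5250, 10_000):
    print(step, f"{get_lr(step):.2e}")
```

In a training loop, the returned value would typically be written into each optimizer parameter group before the step.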

Limitations

Not Instruction-Tuned: It's only a base model, so it completes text rather than following instructions.
English-Only: It's trained on English data (FineWeb) and is NOT multilingual.
Not a Standard Model: It's NOT a Qwen/Llama/GPT model, so the standard Transformers library can't load it as-is!
Preview: This is a preview version and it often generates gibberish. CinnabarLM 1 will address this with Llama.

Some other details

It's trained on 80 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).
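
For a sense of scale, the 80 million token figure can be compared against how many tokens the training run would consume. This is a rough estimate only: it assumes the batch size of 64 counts full 256-token sequences, that there is no gradient accumulation, and that all 80M tokens are used for training. None of that is confirmed here.

```python
BATCH_SIZE = 64             # sequences per iteration (from the model table)
CONTEXT_WINDOW = 256        # tokens per sequence
MAX_ITERS = 10_000
DATASET_TOKENS = 80_000_000

tokens_per_iter = BATCH_SIZE * CONTEXT_WINDOW    # 16,384
tokens_seen = tokens_per_iter * MAX_ITERS        # 163,840,000
passes = tokens_seen / DATASET_TOKENS            # about 2.05

print(f"tokens per iteration: {tokens_per_iter:,}")
print(f"tokens seen in training: {tokens_seen:,}")
print(f"approximate passes over the data: {passes:.2f}")
```

Under these assumptions the model would see each training token roughly twice.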