license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb
language:
  - en
pipeline_tag: text-generation
tags:
  - tiny-model
  - cinnabarlm
  - tiny-llm
  - tiny-lm
  - tinylm
  - tinyllm

CinnabarLM 1.4M

What happens if you take the CinnabarLM idea and push it a little further? You get this!

CinnabarLM 1.4M is a tiny, 1.4M-parameter LLM trained for ~26.75 minutes on a T4 GPU (on Colab)! It's only 6 MB in size and now it's Llama-based!

Why?

Because making tiny LLMs is a good idea. Some people already have, with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't!

Model Configurations

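| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 4 x 8 = 32 (gradient accumulation steps x per-device batch size) |
| Context Window | Up to 2048 tokens (from `max_position_embeddings`; not confirmed) |
| `hidden_size` | 128 |
| `intermediate_size` | 128 |
| `num_hidden_layers` | 4 |
| `num_attention_heads` | 4 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | `1e-5` |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |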
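
As a rough sanity check on the size, the values above can be turned into a weight count. This is only a sketch: it assumes the standard Llama layout (bias-free q/k/v/o and gate/up/down projections, two RMSNorm weights per layer plus a final one). `num_key_value_heads` is not listed in the table, so the hypothetical helper `llama_param_count` takes it as a parameter and defaults it to `num_attention_heads`.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, num_attention_heads,
                      num_key_value_heads=None, tie_word_embeddings=False):
    """Count weights in a Llama-style decoder (assumes no bias terms)."""
    if num_key_value_heads is None:
        # Assumption: plain multi-head attention when the value is not given.
        num_key_value_heads = num_attention_heads
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim

    embed = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o + k, v
    mlp = 3 * hidden_size * intermediate_size                             # gate, up, down
    norms = 2 * hidden_size                                               # two RMSNorms per layer
    per_layer = attention + mlp + norms
    final_norm = hidden_size
    return embed + lm_head + num_hidden_layers * per_layer + final_norm


for kv_heads in (4, 1):
    total = llama_param_count(4096, 128, 128, 4, 4, num_key_value_heads=kv_heads)
    print(f"kv_heads={kv_heads}: {total:,} weights, ~{total * 4 / 1e6:.1f} MB in fp32")
```

Under these assumptions the count comes out at roughly 1.5M weights when the key/value head count equals the attention head count, and roughly 1.4M with a single key/value head. Either way the fp32 size is close to 6 MB, and about two thirds of the weights sit in the two untied embedding matrices.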

Training Configurations

| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
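
To make the schedule settings concrete, here is a small sketch of the effective batch size and of a linear warmup followed by cosine decay, using the values from the table. The helper `lr_at` is hypothetical and only mirrors the usual shape of a "cosine" schedule with warmup; the trainer's exact implementation may differ in details.

```python
import math


def lr_at(step, base_lr=6e-4, warmup_steps=500, max_steps=10000):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences per optimizer step on a single GPU:
# per_device_train_batch_size x gradient_accumulation_steps
effective_batch = 8 * 4

print("effective batch size:", effective_batch)
for step in (0, 250, 500, 5250, 10000):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

The rate climbs to 6e-4 over the first 500 steps, passes through half of that at the midpoint of the decay (step 5250), and reaches zero at step 10000.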

Limitations

Not Instruction-Tuned: It's a base model, so it only completes text and won't follow instructions or chat.
English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.
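
Since the model only continues text, prompts work best when written as the start of a passage rather than as a question or instruction. A minimal sketch using the `transformers` text-generation pipeline follows; the repository ID in the usage comment is a placeholder (this card does not state the real one), and the sampling settings are arbitrary assumptions.

```python
def complete_text(prompt, model_id, max_new_tokens=40):
    """Continue `prompt` with a causal language model loaded from `model_id`."""
    # Imported lazily so the function can be defined without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling settings are illustrative, not tuned
        top_k=50,
    )
    return outputs[0]["generated_text"]


# Usage (placeholder repository ID, replace with the actual one):
# print(complete_text("The weather today is", "your-username/CinnabarLM-1.4M"))
```

Expect short, loosely coherent English continuations at this scale.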

Some other details

It's trained on 50 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
I made the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).