CinnabarLM 1.5M

What happens if you take the CinnabarLM idea and push it further? You'll get this!

CinnabarLM 1.5M is a tiny LLM, nominally 1.5M parameters (the checkpoint reports 1.71M), trained for ~28 minutes on a T4 GPU on Colab! It's only about 6 MB in size, and it's now Llama-based!

Why?

Because it's a good idea to make tiny LLMs. Other people have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!

Model Configurations

Parameter                  Value
Tokenizer                  Llama 3's tokenizer (Tiktoken / BPE)
Vocabulary Size            4096 tokens
Batch Size                 32 effective (8 per device x 4 gradient accumulation steps)
Context Window             Nominally 2048 tokens (max_position_embeddings)
hidden_size                128
intermediate_size          256
num_hidden_layers          4
num_attention_heads        4
max_position_embeddings    2048
rms_norm_eps               1e-5
initializer_range          0.02
use_cache                  True
tie_word_embeddings        False
rope_theta                 10000.0
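
The parameter count follows from the configuration above. Here is a minimal sketch of that arithmetic, assuming the standard Llama layout (no bias terms, gated MLP with three projections, two RMSNorm weights per layer) and as many key/value heads as attention heads. The card does not state these details, so treat them as assumptions.

```python
# Values copied from the Model Configurations table.
vocab_size = 4096
hidden_size = 128
intermediate_size = 256
num_hidden_layers = 4
tie_word_embeddings = False

# Assumption: standard Llama block, no biases, num_key_value_heads equal to
# num_attention_heads, so q/k/v/o are each hidden_size x hidden_size.
attention = 4 * hidden_size * hidden_size
mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
norms = 2 * hidden_size                    # two RMSNorm weights per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size
lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
final_norm = hidden_size

total_params = embeddings + num_hidden_layers * per_layer + final_norm + lm_head
size_bytes = total_params * 4  # 4 bytes per parameter in F32

print(f"{total_params:,} parameters")
print(f"{size_bytes / 2**20:.2f} MiB in F32")
```

Under these assumptions the total is 1,705,088 parameters, which matches the 1.71M reported for the checkpoint. Because the embeddings are untied, the input embedding and the output head together account for 1,048,576 of them.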

Training Configurations

Hyperparameter                 Value
output_dir                     "./cinnabarlm-v2"
max_steps                      10000
per_device_train_batch_size    8
gradient_accumulation_steps    4
learning_rate                  6e-4
weight_decay                   0.01
warmup_steps                   500
lr_scheduler_type              "cosine"
logging_steps                  100
save_steps                     2000
fp16                           True
save_total_limit               2
prediction_loss_only           True
logging_first_step             True
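
To make the schedule concrete, here is a small pure-Python sketch of what these hyperparameters imply. It assumes the "cosine" scheduler follows the usual shape (linear warmup from zero, then a half-cosine decay to zero at max_steps) and that training ran on a single GPU. The helper name lr_at is made up for illustration.

```python
import math

# Values copied from the Training Configurations table.
learning_rate = 6e-4
warmup_steps = 500
max_steps = 10000
per_device_train_batch_size = 8
gradient_accumulation_steps = 4


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: one GPU, so the effective batch is per-device batch x accumulation.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
sequences_seen = effective_batch * max_steps

for step in (0, 250, 500, 5250, 10000):
    print(step, f"{lr_at(step):.2e}")
print(effective_batch, sequences_seen)
```

The learning rate peaks at 6e-4 at step 500 and falls back to zero by step 10000. The effective batch of 32 sequences over 10000 steps gives 320,000 sequences in total.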

Limitations

  • Not Instruction-Tuned: It's only a base model, so it only completes text.
  • English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.

Some other details

  • It's trained on 50 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
  • I made the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) + "LM" (Language Model).

Examples

  • "Take some time to show you how this LLM generates text:" gives:

"Take some time to show you how this LLM generates text: love and fun. Your voice will be able to find out your style with a great fabric that will help your friend or family! I can't get more fun with us because we'll never give me the best idea why so I'm just happy to see all my favorite things at our time. Get a Curry Festival! Today, you've got the same gift of lucky cookies when it's been nice to take a look on. Please note: Google is not going to go to a lot of people looking for them. We're always happy to work"

  • "Mihai Popa, which is 15 years old, which the person likes" gives:

"Mihai Popa, which is 15 years old, which the person likes a child is now being a bleeder who has been at the Time Summer. Presidents of Dragon's Day are the first time in the past six months to the school to the day of a hospital or office in San Francisville. The Best Times was also known as a family member. The school will be given for his life and family-friendly home, but it was a good place on a job to visit this site. They must have had a lot of moments from all around the world. However, there were some serious things about"

  • "Even though Striders also require a stick item" gives:

"Even though Striders also require a stick item to create an outdoor space, but they are not just the perfect way to be in place. You can get a new look at the first time you’ll see how much the best thing we need is to make it easier for you! As always, our durable staff is incredibly easy and more than just one but for your newest team you will have to take some time in your day. The more you can give you everything in the best gift box? Let us know where you're going to share your style! From Cleaning Metaller to Western Distribution,"

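Completions like the ones above could be produced with the Hugging Face transformers library. This is only a sketch: it assumes the repository ships tokenizer and model files that load through the Auto classes, and the sampling settings (temperature, top_k, max_new_tokens) are placeholders, not the settings used for the examples.

```python
MODEL_ID = "MihaiPopa-1/CinnabarLM-1.5M-Base"  # repo name as shown on this page


def complete(prompt: str, max_new_tokens: int = 100) -> str:
    """Return the prompt followed by a sampled continuation."""
    # Imported here so this file can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,   # placeholder sampling settings, not the author's
        temperature=0.8,
        top_k=50,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling complete("Even though Striders also require a stick item") downloads the model on first use and returns a different continuation each time, since sampling is enabled.
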
Model size: 1.71M params (Safetensors, tensor type F32)

Dataset used to train MihaiPopa-1/CinnabarLM-1.5M-Base: FineWeb (CC-MAIN-2025-26 snapshot)