CinnabarLM 1.5M
What happens if you take the CinnabarLM idea and push it further? You'll get this!
CinnabarLM 1.5M is a tiny, 1.5M-parameter LLM trained for ~28 minutes on a T4 GPU (on Colab)! It's only 6 MB in size and now it's Llama-based!
Why?
Because making tiny LLMs is a good idea. Other people have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!
Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 4 x 8 = 32 |
| Context Window | Nominally 2048 tokens (per `max_position_embeddings`) |
| `hidden_size` | 128 |
| `intermediate_size` | 256 |
| `num_hidden_layers` | 4 |
| `num_attention_heads` | 4 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | 1e-5 |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
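
To see where the parameters go, the configuration above can be plugged into a rough weight count. This is only a sketch: `llama_param_count` is a hypothetical helper, and it assumes a bias-free Llama-style decoder with `num_key_value_heads` equal to `num_attention_heads` (that value is not listed in the table). The total it prints may not match the headline figure exactly if the real config differs in details like these.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, num_attention_heads,
                      num_key_value_heads=None, tie_word_embeddings=False):
    """Rough weight count for a bias-free Llama-style decoder (hypothetical helper)."""
    # Assumption: no grouped-query attention unless stated otherwise.
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads
    head_dim = hidden_size // num_attention_heads
    kv_dim = head_dim * num_key_value_heads

    embed = vocab_size * hidden_size
    # An untied output head is a second vocab_size x hidden_size matrix.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    # q and o projections are hidden x hidden; k and v are hidden x kv_dim.
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gate, up and down projections of the gated MLP.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    final_norm = hidden_size

    return {
        "embeddings": embed,
        "lm_head": lm_head,
        "per_layer": per_layer,
        "total": embed + lm_head + num_hidden_layers * per_layer + final_norm,
    }


counts = llama_param_count(
    vocab_size=4096,
    hidden_size=128,
    intermediate_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    tie_word_embeddings=False,
)
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

With a vocabulary this small relative to the hidden size, the two embedding matrices still account for a large share of the weights, which is typical for models at this scale.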
Training Configurations
| Hyperparameter | Value |
|---|---|
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
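
The learning-rate settings above (`learning_rate`, `warmup_steps`, `max_steps` and the "cosine" scheduler) describe a curve that is easier to see in numbers. Below is a minimal sketch of a linear-warmup, cosine-decay schedule in plain Python. It assumes the schedule warms up linearly from zero and decays to zero over a single half cosine, which is my understanding of the default cosine schedule in Hugging Face Transformers; `lr_at` is a hypothetical helper, not the actual training code.

```python
import math

PEAK_LR = 6e-4        # learning_rate
WARMUP_STEPS = 500    # warmup_steps
MAX_STEPS = 10_000    # max_steps

# Sequences per optimizer step: per-device batch size x gradient accumulation.
effective_batch = 8 * 4


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    # Half-cosine decay from the peak down to 0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"effective batch size: {effective_batch}")
for step in (0, 250, 500, 5250, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 500 and is back to half of the peak at step 5250, the midpoint of the decay phase.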
Limitations
- Not Instruction-Tuned: It's only a base model, so it only completes text.
- English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.
Some other details
- It's trained on 50 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
- The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).
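
The 50-million-token figure can be combined with the training settings for some back-of-the-envelope numbers. This is a rough sketch, assuming all 10000 steps ran and every token was seen exactly once; neither is confirmed here, so treat the results as ballpark values only.

```python
TOTAL_TOKENS = 50_000_000     # training tokens stated in this card
MAX_STEPS = 10_000            # max_steps
EFFECTIVE_BATCH = 8 * 4       # per-device batch size x gradient accumulation
HEADLINE_PARAMS = 1_500_000   # the "1.5M" headline figure

# Assumption: a single pass over the data, spread evenly across all steps.
tokens_per_step = TOTAL_TOKENS / MAX_STEPS
tokens_per_sequence = tokens_per_step / EFFECTIVE_BATCH
tokens_per_param = TOTAL_TOKENS / HEADLINE_PARAMS

print(f"tokens per optimizer step: {tokens_per_step:.0f}")
print(f"average tokens per sequence: {tokens_per_sequence:.2f}")
print(f"training tokens per parameter: {tokens_per_param:.1f}")
```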
Examples
- "Take some time to show you how this LLM generates text:" gives:
"Take some time to show you how this LLM generates text: love and fun. Your voice will be able to find out your style with a great fabric that will help your friend or family! I can't get more fun with us because we'll never give me the best idea why so I'm just happy to see all my favorite things at our time. Get a Curry Festival! Today, you've got the same gift of lucky cookies when it's been nice to take a look on. Please note: Google is not going to go to a lot of people looking for them. We're always happy to work"
- "Mihai Popa, which is 15 years old, which the person likes" gives:
"Mihai Popa, which is 15 years old, which the person likes a child is now being a bleeder who has been at the Time Summer. Presidents of Dragon's Day are the first time in the past six months to the school to the day of a hospital or office in San Francisville. The Best Times was also known as a family member. The school will be given for his life and family-friendly home, but it was a good place on a job to visit this site. They must have had a lot of moments from all around the world. However, there were some serious things about"
- "Even though Striders also require a stick item" gives:
"Even though Striders also require a stick item to create an outdoor space, but they are not just the perfect way to be in place. You can get a new look at the first time you’ll see how much the best thing we need is to make it easier for you! As always, our durable staff is incredibly easy and more than just one but for your newest team you will have to take some time in your day. The more you can give you everything in the best gift box? Let us know where you're going to share your style! From Cleaning Metaller to Western Distribution,"