	---
	license: apache-2.0
	datasets:
	- HuggingFaceFW/fineweb
	language:
	- en
	pipeline_tag: text-generation
	tags:
	- tiny-model
	- cinnabarlm
	- tiny-llm
	- tiny-lm
	- tinylm
	- tinyllm
	---

	# CinnabarLM 1.4M
	What happens if you take the CinnabarLM idea and push it a little further? You get this!

	CinnabarLM 1.4M is a tiny, 1.4M-parameter LLM trained for ~26.75 minutes on a T4 GPU (on Colab)! It's only 6 MB in size and now it's Llama-based!

	# Why?
	Because it's a good idea to make tiny LLMs. Some people have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't yet!

	# Model Configurations
	| Parameter | Value |
	|---|---|
	| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
	| Vocabulary Size | 4096 tokens |
	| Batch Size | 4 x 8 = 32 |
	| Context Window | 2048 tokens (per `max_position_embeddings`; not confirmed) |
	| `hidden_size` | 128 |
	| `intermediate_size` | 128 |
	| `num_hidden_layers` | 4 |
	| `num_attention_heads` | 4 |
	| `max_position_embeddings` | 2048 |
	| `rms_norm_eps` | `1e-5` |
	| `initializer_range` | 0.02 |
	| `use_cache` | True |
	| `tie_word_embeddings` | False |
	| `rope_theta` | 10000.0 |
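
As a sanity check on the "1.4M" figure, here is a minimal sketch that counts the weights of a bias-free Llama-style decoder from the values in the table. The helper `llama_param_count` is hypothetical (it is not part of any library), and the card does not list `num_key_value_heads`, so both values tried below are assumptions rather than the model's confirmed setting.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads,
                      tie_word_embeddings):
    """Count weights in a Llama-style decoder without bias terms.

    Hypothetical helper for illustration only. Rotary position
    embeddings add no learned parameters, so they are not counted.
    """
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim

    embed = vocab_size * hidden_size
    # An untied output head is a second vocab_size x hidden_size matrix.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size

    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer, plus one final norm.
    layer_norms = 2 * hidden_size
    final_norm = hidden_size

    return embed + lm_head + num_layers * (attn + mlp + layer_norms) + final_norm


# Values taken from the table above.
card = dict(vocab_size=4096, hidden_size=128, intermediate_size=128,
            num_layers=4, num_heads=4, tie_word_embeddings=False)

# ASSUMPTION: num_key_value_heads is not stated in the card; try two options.
mha = llama_param_count(num_kv_heads=4, **card)  # full multi-head attention
gqa = llama_param_count(num_kv_heads=1, **card)  # a single shared KV head

for name, n in [("kv_heads=4", mha), ("kv_heads=1", gqa)]:
    print(f"{name}: {n:,} params, ~{n * 4 / 1e6:.2f} MB in fp32")
```

Under these assumptions the count comes out to 1,508,480 with four key/value heads and 1,410,176 with one. Either is roughly 6 MB when stored as 32-bit floats, so the file size alone does not say which setting was used.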

	# Training Configurations
	| Hyperparameter | Value |
	|---|---|
	| `output_dir` | "./cinnabarlm-v2" |
	| `max_steps` | 10000 |
	| `per_device_train_batch_size` | 8 |
	| `gradient_accumulation_steps` | 4 |
	| `learning_rate` | 6e-4 |
	| `weight_decay` | 0.01 |
	| `warmup_steps` | 500 |
	| `lr_scheduler_type` | "cosine" |
	| `logging_steps` | 100 |
	| `save_steps` | 2000 |
	| `fp16` | True |
	| `save_total_limit` | 2 |
	| `prediction_loss_only` | True |
	| `logging_first_step` | True |
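
To make the schedule in the table concrete, the sketch below reproduces the usual shape of a "cosine" schedule with linear warmup (ramp from 0 to the peak rate over `warmup_steps`, then a half-cosine decay to 0 at `max_steps`) and works out the effective batch size. The function `lr_at` is a hypothetical stand-in written for illustration; the exact values the trainer applied may differ slightly.

```python
import math

# Values taken from the table above.
BASE_LR = 6e-4
WARMUP_STEPS = 500
MAX_STEPS = 10_000
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay.

    Hypothetical helper; assumes a single half-cosine from peak to zero.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences per optimizer step on a single GPU (8 x 4 = 32, as in the card).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Total sequences processed over the whole run.
sequences_seen = effective_batch * MAX_STEPS

for step in (0, 250, 500, 5250, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"effective batch = {effective_batch}, sequences seen = {sequences_seen:,}")
```

In this sketch the rate peaks at 6e-4 at step 500, falls to half of that at step 5250 (the midpoint of the decay), and reaches 0 at step 10000, after 320,000 sequences in total.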

	# Limitations
	* Not Instruction-Tuned: It's only a base model, so it only completes text.
	* English-Only: It's trained on English data (FineWeb); it's NOT multilingual.

	# Some other details
	* It's trained on 50 million tokens of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (CC-MAIN-2025-26 snapshot), and the knowledge cutoff is June 2025.
	* I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).
