	---
	license: mit
	language:
	- en
	library_name: transformers
	tags:
	- text-generation
	- tiny-lm
	- tinystories
	- educational
	- built-with-llama
	- small-model
	pipeline_tag: text-generation
	datasets:
	- roneneldan/TinyStories
	---

	# TinyBuddy-500K

	> ⚠️ Educational / experimental model. TinyBuddy-500K is a tiny Llama-style language model (~547K parameters) trained from scratch on a small synthetic corpus of TinyStories-style text.
	> It is not a useful assistant; it is a working demonstration of training extremely small models from scratch. See the [Limitations](#limitations) section.

	## Model description

	TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories). The architecture follows the LLaMA design (RMSNorm, grouped-query attention, SiLU MLP, tied embeddings), except that it uses learned absolute position embeddings (see the table below) where LLaMA uses rotary position embeddings (RoPE).

	| Hyperparameter | Value |
	|-------------------------|--------------------------------|
	| Parameters | 547,296 (~547K) |
	| Layers | 2 |
	| Attention heads | 4 |
	| Key-Value heads (GQA) | 2 |
	| Hidden size | 96 |
	| MLP intermediate size | 384 |
	| Context length | 512 |
	| Vocab size | 2,048 (BPE trained from scratch) |
	| Norm | RMSNorm |
	| Activation | SiLU |
	| Position embeddings | Learned absolute |
	| Weight tying | Yes (tied embeddings) |
	| Precision | float32 |
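
As a rough cross-check of how the hyperparameters above add up, the following back-of-envelope count may help. It is a sketch, not the model's actual code: it assumes a LLaMA-style gated MLP with three projections (gate, up, down) and no bias terms, neither of which the table states.

```python
# Hyperparameters copied from the table above.
vocab_size, hidden, layers = 2048, 96, 2
n_heads, n_kv_heads = 4, 2
intermediate, context = 384, 512

head_dim = hidden // n_heads      # 96 / 4 = 24
kv_dim = n_kv_heads * head_dim    # 2 * 24 = 48: K and V are narrower than Q under GQA

# ASSUMPTION: no bias terms anywhere in the model.
embeddings = vocab_size * hidden  # counted once because the output head is tied
positions = context * hidden      # learned absolute position table
attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
mlp = 3 * hidden * intermediate   # ASSUMPTION: gate, up and down projections
norms = 2 * hidden                # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

# The trailing `hidden` term is the final RMSNorm before the output head.
total = embeddings + positions + layers * per_layer + hidden

print(f"per layer:       {per_layer:,}")
print(f"estimated total: {total:,}")
```

Under these assumptions the estimate comes to 522,720, about 4.5% below the reported 547,296. The table does not say where the remaining 24,576 parameters come from, so treat this breakdown as approximate.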

	## Training details

	- Data: Synthetic TinyStories-style corpus (~128K tokens)
	- Tokenizer: Custom byte-level BPE with a 2,048-token vocabulary
	- Optimizer: AdamW
	- Steps: ~300
	- Hardware: Single CPU core
	- Final loss: ~0.17
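
To illustrate what "byte-level BPE trained from scratch" means, here is a minimal pure-Python sketch of the technique. It is not the tokenizer training script used for this model; the function names (`train_byte_bpe`, `merge`, `encode`, `decode`) and the toy corpus are hypothetical. The vocabulary starts from the 256 possible byte values, and each learned merge adds one token, so a 2,048-token vocabulary would correspond to at most 1,792 merges if special tokens are ignored.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text, vocab_size):
    """Learn merges over raw UTF-8 bytes until the vocabulary reaches vocab_size."""
    ids = list(text.encode("utf-8"))
    merges = []  # merges[k] is the pair that became token 256 + k
    while 256 + len(merges) < vocab_size:
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left that is worth merging
            break
        ids = merge(ids, pair, 256 + len(merges))
        merges.append(pair)
    return merges


def encode(text, merges):
    """Tokenize by applying the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for k, pair in enumerate(merges):
        ids = merge(ids, pair, 256 + k)
    return ids


def decode(ids, merges):
    """Expand every token back into its byte string and decode as UTF-8."""
    table = {i: bytes([i]) for i in range(256)}
    for k, (a, b) in enumerate(merges):
        table[256 + k] = table[a] + table[b]
    return b"".join(table[i] for i in ids).decode("utf-8")


# Toy corpus for illustration only; the real corpus is ~128K tokens.
corpus = "once upon a time there was a little cat. " * 20
merges = train_byte_bpe(corpus, vocab_size=300)
ids = encode("a little cat", merges)
print(len(merges), "merges learned;", len(ids), "tokens for 'a little cat'")
```

A production tokenizer would also handle pre-tokenization and special tokens, which this sketch leaves out.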

	## Usage

	This model uses custom modeling code, so you must pass `trust_remote_code=True`.

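	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	repo = "Eeppa/TinyBuddy-500K"

	# trust_remote_code=True lets transformers run the custom modeling code shipped in the repo.
	tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
	model.eval()  # switch to inference mode

	prompt = "Once upon a time, there was a little girl named Lily."
	input_ids = tokenizer.encode(prompt, return_tensors="pt")

	# Gradients are not needed for generation.
	# Note: with the stock transformers generate(), temperature and top_k only take effect
	# when do_sample=True is also passed; whether that applies depends on this repo's custom code.
	with torch.no_grad():
	    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)
	print(tokenizer.decode(out[0], skip_special_tokens=True))
	```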
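
For readers new to the `temperature` and `top_k` arguments used above, the sketch below shows what temperature-scaled top-k sampling does to a single vector of next-token logits. It is a plain-Python illustration of the general technique, not the code path that `generate` actually runs; `sample_top_k` and the example logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Pick one token index from `logits` using temperature and top-k filtering."""
    k = min(top_k, len(logits))
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Made-up logits for a four-token vocabulary.
logits = [0.0, 5.0, 1.0, -2.0]
rng = random.Random(0)
print(sample_top_k(logits, temperature=0.8, top_k=2, rng=rng))
```

With `top_k=1` this reduces to greedy decoding, since only the single highest logit survives the filter.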

	## Limitations

	This model is extremely small and was trained for a very short time on limited data.

	What works:
	- Basic English patterns and short sentence structure
	- Simple story-like generation

	What's broken:
	- Very limited coherence (usually breaks after 1–2 sentences)
	- High repetition
	- Poor long-range consistency
	- No real reasoning or factual knowledge
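
One simple way to put a number on the repetition noted above is the distinct-n ratio: the share of n-grams in a generated sample that are unique. This is a generic metric sketched here for illustration, not something measured for this model; `distinct_n` and the sample string are hypothetical.

```python
def distinct_n(text, n=2):
    """Return unique n-grams / total n-grams over whitespace-split words (1.0 = no repeats)."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 1.0  # too short to contain any n-gram
    return len(set(ngrams)) / len(ngrams)


# Made-up sample standing in for model output.
sample = "the cat sat the cat sat the cat sat"
print(distinct_n(sample, n=2))
```

Lower values indicate more repetition; comparing the ratio across sampling settings is one way to see how much `temperature` and `top_k` help.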

	This model exists purely for educational purposes to explore the lower limits of language model size.

	## License

	MIT

	## Citation

	```bibtex
	@misc{tinybuddy500k,
	title = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
	year = {2026},
	note = {Educational demonstration of extremely small language models.}
	}
	```

	Built with Llama.