---
license: mit
language:
- en
library_name: transformers
tags:
- text-generation
- tiny-lm
- tinystories
- educational
- built-with-llama
- small-model
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---

# TinyBuddy-500K

> ⚠️ **Educational / experimental model.** TinyBuddy-500K is a from-scratch tiny Llama-style language model (~547K parameters) trained on a synthetic slice of TinyStories-style text.
> It is **not** a useful assistant; it is a working demonstration of training extremely small models from scratch. See the [Limitations](#limitations) section.

## Model description

TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories). The architecture follows the LLaMA design (RMSNorm, Grouped Query Attention, SiLU MLP, tied embeddings).

| Hyperparameter          | Value                            |
|-------------------------|----------------------------------|
| Parameters              | **547,296** (~547K)              |
| Layers                  | 2                                |
| Attention heads         | 4                                |
| Key-Value heads (GQA)   | 2                                |
| Hidden size             | 96                               |
| MLP intermediate size   | 384                              |
| Context length          | 512                              |
| Vocab size              | 2,048 (BPE trained from scratch) |
| Norm                    | RMSNorm                          |
| Activation              | SiLU                             |
| Position embeddings     | Learned absolute                 |
| Weight tying            | Yes (tied embeddings)            |
| Precision               | float32                          |

## Training details

- **Data**: Synthetic TinyStories-style corpus (~128K tokens)
- **Tokenizer**: Custom byte-level BPE with a 2,048-token vocabulary
- **Optimizer**: AdamW
- **Steps**: ~300 steps on CPU
- **Hardware**: Single CPU core
- **Final loss**: ~0.17

## Usage

This model uses **custom modeling code**, so you must pass `trust_remote_code=True`.
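The table above lists a context length of 512 with learned absolute position embeddings, so it is safest to keep the prompt length plus `max_new_tokens` within 512 tokens. Here is a minimal sketch of that budget check. `clamp_new_tokens` is a hypothetical helper written for illustration and is not part of the repository:

```python
CONTEXT_LENGTH = 512  # context length from the hyperparameter table


def clamp_new_tokens(prompt_len: int, requested: int,
                     context_len: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit after a prompt of `prompt_len` tokens.

    Hypothetical helper (not shipped with the model): it caps the requested
    number of new tokens so that prompt + generation stays within the context.
    """
    if prompt_len < 0 or requested < 0:
        raise ValueError("prompt_len and requested must be non-negative")
    remaining = max(context_len - prompt_len, 0)
    return min(requested, remaining)


print(clamp_new_tokens(14, 60))   # 60: a short prompt leaves plenty of room
print(clamp_new_tokens(480, 60))  # 32: only 32 positions remain
print(clamp_new_tokens(600, 60))  # 0: the prompt alone exceeds the context
```

With a tokenized prompt, the result can be passed as `max_new_tokens` in the generation call, for example `clamp_new_tokens(input_ids.shape[1], 60)`.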
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Eeppa/TinyBuddy-500K"

# trust_remote_code=True is required because the repo ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

prompt = "Once upon a time, there was a little girl named Lily."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Note: if the custom model relies on the standard transformers generate(),
# temperature and top_k only take effect when do_sample=True is also passed.
with torch.no_grad():  # inference only, no gradients needed
    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Limitations

This model is extremely small and was trained for a very short time on limited data.

**What works**:
- Basic English patterns and short sentence structure
- Simple story-like generation

**What's broken**:
- Very limited coherence (usually breaks after 1–2 sentences)
- High repetition
- Poor long-range consistency
- No real reasoning or factual knowledge

This model exists purely for educational purposes, to explore the lower limits of language model size.

## License

MIT

## Citation

```bibtex
@misc{tinybuddy500k,
  title = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
  year  = {2026},
  note  = {Educational demonstration of extremely small language models.}
}
```

**Built with Llama.**