---
license: mit
language:
- en
library_name: transformers
tags:
- text-generation
- tiny-lm
- tinystories
- educational
- built-with-llama
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---

# TinyBuddy-30M

> ⚠️ **Educational / demo model.** TinyBuddy-30M is a from-scratch tiny GPT-style
> language model (~30M parameters) trained for ~12 minutes on a 2-core CPU.
> It is **not** a useful assistant; it is a working end-to-end demonstration
> of the LM training pipeline. See the [Limitations](#limitations) section.

## Model description

TinyBuddy-30M is a small decoder-only Transformer language model trained on a
slice of the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
dataset. The architecture is a standard pre-norm GPT-style stack
(LayerNorm + Causal Multi-Head Self-Attention + GELU MLP) inspired by the
LLaMA / GPT family of decoder-only models.

| Hyperparameter | Value |
| --- | --- |
| Parameters | **30,371,840** (~30.37M) |
| Layers | 6 |
| Attention heads | 8 |
| Embedding dim | 256 |
| MLP hidden dim | 1024 (mlp_ratio = 4) |
| Context length (`block_size`) | 128 |
| Vocab size | 50,000 (BPE; ~18k actually used) |
| Activation | GELU |
| Norm | LayerNorm (pre-norm) |
| Attention | Causal SDPA |
| Position embeddings | Learned absolute |
| Weight tying | No (separate LM head) |
| Precision | float32 |

Most of the parameter budget lives in the token embedding + LM head
(~25.6M of ~30.4M, about 84%). This is typical for small LMs.

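
The split between embeddings and Transformer blocks can be re-derived by hand. The sketch below is an illustration, not the model's own code: it assumes biases on every linear layer except the LM head, LayerNorm with weight and bias, and a learned position table with 128 rows (the `block_size` used in training). Under those assumptions it reproduces the reported total exactly.

```python
def count_params(vocab=50_000, dim=256, layers=6, mlp_hidden=1024, block_size=128):
    """Rough parameter count for a GPT-style stack (assumed layout, see text)."""
    tok_emb = vocab * dim                      # token embedding table
    pos_emb = block_size * dim                 # learned absolute positions
    lm_head = vocab * dim                      # untied output projection, no bias
    # Attention: fused QKV projection plus output projection, both with bias.
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    # MLP: up-projection and down-projection, both with bias.
    mlp = (dim * mlp_hidden + mlp_hidden) + (mlp_hidden * dim + dim)
    norms = 2 * (2 * dim)                      # two LayerNorms per block
    final_norm = 2 * dim
    blocks = layers * (attn + mlp + norms)
    total = tok_emb + pos_emb + lm_head + blocks + final_norm
    return {
        "embeddings_and_head": tok_emb + lm_head,
        "positions": pos_emb,
        "blocks": blocks,
        "total": total,
    }


counts = count_params()
for name, value in counts.items():
    print(f"{name:>20}: {value:,}")
print(f"embedding share: {counts['embeddings_and_head'] / counts['total']:.1%}")
```

The six Transformer blocks account for under 5M parameters; padding the vocabulary to 50,000 is what brings the total to ~30M.
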
## Training details

- **Data**: ~22 MB slice of TinyStories (`TinyStoriesV2-GPT4-valid.txt`,
  27,630 short children's stories, ~5.3M BPE tokens after tokenization).
- **Tokenizer**: byte-level BPE trained from scratch on the same slice
  (saturated at ~18k merges; embedding padded to 50k to hit the 30M target).
- **Optimizer**: AdamW, β=(0.9, 0.95), weight_decay=0.1, grad clip 1.0.
- **Schedule**: cosine decay from 5e-4 → 5e-5 with 100-step linear warmup.
- **Batch**: `batch_size=4`, `block_size=128` (≈ 512 tokens / step).
- **Steps**: **1,500** (≈ 0.77M tokens seen, roughly **0.2% of one epoch**
  of full TinyStories).
- **Hardware**: 2 CPU cores, ~2 GB RAM, ~**12 minutes** wall time
  (≈16 min including evals).
- **Final loss**: **train ≈ 3.53 / val ≈ 3.43** (mean of the two ≈ 3.48).
  Perplexity ≈ 30, well above the ≈ 4–5 a properly-trained TinyStories
  model of this size reaches.

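
The learning-rate schedule listed above can be sketched as a plain function. This is a minimal illustration rather than the training script itself: the function name `lr_at`, the `(step + 1)` warmup shape, and the choice to finish the decay exactly at step 1,500 are assumptions.

```python
import math


def lr_at(step, max_steps=1500, warmup=100, peak=5e-4, floor=5e-5):
    """Linear warmup to `peak`, then cosine decay down to `floor`."""
    if step < warmup:
        # Assumed warmup shape: ramps linearly and reaches `peak` at the last warmup step.
        return peak * (step + 1) / warmup
    progress = min((step - warmup) / max(1, max_steps - warmup), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 800, 1500):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

In a PyTorch loop, the returned value would typically be written into each optimizer parameter group's `lr` before calling `optimizer.step()`.
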
Loss curve (training log):

```
step    0 | train 10.88 | val 10.88
step  150 | train  4.83 | val  4.68
step  300 | train  4.32 | val  4.28
step  600 | train  3.85 | val  3.90
step  900 | train  3.71 | val  3.77
step 1200 | train  3.57 | val  3.55
step 1500 | train  3.53 | val  3.43
```

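
To relate these losses to perplexity, the sketch below assumes the logged values are mean per-token cross-entropy in nats (the PyTorch default), so perplexity is simply `exp(loss)`. The step-0 loss sitting near `ln(50,000)` is consistent with that reading, since an untrained model is close to a uniform guess over the vocabulary.

```python
import math


def perplexity(loss_nats):
    """Per-token perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


final_losses = {"train": 3.53, "val": 3.43}  # last row of the training log
for split, loss in final_losses.items():
    print(f"{split}: loss {loss:.2f} -> perplexity {perplexity(loss):.1f}")

# Loss of a uniform guess over the 50,000-token vocabulary.
uniform_loss = math.log(50_000)
print(f"uniform baseline loss: {uniform_loss:.2f}")
```
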
## Usage

This model uses **custom modeling code**, so you must pass
`trust_remote_code=True` when loading it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "YOUR_USERNAME/TinyBuddy-30M"  # or local path to this folder
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

prompt = "Once upon a time, there was a little girl named Lily."
# `encode` on a Transformers tokenizer returns a plain list of token ids;
# wrapping it in a list adds the batch dimension.
input_ids = torch.tensor([tokenizer.encode(prompt)])

# TinyBuddy ships a custom `.generate(...)` (top-k sampling). Use it directly:
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=120, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0].tolist()))
```

If you prefer to bypass `transformers` entirely, you can use the raw
`tokenizers` library + the included modeling file:

```python
from tokenizers import Tokenizer
from safetensors.torch import load_file
from modeling_tinybuddy import TinyGPT, GPTConfig
import json, torch

# Keep only the config keys that GPTConfig actually declares.
with open("config.json") as f:
    raw_cfg = json.load(f)
cfg = GPTConfig(**{k: v for k, v in raw_cfg.items()
                   if k in GPTConfig.__dataclass_fields__})

model = TinyGPT(cfg)
model.load_state_dict(load_file("model.safetensors"))
model.eval()

tok = Tokenizer.from_file("tokenizer.json")
ids = tok.encode("Once upon a time").ids
out = model.generate(torch.tensor([ids]), max_new_tokens=80, temperature=0.8, top_k=50)
print(tok.decode(out[0].tolist()))
```

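
Both snippets pass `temperature` and `top_k` to the custom `generate`. The sketch below shows what temperature plus top-k sampling generally does for a single decoding step. It is a generic, dependency-free illustration, not the code in `modeling_tinybuddy`, and the helper name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    scaled = [x / temperature for x in logits]
    k = min(top_k, len(scaled))
    # Indices of the k largest logits, best first.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the survivors; subtracting the max keeps exp() stable.
    best = scaled[top[0]]
    weights = [math.exp(scaled[i] - best) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
rng = random.Random(0)
print([sample_top_k(toy_logits, top_k=2, rng=rng) for _ in range(10)])
```

Lower temperatures sharpen the distribution toward the highest logit, while a smaller `top_k` removes low-probability tokens from consideration entirely.
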
## Example outputs

**Prompt:** *"Once upon a time, there was a little girl named Lily."*

> Once upon a time, there was a little girl named Lily. They loved to play
> with their parents. One day, Tom went to the park. The sun loved the box
> and had many friends. One day, they went for a small tree, a lot of friends.
> He said, "What is better. But you want to find your friends, Bob?" …

**Prompt:** *"Tom and Sam were playing in the park when"*

> Tom and Sam were playing in the park when they were very much. Once upon a
> time, there was a girl named The cat with her mom. They had a little girl
> named Mia. She loved to play with her friends and play with her mom. …

## Limitations

**Be honest with yourself: this model is bad, and that is expected.**

What works ✅

- Vocabulary & register match TinyStories (short sentences, character names
  like Tim/Lily/Spot, motifs like "Once upon a time", "the park").
- Local grammar is mostly intact (subject–verb–object, quoted dialogue,
  punctuation).
- Document boundaries (`<|endoftext|>`) are respected.

What's broken ❌

- **No narrative coherence** across more than one or two sentences.
- **Character drift**: characters appear, vanish, or swap names mid-story.
- **Pronoun confusion** ("They" referring to a single girl).
- **Ungrammatical fragments** ("She found a very happy.").
- **Repetition loops** ("play with X. play with Y. play with Z.").
- **No factual knowledge, no reasoning, no instruction following.**

### Why

| Factor | This model | A good TinyStories-class model |
| --- | --- | --- |
| Tokens seen | ~0.77 M | ~10⁹+ |
| Hardware | 2 CPU cores | 1+ GPUs |
| Wall time | ~12 min | many hours |
| Final loss | ~3.5 | ~1.3–1.6 |
| Perplexity | ~30 | ~4–5 |

This is roughly **3–4 orders of magnitude less compute** than a serious
TinyStories training run. The architecture and pipeline are correct; only
the optimization budget is tiny.

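
The token side of that gap follows from the training settings. The arithmetic below uses the step count, batch size, and block size from the training details, and takes 10⁹ tokens as the reference budget from the table. It covers tokens only; the hardware difference comes on top of it.

```python
import math

steps, batch_size, block_size = 1_500, 4, 128   # from the training details
tokens_seen = steps * batch_size * block_size
reference_tokens = 1e9                          # "~10^9+" reference budget

gap = math.log10(reference_tokens / tokens_seen)
print(f"tokens seen: {tokens_seen:,}")
print(f"token gap: about {gap:.1f} orders of magnitude")
```
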
### Intended use

- ✅ Educational reference for building / training / packaging a small LM.
- ✅ Sanity-checking a training pipeline.
- ✅ Demonstrating safetensors + Hugging Face Hub packaging.
- ❌ **Not** for any production, user-facing, or assistive use case.
- ❌ **Not** a source of factual information.
- ❌ **Not** safe for inputs from untrusted users (no safety training).

## Bias, risks, and safety

The training data is TinyStories: synthetic children's stories generated
by GPT-3.5/4. The model has not undergone any safety training, RLHF, or
instruction tuning. It may produce nonsensical, biased, or repetitive
output, and should not be deployed in any setting where output quality or
safety matters.

## License

MIT.

## Citation

If you use this code or model in teaching materials, please cite as:

```
@misc{tinybuddy30m,
  title = {TinyBuddy-30M: a from-scratch ~30M-parameter transformer trained on TinyStories},
  year  = {2026},
  note  = {Educational demonstration model.}
}
```

And please cite TinyStories:

```
@article{eldan2023tinystories,
  title   = {TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author  = {Eldan, Ronen and Li, Yuanzhi},
  journal = {arXiv preprint arXiv:2305.07759},
  year    = {2023}
}
```

## Built with Llama

This model's architecture is inspired by the LLaMA family of decoder-only
transformer language models and shares their pre-norm, causal multi-head
self-attention layout. Unlike LLaMA, which uses RMSNorm, a SwiGLU MLP, and
rotary position embeddings, TinyBuddy uses LayerNorm, a GELU MLP, and learned
absolute position embeddings. The implementation is from-scratch PyTorch and
does not include any LLaMA weights.

**Built with Llama.**