---
license: mit
language:
- en
library_name: transformers
tags:
- text-generation
- tiny-lm
- tinystories
- educational
- built-with-llama
- small-model
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---

# TinyBuddy-500K

> ⚠️ **Educational / experimental model.** TinyBuddy-500K is a from-scratch tiny Llama-style language model (~547K parameters) trained on a synthetic slice of TinyStories-style text.
> It is **not** a useful assistant — it is a working demonstration of training extremely small models from scratch. See the [Limitations](#limitations) section.

## Model description

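TinyBuddy-500K is a very small decoder-only Transformer language model trained on synthetic children's stories in the style of [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories). The architecture borrows from the Llama design (RMSNorm, grouped-query attention, SiLU MLP, tied embeddings), with one notable difference: it uses learned absolute position embeddings rather than the rotary position embeddings (RoPE) used by Llama. The table below lists the full configuration.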

| Hyperparameter          | Value                          |
|-------------------------|--------------------------------|
| Parameters              | **547,296** (~547K)            |
| Layers                  | 2                              |
| Attention heads         | 4                              |
| Key-Value heads (GQA)   | 2                              |
| Hidden size             | 96                             |
| MLP intermediate size   | 384                            |
| Context length          | 512                            |
| Vocab size              | 2,048 (BPE trained from scratch) |
| Norm                    | RMSNorm                        |
| Activation              | SiLU                           |
| Position embeddings     | Learned absolute               |
| Weight tying            | Yes (tied embeddings)          |
| Precision               | float32                        |
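
As a sanity check on the table, here is a rough back-of-the-envelope parameter estimate. It is only a sketch under stated assumptions that the card does not confirm: bias-free linear layers, a gated three-matrix MLP (gate, up, down) as in Llama, two RMSNorm weight vectors per layer plus one final norm, and an output head that shares the embedding matrix. The function name `estimate_params` is illustrative and not part of this repo.

```python
def estimate_params(vocab=2048, hidden=96, layers=2, heads=4, kv_heads=2,
                    intermediate=384, context=512):
    """Rough parameter count for a small Llama-style decoder.

    Assumptions (not confirmed by the model card): no biases, gated MLP,
    tied input/output embeddings, learned absolute position table.
    """
    head_dim = hidden // heads                          # 96 / 4 = 24
    kv_dim = kv_heads * head_dim                        # GQA: narrower K/V projections
    embed = vocab * hidden                              # shared with the output head
    pos = context * hidden                              # learned absolute positions
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q and o, plus k and v
    mlp = 3 * hidden * intermediate                     # gate, up, down
    norms = 2 * hidden                                  # two RMSNorm vectors per layer
    final_norm = hidden
    return embed + pos + layers * (attn + mlp + norms) + final_norm


print(estimate_params())  # 522720 under the assumptions above
```

Under these assumptions the estimate lands about 24.6K below the reported 547,296, so the custom modeling code presumably differs from the sketch in some detail not listed in the table. The reported figure is the authoritative one.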

## Training details

- **Data**: Synthetic TinyStories-style corpus (~128K tokens)
- **Tokenizer**: Custom byte-level BPE with a 2,048-token vocabulary
- **Optimizer**: AdamW
- **Steps**: ~300
- **Hardware**: Single CPU core
- **Final loss**: ~0.17
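
To put the final loss in context, the snippet below converts it to perplexity and compares it with the loss of a uniform guess over the vocabulary. It assumes the reported loss is a mean per-token cross-entropy in nats (the usual convention for PyTorch training loops, but not stated above). The helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_loss_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_loss_nats)


final_loss = 0.17                 # approximate value reported above
uniform_loss = math.log(2048)     # loss of guessing uniformly over 2,048 tokens

print(f"perplexity at loss {final_loss}: {perplexity(final_loss):.3f}")
print(f"uniform-guess loss: {uniform_loss:.2f} nats")
```

A loss of ~0.17 corresponds to a perplexity of roughly 1.19, far below the uniform baseline of about 7.62 nats. The list above does not say whether the figure is a training or a held-out loss, so it should not be read as a measure of generalization.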

## Usage

This model uses **custom modeling code**, so you must pass `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Eeppa/TinyBuddy-500K"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

prompt = "Once upon a time, there was a little girl named Lily."
input_ids = tokenizer.encode(prompt, return_tensors="pt")

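# Inference only, so skip gradient tracking.
# Note: the stock transformers generate() applies temperature/top_k only when
# do_sample=True; this repo's custom modeling code may behave differently.
with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=60, temperature=0.8, top_k=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))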
```
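
The `temperature` and `top_k` arguments in the call above control how the next token is sampled. The following is a minimal pure-Python sketch of the common technique (top-k filtering followed by temperature-scaled softmax sampling). It is not taken from this repo's modeling code, and `sample_top_k` is a hypothetical helper name.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id from raw logits using top-k and temperature."""
    # Keep only the indices of the k largest logits.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Divide by temperature: values below 1 sharpen the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, subtracting the max for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 0.5]
print(sample_top_k(toy_logits, temperature=0.8, top_k=2))
```

With `top_k=1` the function always returns the highest-scoring token (greedy decoding); larger `top_k` and higher `temperature` values trade determinism for variety.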

## Limitations

This model is extremely small and was trained for a very short time on limited data.

**What works**:
- Basic English patterns and short sentence structure
- Simple story-like generation

**What's broken**:
- Very limited coherence (usually breaks after 1–2 sentences)
- High repetition
- Poor long-range consistency
- No real reasoning or factual knowledge

This model exists purely for educational purposes to explore the lower limits of language model size.

## License

MIT

## Citation

```bibtex
@misc{tinybuddy500k,
  title  = {TinyBuddy-500K: An educational ~500K parameter Llama-style model trained on TinyStories},
  year   = {2026},
  note   = {Educational demonstration of extremely small language models.}
}
```

**Built with Llama.**