# nanoGPT SLM -- 124.0M Parameter Children's Story Generator

A small language model trained entirely from scratch using a custom nanoGPT (GPT-2 small) implementation. Pretrained on the TinyStories dataset to generate short, coherent stories for young children (ages 3-5).
## What This Model Does

This model generates short children's stories suitable for 3-5 year olds. Give it the beginning of a story and it will continue writing in simple, age-appropriate language:

```
Input:  "Once upon a time there was a little rabbit"
Output: "Once upon a time there was a little rabbit who lived in a big forest.
         The rabbit loved to hop and play with his friends. One day, he
         found a shiny red ball near the river..."
```
**Capabilities:**
- Generates coherent short stories (100-500 words) with the from-scratch pretrained model
- Uses simple vocabulary appropriate for young children
- Follows common story patterns (characters, conflict, resolution)
- Understands basic narrative structure (beginning, middle, end)
- Stops cleanly at story boundaries via the learned `<|endoftext|>` token
- Can continue from any story opening/prompt
- As an add-on, the Chat UI also lets users generate an illustration of the generated story using the `stable-diffusion-xl-base-1.0` model
**Limitations:**
- Stories are short (512-token maximum context window)
- Limited to simple vocabulary and narrative structures
- No instruction-following ability (see fine-tuned variants below)
- May occasionally generate repetitive or nonsensical text
- English only
## Training Dataset: TinyStories
| Attribute | Value |
|---|---|
| Dataset | TinyStories (Eldan & Li, 2023) |
| Description | Synthetic short stories generated by GPT-3.5/GPT-4, filtered for quality |
| Target audience | Children aged 3-5 years |
| Vocabulary | Words that a typical 3-4 year old would understand |
| Training stories | ~2,119,719 |
| Validation stories | ~21,990 |
| Total tokens | ~470M |
| Average story length | ~220 tokens |
| Topics | Animals, friendship, family, nature, adventure, sharing, kindness |
The TinyStories dataset was specifically designed to study whether small language models can learn coherent language generation when trained on high-quality, simple text.

Each story was tokenized with Unicode normalization (smart quotes, em dashes, etc. converted to ASCII) and separated by `<|endoftext|>` (token 50256) so the model learns clean story boundaries.
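The preprocessing above can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the replacement map, `normalize_story`, and `build_token_stream` are assumed names, and `encode` stands in for tiktoken's GPT-2 BPE encoder.

```python
EOT_ID = 50256  # tiktoken GPT-2 id for <|endoftext|>

# Illustrative subset of the Unicode -> ASCII normalization
ASCII_MAP = {
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2014": "--", "\u2013": "-",  # em dash, en dash
}

def normalize_story(text: str) -> str:
    """Replace common smart punctuation with ASCII equivalents."""
    for uni, repl in ASCII_MAP.items():
        text = text.replace(uni, repl)
    return text

def build_token_stream(stories, encode):
    """Tokenize each story and append <|endoftext|> so the model learns
    clean story boundaries. `encode` maps text -> token ids, e.g.
    tiktoken.get_encoding("gpt2").encode."""
    ids = []
    for story in stories:
        ids.extend(encode(normalize_story(story)))
        ids.append(EOT_ID)
    return ids
```

During training, the resulting stream is simply sliced into 512-token windows, so every batch may contain several complete stories delimited by token 50256.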
## Quick Start

### Method 1: Gradio Chat UI (the best way to try the model's capabilities)

- Chat UI: nanoGPT SLM -- Children's Story Generator + Illustration
- Image model: `stabilityai/stable-diffusion-xl-base-1.0` via the HF Inference API
### Method 2: Download and use `nanogpt_slm_tinystories_best.pth` programmatically

**Option 1: Run directly** (downloads the model and generates sample stories from predefined prompts)

```shell
# Download `nanogpt_slm_pretrained_inference_tinystories.py` into the working directory
pip install torch tiktoken huggingface_hub
python nanogpt_slm_pretrained_inference_tinystories.py
```
**Option 2: Import into your own code** to generate children's short stories

```python
# pip install torch tiktoken huggingface_hub
# Download `nanogpt_slm_pretrained_inference_tinystories.py` into the working directory
from nanogpt_slm_pretrained_inference_tinystories import tell_story, ask, generate_text

# Method 1 -- quick story generation
# story = tell_story("Once upon a time there was a little kitten")
story = tell_story(input("Enter a story prompt (e.g., 'Once upon a time there was a little kitten'): ").strip())
print(story)
print("--------------------")

# Method 2 -- simple text completion
# print(ask("The friendly dragon lived in"))
print(ask(input("Enter a prompt for text completion (e.g., 'The friendly dragon lived in'): ").strip()))
print("--------------------")

# Method 3 -- fine-grained control
print(generate_text(
    "A girl named Lily went to the park",
    max_tokens=500,   # generous budget -- EOS stops at the story end (max context length = 512)
    temperature=0.8,  # 0.01 = predictable, 0.8 = balanced, 1.5 = creative
    top_k=40,         # sampling diversity
))
print("--------------------")
```
### Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-pretrained-slm-tinystories-124m",
    filename="nanogpt_slm_tinystories_best.pth",
)

from nanogpt_slm_pretrained_inference_tinystories import GPT, GPTKV, GPTConfig

config = GPTConfig()
model = GPTKV(config)  # KV-cache enabled for fast generation
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
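For reference, the EOS-based stopping that the inference script performs can be sketched as a plain loop. This is illustrative, not the script's actual code: `next_token(ids)` stands in for one model forward pass plus sampling (with the KV cache, each step only processes the newest token).

```python
EOS_ID = 50256  # <|endoftext|>, the learned story boundary

def generate_ids(prompt_ids, next_token, max_tokens=500, eos_id=EOS_ID):
    """Append one token at a time, stopping at EOS or the token budget.
    `next_token(ids)` is a stand-in for model forward + sampling."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        tok = next_token(ids)
        if tok == eos_id:
            break  # clean stop at the story boundary
        ids.append(tok)
    return ids
```

This is why `max_tokens=500` is a budget rather than a fixed length: most stories emit `<|endoftext|>` well before the budget is exhausted.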
## Model Architecture
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small, 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Weight tying | Yes (token embeddings = LM head) |
| Attention | Flash Attention when available, causal mask |
| Normalization | Pre-norm (LayerNorm before attention/MLP) |
| Activation | GELU |
| KV Cache | GPTKV variant included for O(1) per-token decode |
| EOS token | `<|endoftext|>` (50256) -- learned story boundary |
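The 124.0M figure can be reproduced from the table above. The sketch below assumes the standard GPT-2 small layout (linear layers with biases, learned position embeddings, tied LM head); the field and helper names are illustrative, not the repo's actual `GPTConfig`.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50257  # tiktoken GPT-2 BPE
    block_size: int = 512    # context length
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

def count_params(cfg: GPTConfig) -> int:
    """Unique parameters with weight tying (lm_head shares wte)."""
    d = cfg.n_embd
    wte = cfg.vocab_size * d                      # token embeddings (tied LM head)
    wpe = cfg.block_size * d                      # position embeddings
    ln = 2 * d                                    # LayerNorm weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused qkv proj + output proj
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # GELU MLP up/down proj
    block = ln + attn + ln + mlp                  # pre-norm transformer block
    return wte + wpe + cfg.n_layer * block + ln   # + final LayerNorm

print(count_params(GPTConfig()))  # 124,046,592 -> "124.0M"
```

Note that weight tying saves another `vocab_size * n_embd` (~38.6M) parameters that an untied LM head would add.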
## Training Details
| Attribute | Value |
|---|---|
| Hardware | Google Colab Pro (NVIDIA G4 48GB) |
| Iterations | 70,000 (~2.4 epochs over TinyStories) |
| Batch size | 32 sequences x 512 tokens = 16,384 tokens/step |
| Gradient accumulation | 8 steps (effective batch = 131,072 tokens) |
| Total tokens seen | ~1.15B |
| Optimizer | AdamW (lr=6e-4, betas=(0.9, 0.95), wd=0.1) |
| LR schedule | Linear warmup (3,000 steps) + cosine decay to 1e-5 |
| Precision | bfloat16 (G4) |
| Gradient clipping | max_norm=1.0 |
| Data preprocessing | Unicode normalization + <|endoftext|> story separators |
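The table's figures are self-consistent: 32 sequences x 512 tokens = 16,384 tokens/step, and 70,000 iterations x 16,384 tokens ≈ 1.15B tokens seen. The warmup + cosine schedule can be sketched as below; this is a nanoGPT-style sketch under the table's hyperparameters, and the assumption that decay runs over the full 70,000 iterations is mine.

```python
import math

MAX_LR, MIN_LR = 6e-4, 1e-5   # from the table: AdamW lr and cosine floor
WARMUP, MAX_ITERS = 3_000, 70_000

def get_lr(it: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if it < WARMUP:
        return MAX_LR * (it + 1) / WARMUP
    progress = (it - WARMUP) / (MAX_ITERS - WARMUP)   # 0 -> 1 over decay phase
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```

The per-iteration learning rate is passed to the AdamW optimizer before each step, as in Karpathy's nanoGPT training loop.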
## Files

| File | Description |
|---|---|
| `nanogpt_slm_tinystories_best.pth` | Pretrained model weights (best validation loss) |
| `nanogpt_slm_pretrained_inference.py` | Standalone inference script with KV cache + EOS stopping |
| `config.json` | Model configuration and training details |
## API Reference

`tell_story(beginning, max_tokens=500, temperature=0.8, top_k=40)`
Generate a children's story from an opening line. Best for creative story generation.

`ask(prompt, max_tokens=500, temperature=0.8, top_k=40)`
General text completion. Alias for `generate_text()`.

`generate_text(prompt, max_tokens=500, temperature=0.8, top_k=40)`
Low-level text generation with full parameter control.
| Parameter | Default | Description |
|---|---|---|
| `prompt` / `beginning` | (required) | Text to continue from |
| `max_tokens` | 500 | Maximum tokens to generate (EOS stops earlier) |
| `temperature` | 0.8 | 0.01 = predictable, 0.8 = balanced, 1.5 = wild |
| `top_k` | 40 | Top-k filtering (None = no filtering) |
## Example Outputs

**Prompt:** "Once upon a time there was a little bear"

> Once upon a time there was a little bear who lived in a big forest. The bear loved to play with his friends. One sunny day, he went for a walk and found a beautiful flower. He picked it up and brought it home to show his mama...

**Prompt:** "The princess looked out her window and saw"

> The princess looked out her window and saw a big rainbow in the sky. She was so happy! She ran outside to get a closer look. A little bird flew down and sat on her hand. "Hello!" said the princess...
## Fine-tuned Variants

Experimental versions fine-tuned on different datasets:
| Variant | Type | Repo |
|---|---|---|
| This model | Pretrained (TinyStories) | nishantup/nanogpt-pretrained-slm-tinystories-124m |
| Instruction-tuned (nanoGPT) | SFT | nishantup/nanogpt-slm-tinystories-instruct |
| Spam classifier (nanoGPT) | Classification | nishantup/nanogpt-slm-tinystories-classifier |
| Instruction-tuned (Raschka) | SFT | nishantup/gpt2-slm-instruct |
## Citation
If you use this model, please cite the TinyStories paper:
> Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.
## Notes

- Big shout-out to Dr. Raj Dandekar from Vizuara.ai; this project closely follows Raj's "Building LLM from Scratch" workshop
- Trained completely from scratch (no pretrained initialization)
- Uses a KV cache (GPTKV) for O(1) per-token decode during inference
- Weight tying between the token embeddings (`wte`) and the LM head (`lm_head`)
- Architecture follows Karpathy's nanoGPT implementation
- Training data preprocessed with Unicode normalization (smart quotes -> ASCII)
- Stories separated with `<|endoftext|>` for clean EOS-based generation stopping