nanoGPT SLM -- 124.0M Parameter Children's Story Generator

A small language model trained entirely from scratch using a custom nanoGPT (GPT-2 small) implementation. Pretrained on the TinyStories dataset to generate short, coherent stories for young children (ages 3-5).

What This Model Does

This model generates short children's stories suitable for 3-5 year olds. Give it the beginning of a story and it will continue writing in simple, age-appropriate language:

Input:  "Once upon a time there was a little rabbit"
Output: "Once upon a time there was a little rabbit who lived in a big forest.
         The rabbit loved to hop and play with his friends. One day, he
         found a shiny red ball near the river..."

Capabilities:

  • Generates coherent short stories (100-500 words) from a model pretrained entirely from scratch
  • Uses simple vocabulary appropriate for young children
  • Follows common story patterns (characters, conflict, resolution)
  • Understands basic narrative structure (beginning, middle, end)
  • Stops cleanly at story boundaries via learned <|endoftext|> token
  • Can continue from any story opening/prompt
  • As an add-on, the Chat UI also lets users generate an illustration of the story using the stable-diffusion-xl-base-1.0 model

Limitations:

  • Stories are short (max 512 token context window)
  • Limited to simple vocabulary and narrative structures
  • No instruction-following ability (see fine-tuned variants below)
  • May occasionally generate repetitive or nonsensical text
  • English only

Training Dataset: TinyStories

| Attribute | Value |
| --- | --- |
| Dataset | TinyStories (Eldan & Li, 2023) |
| Description | Synthetic short stories generated by GPT-3.5/GPT-4, filtered for quality |
| Target audience | Children aged 3-5 years |
| Vocabulary | Words that a typical 3-4 year old would understand |
| Training stories | ~2,119,719 |
| Validation stories | ~21,990 |
| Total tokens | ~470M |
| Average story length | ~220 tokens |
| Topics | Animals, friendship, family, nature, adventure, sharing, kindness |

The TinyStories dataset was specifically designed to study whether small language models can learn coherent language generation when trained on high-quality, simple text. Each story was tokenized with Unicode normalization (smart quotes, em dashes, etc. converted to ASCII) and separated by <|endoftext|> (token 50256) so the model learns clean story boundaries.
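That preprocessing step can be sketched as follows. This is illustrative, not the repo's actual script: `EOT_ID` is the GPT-2 `<|endoftext|>` id stated above, and the punctuation table is an assumed minimal mapping.

```python
# Sketch of the described preprocessing. EOT_ID matches the GPT-2 BPE
# <|endoftext|> token; the punctuation table is an assumed example.
EOT_ID = 50256

def normalize_unicode(text: str) -> str:
    """Convert common 'smart' punctuation to plain ASCII."""
    table = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
             "\u2013": "-", "\u2014": "-", "\u2026": "..."}
    for uni, ascii_ in table.items():
        text = text.replace(uni, ascii_)
    return text

def encode_story(text: str, encode_fn) -> list:
    """Tokenize one normalized story and append the EOS separator."""
    return encode_fn(normalize_unicode(text)) + [EOT_ID]
```

With tiktoken's GPT-2 encoding passed in as `encode_fn`, every story in the training stream ends in token 50256, which is how the model learns story boundaries.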

Quick Start

Method 1: Using the Gradio Chat UI (best way to test the model's capabilities)

Chat UI: nanoGPT SLM -- Children's Story Generator + Illustration

Image model: stabilityai/stable-diffusion-xl-base-1.0 via HF Inference API

Method 2: Download and use the nanogpt_slm_tinystories_best.pth model programmatically

Option 1: Run directly (downloads model + generates sample stories with predefined prompts)

# Download `nanogpt_slm_pretrained_inference_tinystories.py` into your working directory

!pip install torch tiktoken huggingface_hub
!python nanogpt_slm_pretrained_inference_tinystories.py

Option 2: Import and use in your own code to generate children's short stories

# pip install torch tiktoken huggingface_hub
# Download `nanogpt_slm_pretrained_inference_tinystories.py` into your working directory

# Method 1 -- Quick story generation
from nanogpt_slm_pretrained_inference_tinystories import tell_story, ask, generate_text

# story = tell_story("Once upon a time there was a little kitten")
story = tell_story(input("Enter a story prompt (e.g., 'Once upon a time there was a little kitten'): ").strip())
print(story)
print("--------------------")

# Method 2 -- Simple text completion
# print(ask("The friendly dragon lived in"))
print(ask(input("Enter a prompt for text completion (e.g., 'The friendly dragon lived in'): ").strip()))
print("--------------------")

# Method 3 -- Fine-grained control
print(generate_text(
    "A girl named Lily went to the park",
    max_tokens=500,     # generous budget -- EOS stops at story end  (Max Context length = 512)
    temperature=0.8,    # 0.01=predictable, 0.8=balanced, 1.5=creative
    top_k=40            # sampling diversity
))
print("--------------------")

Load weights manually

from huggingface_hub import hf_hub_download
import torch

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-pretrained-slm-tinystories-124m",
    filename="nanogpt_slm_tinystories_best.pth"
)

from nanogpt_slm_pretrained_inference_tinystories import GPT, GPTKV, GPTConfig

config = GPTConfig()
model = GPTKV(config)  # KV-cache enabled for fast generation
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
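Once loaded, generation stops cleanly at the learned <|endoftext|> token. Conceptually, the decode loop looks like this; `model_step` is a stand-in for one forward pass plus sampling, not the repo's actual API.

```python
EOS_ID = 50256  # GPT-2 <|endoftext|>, the learned story boundary

def decode_until_eos(model_step, prompt_ids, max_tokens=500, eos_id=EOS_ID):
    """Autoregressive decode loop with EOS-based stopping (conceptual
    sketch; `model_step(ids) -> next_token_id` stands in for a forward
    pass through the model followed by sampling)."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        next_id = model_step(ids)
        if next_id == eos_id:  # story finished -> stop early
            break
        ids.append(next_id)
    return ids
```

This is why `max_tokens=500` acts as a generous budget rather than a fixed length: most stories end at EOS well before the cap.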

Model Architecture

| Attribute | Value |
| --- | --- |
| Architecture | nanoGPT (GPT-2 small, 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Weight tying | Yes (token embeddings = LM head) |
| Attention | Flash Attention when available, causal mask |
| Normalization | Pre-norm (LayerNorm before attention/MLP) |
| Activation | GELU |
| KV Cache | GPTKV variant included for O(1) per-token decode |
| EOS token | <|endoftext|> (50256) -- learned story boundary |
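The 124.0M figure can be checked by hand from the table above. The sketch below assumes standard GPT-2 shapes with biases; because the LM head is tied to the token embedding, it contributes no extra parameters.

```python
def gpt2_param_count(vocab=50257, ctx=512, d=768, n_layers=12):
    """Back-of-the-envelope GPT-2-small parameter count, assuming
    standard shapes with biases and a tied LM head."""
    emb = vocab * d + ctx * d                     # wte + wpe
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused QKV + output proj
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up + down projections
    ln = 2 * d                                    # LayerNorm gain + bias
    block = attn + mlp + 2 * ln                   # pre-norm: two LNs per block
    return emb + n_layers * block + ln            # plus the final LayerNorm
```

`gpt2_param_count()` comes out to roughly 124.0M unique parameters, matching the table.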

Training Details

| Attribute | Value |
| --- | --- |
| Hardware | Google Colab Pro (NVIDIA G4 48GB) |
| Iterations | 70,000 (~2.4 epochs over TinyStories) |
| Batch size | 32 sequences x 512 tokens = 16,384 tokens/step |
| Gradient accumulation | 8 steps (effective batch = 131,072 tokens) |
| Total tokens seen | ~1.15B |
| Optimizer | AdamW (lr=6e-4, betas=(0.9, 0.95), wd=0.1) |
| LR schedule | Linear warmup (3,000 steps) + cosine decay to 1e-5 |
| Precision | bfloat16 (G4) |
| Gradient clipping | max_norm=1.0 |
| Data preprocessing | Unicode normalization + <|endoftext|> story separators |
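The warmup-plus-cosine learning-rate schedule from the table can be sketched as below. The training script itself is not included in this repo, so treat this as an illustration of the stated hyperparameters rather than the exact implementation.

```python
import math

def lr_at(step, max_lr=6e-4, min_lr=1e-5, warmup=3000, total=70000):
    """Linear warmup to max_lr over `warmup` steps, then cosine decay
    down to min_lr by `total` steps (illustrative sketch)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup        # linear warmup
    t = min((step - warmup) / (total - warmup), 1.0)  # decay progress in [0, 1]
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

The warmup avoids unstable early updates at lr=6e-4, and the cosine tail anneals gently to 1e-5 by the final iteration.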

Files

| File | Description |
| --- | --- |
| nanogpt_slm_tinystories_best.pth | Pretrained model weights (best validation loss) |
| nanogpt_slm_pretrained_inference_tinystories.py | Standalone inference script with KV cache + EOS stopping |
| config.json | Model configuration and training details |

API Reference

tell_story(beginning, max_tokens=500, temperature=0.8, top_k=40)

Generate a children's story from an opening line. Best for creative story generation.

ask(prompt, max_tokens=500, temperature=0.8, top_k=40)

General text completion. Alias for generate_text().

generate_text(prompt, max_tokens=500, temperature=0.8, top_k=40)

Low-level text generation with full parameter control.

| Parameter | Default | Description |
| --- | --- | --- |
| prompt / beginning | (required) | Text to continue from |
| max_tokens | 500 | Maximum tokens to generate (EOS stops earlier) |
| temperature | 0.8 | 0.01 = predictable, 0.8 = balanced, 1.5 = wild |
| top_k | 40 | Top-k filtering (None = no filtering) |
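To make the temperature and top_k knobs concrete, here is a plain-Python sketch of temperature-scaled top-k sampling. The repo's implementation operates on PyTorch logit tensors and may differ in detail.

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40):
    """Pick the next token id: keep the top_k highest logits, rescale by
    temperature, softmax, then sample (illustrative plain-Python sketch)."""
    if top_k is not None:
        ids = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[:top_k]   # top-k filtering
    else:
        ids = list(range(len(logits)))       # no filtering
    scaled = [logits[i] / temperature for i in ids]
    m = max(scaled)                          # subtract max for stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(ids, weights=weights, k=1)[0]
```

Low temperatures sharpen the distribution toward the argmax (near-deterministic output); higher temperatures flatten it, trading coherence for variety.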

Example Outputs

Prompt: "Once upon a time there was a little bear"

Once upon a time there was a little bear who lived in a big forest. The bear loved to play with his friends. One sunny day, he went for a walk and found a beautiful flower. He picked it up and brought it home to show his mama...

Prompt: "The princess looked out her window and saw"

The princess looked out her window and saw a big rainbow in the sky. She was so happy! She ran outside to get a closer look. A little bird flew down and sat on her hand. "Hello!" said the princess...

Fine-tuned Variants (experimental versions fine-tuned on different datasets)

| Variant | Type | Repo |
| --- | --- | --- |
| This model | Pretrained (TinyStories) | nishantup/nanogpt-pretrained-slm-tinystories-124m |
| Instruction-tuned (nanoGPT) | SFT | nishantup/nanogpt-slm-tinystories-instruct |
| Spam classifier (nanoGPT) | Classification | nishantup/nanogpt-slm-tinystories-classifier |
| Instruction-tuned (Raschka) | SFT | nishantup/gpt2-slm-instruct |

Citation

If you use this model, please cite the TinyStories paper:

Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be
and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.

Notes

  • A big shout-out to Dr. Raj Dandekar from Vizuara.ai; this project closely follows his Building LLM from Scratch workshop
  • Trained completely from scratch (no pretrained initialization)
  • Uses KV cache (GPTKV) for O(1) per-token decode during inference
  • Weight tying between token embeddings (wte) and LM head (lm_head)
  • Architecture follows Karpathy's nanoGPT implementation
  • Training data preprocessed with Unicode normalization (smart quotes -> ASCII)
  • Stories separated with <|endoftext|> for clean EOS-based generation stopping
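The KV-cache idea behind GPTKV can be illustrated with a minimal sketch. Plain lists stand in for key/value tensors here; the repo's GPTKV class works on real attention tensors.

```python
class KVCacheSketch:
    """Per-layer key/value cache (conceptual sketch, not the repo's GPTKV).
    Each decode step appends one key/value pair per layer instead of
    recomputing projections over the whole prefix, so per-token work
    stays O(1) in sequence length."""

    def __init__(self, n_layers):
        self.keys = [[] for _ in range(n_layers)]
        self.values = [[] for _ in range(n_layers)]

    def append(self, layer, k, v):
        # Cache the newest token's key/value projections for this layer
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def get(self, layer):
        # The new token attends over everything cached so far
        return self.keys[layer], self.values[layer]
```

Without the cache, decoding token t re-projects all t prefix tokens each step; with it, only the newest token is projected and appended.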