# nanoGPT SLM -- 124.0M Parameter Children's Story Generator

A small language model trained entirely from scratch using a custom nanoGPT (GPT-2 small) implementation. Pretrained on the TinyStories dataset to generate short, coherent stories for young children (ages 3-5).
## What This Model Does

This model generates short children's stories suitable for 3-5 year olds. Give it the beginning of a story and it will continue writing in simple, age-appropriate language:

```
Input:  "Once upon a time there was a little rabbit"
Output: "Once upon a time there was a little rabbit who lived in a big forest.
         The rabbit loved to hop and play with his friends. One day, he
         found a shiny red ball near the river..."
```
**Capabilities:**
- Generates coherent short stories (100-500 words) with the from-scratch pretrained model
- Uses simple vocabulary appropriate for young children
- Follows common story patterns (characters, conflict, resolution)
- Understands basic narrative structure (beginning, middle, end)
- Stops cleanly at story boundaries via the learned `<|endoftext|>` token
- Can continue from any story opening/prompt
- As an add-on, the Chat UI also lets users generate an illustration of the generated story using the `stable-diffusion-xl-base-1.0` model
**Limitations:**
- Stories are short (512-token maximum context window)
- Limited to simple vocabulary and narrative structures
- No instruction-following ability (see fine-tuned variants below)
- May occasionally generate repetitive or nonsensical text
- English only
## Training Dataset: TinyStories
| Attribute | Value |
|---|---|
| Dataset | TinyStories (Eldan & Li, 2023) |
| Description | Synthetic short stories generated by GPT-3.5/GPT-4, filtered for quality |
| Target audience | Children aged 3-5 years |
| Vocabulary | Words that a typical 3-4 year old would understand |
| Training stories | ~2,119,719 |
| Validation stories | ~21,990 |
| Total tokens | ~470M |
| Average story length | ~220 tokens |
| Topics | Animals, friendship, family, nature, adventure, sharing, kindness |
The TinyStories dataset was specifically designed to study whether small language models can learn coherent language generation when trained on high-quality, simple text.

Each story was tokenized with Unicode normalization (smart quotes, em dashes, etc. converted to ASCII) and separated by `<|endoftext|>` (token 50256) so the model learns clean story boundaries.
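The preprocessing above can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the replacement map, `normalize_story`, and `build_token_stream` are assumed names, and `encode` stands in for tiktoken's GPT-2 BPE encoder.

```python
EOT_ID = 50256  # tiktoken GPT-2 id for <|endoftext|>

# Illustrative subset of the Unicode -> ASCII normalization
ASCII_MAP = {
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2014": "--", "\u2013": "-",  # em dash, en dash
}

def normalize_story(text: str) -> str:
    """Replace common smart punctuation with ASCII equivalents."""
    for uni, repl in ASCII_MAP.items():
        text = text.replace(uni, repl)
    return text

def build_token_stream(stories, encode):
    """Tokenize each story and append <|endoftext|> so the model learns
    clean story boundaries. `encode` maps text -> token ids, e.g.
    tiktoken.get_encoding("gpt2").encode."""
    ids = []
    for story in stories:
        ids.extend(encode(normalize_story(story)))
        ids.append(EOT_ID)
    return ids
```

During training, the resulting stream is simply sliced into 512-token windows, so every batch may contain several complete stories delimited by token 50256.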
## Quick Start

### Method 1: Gradio Chat UI (the best way to try the model's capabilities)

- Chat UI: nanoGPT SLM -- Children's Story Generator + Illustration
- Image model: `stabilityai/stable-diffusion-xl-base-1.0` via the HF Inference API
### Method 2: Download and use `nanogpt_slm_tinystories_best.pth` programmatically

**Option 1: Run directly** (downloads the model and generates sample stories from predefined prompts)

```shell
# Download `nanogpt_slm_pretrained_inference_tinystories.py` into the working directory
pip install torch tiktoken huggingface_hub
python nanogpt_slm_pretrained_inference_tinystories.py
```
**Option 2: Import into your own code** to generate children's short stories

```python
# pip install torch tiktoken huggingface_hub
# Download `nanogpt_slm_pretrained_inference_tinystories.py` into the working directory
from nanogpt_slm_pretrained_inference_tinystories import tell_story, ask, generate_text

# Method 1 -- quick story generation
# story = tell_story("Once upon a time there was a little kitten")
story = tell_story(input("Enter a story prompt (e.g., 'Once upon a time there was a little kitten'): ").strip())
print(story)
print("--------------------")

# Method 2 -- simple text completion
# print(ask("The friendly dragon lived in"))
print(ask(input("Enter a prompt for text completion (e.g., 'The friendly dragon lived in'): ").strip()))
print("--------------------")

# Method 3 -- fine-grained control
print(generate_text(
    "A girl named Lily went to the park",
    max_tokens=500,   # generous budget -- EOS stops at the story end (max context length = 512)
    temperature=0.8,  # 0.01 = predictable, 0.8 = balanced, 1.5 = creative
    top_k=40,         # sampling diversity
))
print("--------------------")
```
### Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-pretrained-slm-tinystories-124m",
    filename="nanogpt_slm_tinystories_best.pth",
)

from nanogpt_slm_pretrained_inference_tinystories import GPT, GPTKV, GPTConfig

config = GPTConfig()
model = GPTKV(config)  # KV-cache enabled for fast generation
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
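For reference, the EOS-based stopping that the inference script performs can be sketched as a plain loop. This is illustrative, not the script's actual code: `next_token(ids)` stands in for one model forward pass plus sampling (with the KV cache, each step only processes the newest token).

```python
EOS_ID = 50256  # <|endoftext|>, the learned story boundary

def generate_ids(prompt_ids, next_token, max_tokens=500, eos_id=EOS_ID):
    """Append one token at a time, stopping at EOS or the token budget.
    `next_token(ids)` is a stand-in for model forward + sampling."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        tok = next_token(ids)
        if tok == eos_id:
            break  # clean stop at the story boundary
        ids.append(tok)
    return ids
```

This is why `max_tokens=500` is a budget rather than a fixed length: most stories emit `<|endoftext|>` well before the budget is exhausted.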
## Model Architecture
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small, 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Weight tying | Yes (token embeddings = LM head) |
| Attention | Flash Attention when available, causal mask |
| Normalization | Pre-norm (LayerNorm before attention/MLP) |
| Activation | GELU |
| KV Cache | GPTKV variant included for O(1) per-token decode |
| EOS token | `<|endoftext|>` (50256) -- learned story boundary |
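The 124.0M figure can be reproduced from the table above. The sketch below assumes the standard GPT-2 small layout (linear layers with biases, learned position embeddings, tied LM head); the field and helper names are illustrative, not the repo's actual `GPTConfig`.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50257  # tiktoken GPT-2 BPE
    block_size: int = 512    # context length
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

def count_params(cfg: GPTConfig) -> int:
    """Unique parameters with weight tying (lm_head shares wte)."""
    d = cfg.n_embd
    wte = cfg.vocab_size * d                      # token embeddings (tied LM head)
    wpe = cfg.block_size * d                      # position embeddings
    ln = 2 * d                                    # LayerNorm weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused qkv proj + output proj
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # GELU MLP up/down proj
    block = ln + attn + ln + mlp                  # pre-norm transformer block
    return wte + wpe + cfg.n_layer * block + ln   # + final LayerNorm

print(count_params(GPTConfig()))  # 124,046,592 -> "124.0M"
```

Note that weight tying saves another `vocab_size * n_embd` (~38.6M) parameters that an untied LM head would add.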
## Training Details
| Attribute | Value |
|---|---|
| Hardware | Google Colab Pro (NVIDIA G4 48GB) |
| Iterations | 70,000 (~2.4 epochs over TinyStories) |
| Batch size | 32 sequences x 512 tokens = 16,384 tokens/step |
| Gradient accumulation | 8 steps (effective batch = 131,072 tokens) |
| Total tokens seen | ~1.15B |
| Optimizer | AdamW (lr=6e-4, betas=(0.9, 0.95), wd=0.1) |
| LR schedule | Linear warmup (3,000 steps) + cosine decay to 1e-5 |
| Precision | bfloat16 (G4) |
| Gradient clipping | max_norm=1.0 |
| Data preprocessing | Unicode normalization + <|endoftext|> story separators |
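The table's figures are self-consistent: 32 sequences x 512 tokens = 16,384 tokens/step, and 70,000 iterations x 16,384 tokens ≈ 1.15B tokens seen. The warmup + cosine schedule can be sketched as below; this is a nanoGPT-style sketch under the table's hyperparameters, and the assumption that decay runs over the full 70,000 iterations is mine.

```python
import math

MAX_LR, MIN_LR = 6e-4, 1e-5   # from the table: AdamW lr and cosine floor
WARMUP, MAX_ITERS = 3_000, 70_000

def get_lr(it: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay to MIN_LR."""
    if it < WARMUP:
        return MAX_LR * (it + 1) / WARMUP
    progress = (it - WARMUP) / (MAX_ITERS - WARMUP)   # 0 -> 1 over decay phase
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```

The per-iteration learning rate is passed to the AdamW optimizer before each step, as in Karpathy's nanoGPT training loop.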
## Files

| File | Description |
|---|---|
| `nanogpt_slm_tinystories_best.pth` | Pretrained model weights (best validation loss) |
| `nanogpt_slm_pretrained_inference.py` | Standalone inference script with KV cache + EOS stopping |
| `config.json` | Model configuration and training details |
## API Reference

`tell_story(beginning, max_tokens=500, temperature=0.8, top_k=40)`
Generate a children's story from an opening line. Best for creative story generation.

`ask(prompt, max_tokens=500, temperature=0.8, top_k=40)`
General text completion. Alias for `generate_text()`.

`generate_text(prompt, max_tokens=500, temperature=0.8, top_k=40)`
Low-level text generation with full parameter control.
| Parameter | Default | Description |
|---|---|---|
| `prompt` / `beginning` | (required) | Text to continue from |
| `max_tokens` | 500 | Maximum tokens to generate (EOS stops earlier) |
| `temperature` | 0.8 | 0.01 = predictable, 0.8 = balanced, 1.5 = wild |
| `top_k` | 40 | Top-k filtering (None = no filtering) |
## Example Outputs

**Prompt:** "Once upon a time there was a little bear"

> Once upon a time there was a little bear who lived in a big forest. The bear loved to play with his friends. One sunny day, he went for a walk and found a beautiful flower. He picked it up and brought it home to show his mama...

**Prompt:** "The princess looked out her window and saw"

> The princess looked out her window and saw a big rainbow in the sky. She was so happy! She ran outside to get a closer look. A little bird flew down and sat on her hand. "Hello!" said the princess...
## Fine-tuned Variants

Experimental versions fine-tuned on different datasets:
| Variant | Type | Repo |
|---|---|---|
| This model | Pretrained (TinyStories) | nishantup/nanogpt-pretrained-slm-tinystories-124m |
| Instruction-tuned (nanoGPT) | SFT | nishantup/nanogpt-slm-tinystories-instruct |
| Spam classifier (nanoGPT) | Classification | nishantup/nanogpt-slm-tinystories-classifier |
| Instruction-tuned (Raschka) | SFT | nishantup/gpt2-slm-instruct |
## Citation
If you use this model, please cite the TinyStories paper:
> Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.
## Notes

- Big shout-out to Dr. Raj Dandekar from Vizuara.ai; this project closely follows Raj's "Building LLM from Scratch" workshop
- Trained completely from scratch (no pretrained initialization)
- Uses a KV cache (GPTKV) for O(1) per-token decode during inference
- Weight tying between the token embeddings (`wte`) and the LM head (`lm_head`)
- Architecture follows Karpathy's nanoGPT implementation
- Training data preprocessed with Unicode normalization (smart quotes -> ASCII)
- Stories separated with `<|endoftext|>` for clean EOS-based generation stopping