---
language: en
license: mit
tags:
- pytorch
- language-model
- causal-lm
- llama-style
- gqa
- rope
- swiglu
- rmsnorm
- pretrained-from-scratch
datasets:
- roneneldan/TinyStories
metrics:
- perplexity
---

# StoryGPT

A **50M parameter** LLaMA-style decoder-only transformer pre-trained from scratch on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. Built as an end-to-end portfolio project demonstrating a production-grade LLM pre-training pipeline.

## Model Description

| Component | Implementation |
|---|---|
| Attention | Grouped Query Attention (GQA), as in LLaMA 2/3 |
| Position Encoding | Rotary Embeddings (RoPE) |
| Normalization | RMSNorm |
| Activation | SwiGLU FFN |
| Weight Tying | Embedding weight = Output head weight |
| Tokenizer | Custom BPE trained from scratch (16,384 vocab) |

**Config:**

```
vocab_size    : 16,384
context_length: 512
emb_dim       : 512
n_heads       : 8
n_kv_heads    : 4 (GQA)
n_layers      : 8
ffn_hidden    : 1,376
Parameters    : ~50M
```

## Training

- **Dataset:** TinyStories (150k stories, ~40M tokens)
- **Steps:** 20,000
- **Optimizer:** AdamW (betas=(0.9, 0.95), weight_decay=0.1)
- **LR Schedule:** Cosine decay with linear warmup (500 steps), peak 3e-4 → min 3e-5
- **Gradient Clipping:** 1.0
- **Mixed Precision:** torch.cuda.amp (float16)
- **Hardware:** 2× NVIDIA T4 (DataParallel) on Kaggle

## Results

| Metric | Value |
|---|---|
| Train Loss | 1.36 |
| Val Loss | 1.41 |
| **Perplexity** | **4.09** |

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download model weights and tokenizer
weights_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="best_model.pt")
tok_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="storygpt_tokenizer.json")
tokenizer = Tokenizer.from_file(tok_path)

# Load model (copy model source files locally first)
from StoryGPT.model.gpt import GPT
from StoryGPT.config import MODEL_CONFIG

model = GPT(MODEL_CONFIG)
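# Optional sanity check (an illustrative addition, not part of the released
# code): verify that the constructed model roughly matches the ~50M
# parameters stated in the config before loading the checkpoint.
def count_params(m):
    # nn.Module.parameters() yields each tensor once, so the tied
    # embedding/output-head weight is counted a single time.
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

# e.g.: print(f"{count_params(model) / 1e6:.1f}M parameters")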
weights = torch.load(weights_path, map_location="cpu")

# Checkpoints saved under DataParallel prefix every key with "module."; strip it
if list(weights.keys())[0].startswith("module."):
    weights = {k.replace("module.", ""): v for k, v in weights.items()}

model.load_state_dict(weights)
model.eval()
```

## Sample Output

> *Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and go on adventures. One day, he decided to explore the forest near his house...*

## License

MIT
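The Usage section above loads the checkpoint but stops at `model.eval()` without showing decoding. A minimal autoregressive sampling loop might look like the following sketch; the `generate` helper, its default temperature/top-k values, and the decoding strategy are assumptions for illustration, not part of the released repository:

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200,
             temperature=0.8, top_k=50, context_length=512):
    # Encode the prompt; a `tokenizers` Tokenizer returns an Encoding with .ids
    ids = torch.tensor([tokenizer.encode(prompt).ids])
    for _ in range(max_new_tokens):
        # Crop to the model's context window, keep only the last-step logits
        logits = model(ids[:, -context_length:])[:, -1, :] / temperature
        if top_k is not None:
            # Mask every logit below the k-th largest before sampling
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # shape (1, 1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0].tolist())
```

With the model and tokenizer from the Usage section loaded, a call such as `print(generate(model, tokenizer, "Once upon a time"))` would then produce a sampled story.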