---
language: en
license: mit
tags:
- pytorch
- language-model
- causal-lm
- llama-style
- gqa
- rope
- swiglu
- rmsnorm
- pretrained-from-scratch
datasets:
- roneneldan/TinyStories
metrics:
- perplexity
---

# StoryGPT

A **50M parameter** LLaMA-style decoder-only transformer pre-trained from scratch on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.

Built as an end-to-end portfolio (CV) project demonstrating a complete LLM pre-training pipeline: custom tokenizer training, a from-scratch model implementation, multi-GPU mixed-precision training, and evaluation.

## Model Description

| Component | Implementation |
|---|---|
| Attention | Grouped Query Attention (GQA) — same as LLaMA 2/3 |
| Position Encoding | Rotary Embeddings (RoPE) |
| Normalization | RMSNorm |
| Activation | SwiGLU FFN |
| Weight Tying | Embedding weight = output head weight |
| Tokenizer | Custom BPE trained from scratch (16,384 vocab) |
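
The BPE tokenizer is trained from scratch rather than reused from an existing model. Below is a minimal sketch of training such a tokenizer with the `tokenizers` library; the corpus file name, pre-tokenizer, and special tokens are illustrative assumptions, not the repo's actual settings:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Assumed corpus file and special tokens; the repo's choices may differ
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=16_384,
                              special_tokens=["<unk>", "<eos>"])
tokenizer.train(files=["tinystories_train.txt"], trainer=trainer)
tokenizer.save("storygpt_tokenizer.json")
```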

**Config:**
```
vocab_size    : 16,384
context_length: 512
emb_dim       : 512
n_heads       : 8
n_kv_heads    : 4 (GQA)
n_layers      : 8
ffn_hidden    : 1,376
Parameters    : ~50M
```
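
With `n_heads: 8` and `n_kv_heads: 4`, each key/value head serves a group of two query heads, which halves the KV projections and KV cache relative to full multi-head attention. A shape-level sketch of the mechanism with this model's numbers (illustrative, not the repo's implementation):

```python
import torch
import torch.nn.functional as F

# GQA with this config: 8 query heads share 4 KV heads (2 per group)
B, T, n_heads, n_kv_heads, head_dim = 1, 512, 8, 4, 64  # 8 * 64 = emb_dim 512

q = torch.randn(B, n_heads, T, head_dim)
k = torch.randn(B, n_kv_heads, T, head_dim)  # half as many KV heads ...
v = torch.randn(B, n_kv_heads, T, head_dim)  # ... halves the KV cache

group = n_heads // n_kv_heads                # 2 query heads per KV head
k = k.repeat_interleave(group, dim=1)        # (B, 8, T, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 512, 64])
```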

## Training

- **Dataset:** TinyStories (150k stories, ~40M tokens)
- **Steps:** 20,000
- **Optimizer:** AdamW (β = (0.9, 0.95), weight_decay = 0.1)
- **LR Schedule:** Cosine decay with linear warmup (500 steps), peak 3e-4 → min 3e-5
- **Gradient Clipping:** max norm 1.0
- **Mixed Precision:** torch.cuda.amp (float16)
- **Hardware:** 2× NVIDIA T4 (DataParallel) on Kaggle
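
A condensed sketch of the training step these settings imply; `model` and `train_loader` are placeholders for the real script's objects, and the actual code may be organized differently:

```python
import math
import torch

model = torch.nn.DataParallel(model)  # 2x T4, as in the Hardware bullet
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)
scaler = torch.cuda.amp.GradScaler()

warmup, max_steps, peak_lr, min_lr = 500, 20_000, 3e-4, 3e-5

def lr_at(step):
    # Linear warmup to peak_lr, then cosine decay down to min_lr
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

for step, (x, y) in enumerate(train_loader):  # (inputs, shifted targets)
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    with torch.cuda.amp.autocast():  # float16 autocast on CUDA
        logits = model(x)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale so clipping sees true gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
```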

## Results

| Metric | Value |
|---|---|
| Train Loss | 1.36 |
| Val Loss | 1.41 |
| **Perplexity** | **4.09** |
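
The reported perplexity is consistent with exponentiating the validation cross-entropy loss:

```python
import math

# exp(validation loss) gives the perplexity
print(math.exp(1.41))  # ~4.096; the reported 4.09 comes from the unrounded loss
```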

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download model weights and tokenizer from the Hub
weights_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="best_model.pt")
tok_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="storygpt_tokenizer.json")

tokenizer = Tokenizer.from_file(tok_path)

# Load model (copy the model source files locally first)
from StoryGPT.model.gpt import GPT
from StoryGPT.config import MODEL_CONFIG

model = GPT(MODEL_CONFIG)
weights = torch.load(weights_path, map_location="cpu")

# Checkpoints saved from DataParallel prefix every key with "module."; strip it
if next(iter(weights)).startswith("module."):
    weights = {k.replace("module.", "", 1): v for k, v in weights.items()}

model.load_state_dict(weights)
model.eval()
```
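
Once loaded, text can be produced with a simple decoding loop. This is a minimal greedy-decoding sketch; it assumes `model(input_ids)` returns logits of shape `(batch, seq_len, vocab_size)`, which is not documented here and may differ from the repo's actual interface:

```python
prompt = "Once upon a time"
ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(100):
        logits = model(ids[:, -512:])      # stay within context_length
        next_id = logits[0, -1].argmax()   # greedy choice of the next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```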

## Sample Output

> *Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and go on adventures. One day, he decided to explore the forest near his house...*

## License

MIT