---
language: en
license: mit
tags:
- pytorch
- language-model
- causal-lm
- llama-style
- gqa
- rope
- swiglu
- rmsnorm
- pretrained-from-scratch
datasets:
- roneneldan/TinyStories
metrics:
- perplexity
---

# StoryGPT

A **50M parameter** LLaMA-style decoder-only transformer pre-trained from scratch on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.

Built as an end-to-end portfolio showcase of a complete LLM pre-training pipeline: custom BPE tokenizer, from-scratch model implementation, and multi-GPU training.

## Model Description

| Component | Implementation |
|---|---|
| Attention | Grouped Query Attention (GQA), as in LLaMA 2/3 |
| Position Encoding | Rotary Position Embeddings (RoPE) |
| Normalization | RMSNorm |
| Activation | SwiGLU FFN |
| Weight Tying | Embedding weight = output head weight |
| Tokenizer | Custom BPE trained from scratch (16,384 vocab) |
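
For context on the attention choice: with GQA the 8 query heads share only 4 key/value heads, which halves the K/V projections and the KV-cache relative to full multi-head attention. The snippet below is a minimal PyTorch sketch of that sharing, written for this card; it is not the repository's actual attention module.

```python
import torch

# Minimal GQA sketch (illustrative, not StoryGPT's actual code):
# 8 query heads attend over 4 shared KV heads, so each KV head serves
# n_heads // n_kv_heads = 2 query heads.
B, T, emb_dim = 2, 16, 512
n_heads, n_kv_heads = 8, 4
head_dim = emb_dim // n_heads  # 64

q_proj = torch.nn.Linear(emb_dim, n_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(emb_dim, n_kv_heads * head_dim, bias=False)  # half-size
v_proj = torch.nn.Linear(emb_dim, n_kv_heads * head_dim, bias=False)

x = torch.randn(B, T, emb_dim)
q = q_proj(x).view(B, T, n_heads, head_dim).transpose(1, 2)     # (B, 8, T, 64)
k = k_proj(x).view(B, T, n_kv_heads, head_dim).transpose(1, 2)  # (B, 4, T, 64)
v = v_proj(x).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

# Duplicate each KV head so every group of 2 query heads shares one KV head
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)           # (B, 8, T, 64)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```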

**Config:**
```
vocab_size    : 16,384
context_length: 512
emb_dim       : 512
n_heads       : 8
n_kv_heads    : 4 (GQA)
n_layers      : 8
ffn_hidden    : 1,376
parameters    : ~50M
```
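
The usage snippet further down imports `MODEL_CONFIG` from the repository. If you want to see how the numbers above map onto a config object, a plain-dict reconstruction would look roughly like this; the actual key names live in `StoryGPT/config.py` and may differ:

```python
# Hypothetical reconstruction of MODEL_CONFIG from the numbers above.
# Check StoryGPT/config.py for the real key names and any extra fields.
MODEL_CONFIG = {
    "vocab_size": 16_384,
    "context_length": 512,
    "emb_dim": 512,
    "n_heads": 8,
    "n_kv_heads": 4,      # GQA: 8 query heads share 4 KV heads
    "n_layers": 8,
    "ffn_hidden": 1_376,  # SwiGLU inner dimension
}
```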

## Training

- **Dataset:** TinyStories (150k stories, ~40M tokens)
- **Steps:** 20,000
- **Optimizer:** AdamW (β = (0.9, 0.95), weight_decay = 0.1)
- **LR Schedule:** Cosine decay with linear warmup (500 steps), peak 3e-4 → min 3e-5
- **Gradient Clipping:** 1.0
- **Mixed Precision:** `torch.cuda.amp` (float16); see the training-step sketch below
- **Hardware:** 2× NVIDIA T4 (DataParallel) on Kaggle
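
As referenced above, here is a minimal sketch of one training step under this recipe, combining the warmup-plus-cosine schedule, float16 autocast with a `GradScaler`, and gradient clipping at 1.0. It assumes `model` is the `GPT` instance from the Usage section; it is an illustration of the listed hyperparameters, not the repository's actual loop:

```python
import math
import torch

peak_lr, min_lr, warmup_steps, max_steps = 3e-4, 3e-5, 500, 20_000

def lr_at(step: int) -> float:
    # Linear warmup to peak_lr, then cosine decay down to min_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr,
                              betas=(0.9, 0.95), weight_decay=0.1)
scaler = torch.cuda.amp.GradScaler()

def train_step(step, inputs, targets):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        logits = model(inputs)  # assumed shape: (B, T, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale so clipping sees true gradient norms
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```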

## Results

| Metric | Value |
|---|---|
| Train loss | 1.36 |
| Validation loss | 1.41 |
| **Perplexity (validation)** | **4.09** |
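
Perplexity here is the exponential of the validation cross-entropy loss: e^1.41 ≈ 4.10, matching the reported value.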

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download the checkpoint and tokenizer from the Hub
weights_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="best_model.pt")
tok_path = hf_hub_download(repo_id="YOUR_HF_USERNAME/StoryGPT", filename="storygpt_tokenizer.json")

tokenizer = Tokenizer.from_file(tok_path)

# Load the model (copy the model source files locally first)
from StoryGPT.model.gpt import GPT
from StoryGPT.config import MODEL_CONFIG

model = GPT(MODEL_CONFIG)
weights = torch.load(weights_path, map_location="cpu")
# Checkpoints saved under DataParallel prefix parameter names with "module."
if next(iter(weights)).startswith("module."):
    weights = {k.removeprefix("module."): v for k, v in weights.items()}
model.load_state_dict(weights)
model.eval()
```
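
For generating text with the loaded model, a simple greedy decoding loop might look like the following. This assumes the model's forward pass returns logits of shape `(batch, seq_len, vocab_size)`; adjust to the actual `GPT` interface:

```python
# Hypothetical greedy generation loop; adapt to the real GPT forward signature.
prompt = "Once upon a time"
ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(100):
        logits = model(ids[:, -512:])  # crop to the 512-token context window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```

Greedy decoding is deterministic; for more varied stories, sample from the softmax with a temperature or top-k filter instead of taking the argmax.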

## Sample Output

> *Once upon a time, there was a little boy named Timmy. Timmy loved to play with his toys and go on adventures. One day, he decided to explore the forest near his house...*

## License

MIT