# GPT SLM (Raschka Architecture) -- 163.2M Parameters

A small language model trained from scratch using the Raschka-style `GPTModel` architecture. Pretrained on 133 classic English fiction novels from Project Gutenberg.

> **Note:** This repo also contains `nanogpt_slm_best.pth` (nanoGPT architecture). The Raschka weights (`gpt_slm_best.pth`) use a different architecture, with separate `W_query`/`W_key`/`W_value` projections and no weight tying.
## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt_slm_pretrained_inference.py
```
### Option 2: Import and use `ask()` in your own code

```python
from gpt_slm_pretrained_inference import ask, generate_text

print(ask("Once upon a time there was"))
print()
print(ask(
    "The meaning of life is",
    temperature=1.0,
    top_k=100,
    max_tokens=150,
))
print()
print(generate_text("She opened the door and saw", max_tokens=200))
```
### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt_slm_pretrained_inference import GPTModel, BASE_CONFIG

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="nishantup/RaschkastyleGPT-pretrained-slm-163m",
    filename="gpt_slm_best.pth",
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Architecture Comparison

| Feature | `gpt_slm_best.pth` (Raschka) | `nanogpt_slm_best.pth` (nanoGPT) |
|---|---|---|
| Attention | Separate `W_query`, `W_key`, `W_value` | Combined `c_attn` |
| LayerNorm | `scale`/`shift` params | `weight`/`bias` params |
| MLP | `FeedForward` (Sequential) | `MLP` (`c_fc`/`c_proj`) |
| Config | Dict (`BASE_CONFIG`) | Dataclass (`GPTConfig`) |
| Weight tying | No | Yes (`wte` = `lm_head`) |
| `forward()` returns | `logits` | `(logits, loss)` tuple |
| KV cache | Not included | Included (`GPTKV`) |
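The attention difference is organizational rather than mathematical: three separate `emb_dim x emb_dim` projections hold exactly as many weights as one fused `emb_dim x 3*emb_dim` projection, so the two layouts are interchangeable up to a reshape. A quick sanity check in plain Python, using this model's 768-dim embedding (bias terms omitted for simplicity):

```python
# Parameter-count sanity check: Raschka-style separate Q/K/V projections
# vs nanoGPT's fused c_attn hold the same number of weights (ignoring biases).
emb_dim = 768  # this model's embedding dimension

# Raschka style: three independent (emb_dim x emb_dim) matrices
separate = 3 * (emb_dim * emb_dim)

# nanoGPT style: one fused (emb_dim x 3*emb_dim) matrix
fused = emb_dim * (3 * emb_dim)

assert separate == fused == 1_769_472
```

This is why checkpoints from one layout cannot be loaded into the other directly: the weights match in count but not in tensor names or shapes.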
## Model Details

| Attribute | Value |
|---|---|
| Parameters | 163.2M |
| Architecture | Raschka `GPTModel` (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Training data | 133 English fiction novels (37.5M tokens) |
| Framework | PyTorch |
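Putting the table together, `BASE_CONFIG` presumably looks something like the sketch below. The key names follow the convention of Raschka's `GPTModel`; the dropout and bias values are assumptions not stated in this card, so treat `config_gpt_slm.json` in this repo as the authoritative source:

```python
# Assumed shape of BASE_CONFIG, reconstructed from the Model Details table.
# drop_rate and qkv_bias are guesses -- check config_gpt_slm.json for the
# authoritative values shipped with this repo.
BASE_CONFIG = {
    "vocab_size": 50257,    # tiktoken GPT-2 BPE vocabulary
    "context_length": 256,  # maximum sequence length in tokens
    "emb_dim": 768,         # embedding dimension
    "n_heads": 12,          # attention heads per layer
    "n_layers": 12,         # transformer blocks
    "drop_rate": 0.1,       # dropout (assumed; not stated in this card)
    "qkv_bias": False,      # assumed default
}

# Each of the 12 heads attends over a 768 / 12 = 64-dim subspace.
assert BASE_CONFIG["emb_dim"] % BASE_CONFIG["n_heads"] == 0
```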
## Files (Raschka variant)

| File | Description |
|---|---|
| `gpt_slm_best.pth` | Pretrained weights (Raschka `GPTModel`) |
| `gpt_slm_pretrained_inference.py` | Standalone inference script -- import and call `ask()` |
| `config_gpt_slm.json` | Raschka model configuration |
## `ask()` / `generate_text()` API Reference

```python
ask(prompt, max_tokens=200, temperature=0.8, top_k=40)
generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)
```

| Parameter | Default | Description |
|---|---|---|
| `prompt` | (required) | Text to continue from |
| `max_tokens` | 200 | Maximum number of tokens to generate |
| `temperature` | 0.8 | 0.01 = near-greedy, 0.8 = balanced, 1.5 = creative |
| `top_k` | 40 | Top-k filtering (`None` = no filtering) |
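To make the `temperature`/`top_k` knobs concrete, here is a minimal sketch of the standard sampling recipe they name (scale logits by temperature, keep the top-k candidates, sample from the renormalized softmax). This mirrors common practice; the actual code inside `gpt_slm_pretrained_inference.py` may differ in detail:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40, rng=random):
    """Illustrative top-k + temperature sampling over a list of logits.

    A sketch of the usual recipe, not the repo's exact implementation.
    """
    # Keep only the top_k highest logits (top_k=None keeps everything).
    indexed = sorted(enumerate(logits), key=lambda p: p[1], reverse=True)
    if top_k is not None:
        indexed = indexed[:top_k]

    # Temperature near 0 sharpens toward greedy; >1 flattens (more creative).
    scaled = [logit / temperature for _, logit in indexed]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # stable softmax numerators

    token_ids = [i for i, _ in indexed]
    return rng.choices(token_ids, weights=weights, k=1)[0]

# With near-greedy temperature, the argmax token dominates:
logits = [0.1, 2.5, 0.3, 1.9]
assert sample_next(logits, temperature=0.01, top_k=2) == 1
```

This is why `temperature=0.01` in the table above behaves as "near-greedy": dividing by a tiny temperature stretches the logit gaps so the softmax puts nearly all mass on the top token.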
## Related Models

| Variant | Architecture | Repo / File |
|---|---|---|
| Pretrained (nanoGPT) | nanoGPT `GPT` class | `nanogpt_slm_best.pth` in this repo |
| Instruction-tuned (SFT) | nanoGPT `GPT` class | `nishantup/nanogpt-slm-instruct` |