# GPT SLM (Raschka Architecture) -- 163.2M Parameters

A small language model trained from scratch using the Raschka-style GPTModel architecture. Pretrained on 133 classic English fiction novels from Project Gutenberg.

Note: This repo also contains `nanogpt_slm_best.pth` (nanoGPT architecture). The Raschka weights (`gpt_slm_best.pth`) use a different architecture, with separate `W_query`/`W_key`/`W_value` projections and no weight tying.
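If you are unsure which architecture a downloaded `.pth` file uses, the attention-projection key names in its state dict distinguish the two layouts. A minimal sketch, assuming the key names follow the two codebases as described here (`W_query`/`W_key`/`W_value` for Raschka, fused `c_attn` for nanoGPT):

```python
def which_architecture(state_dict_keys):
    """Guess which checkpoint layout a state_dict uses, based on key names."""
    if any("W_query" in k for k in state_dict_keys):
        return "raschka"   # separate W_query/W_key/W_value projections
    if any("c_attn" in k for k in state_dict_keys):
        return "nanogpt"   # fused QKV projection
    return "unknown"
```

With a real checkpoint you would pass `torch.load(path, map_location="cpu").keys()`.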

## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt_slm_pretrained_inference.py
```

### Option 2: Import and use `ask()` in your own code

```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt_slm_pretrained_inference import ask, generate_text

# Text completion
print(ask("Once upon a time there was"))
print()

# Control generation
print(ask(
    "The meaning of life is",
    temperature=1.0,    # higher = more creative
    top_k=100,          # wider sampling pool
    max_tokens=150      # longer output
))
print()

# generate_text is an alias for ask
print(generate_text("She opened the door and saw", max_tokens=200))
print()
```

### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt_slm_pretrained_inference import GPTModel, BASE_CONFIG

model_path = hf_hub_download(
    repo_id="nishantup/RaschkastyleGPT-pretrained-slm-163m",
    filename="gpt_slm_best.pth"
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Architecture Comparison

| Feature | `gpt_slm_best.pth` (Raschka) | `nanogpt_slm_best.pth` (nanoGPT) |
|---|---|---|
| Attention | Separate `W_query`, `W_key`, `W_value` | Combined `c_attn` |
| LayerNorm | `scale`/`shift` params | `weight`/`bias` params |
| MLP | `FeedForward` (Sequential) | `MLP` (`c_fc`/`c_proj`) |
| Config | Dict (`BASE_CONFIG`) | Dataclass (`GPTConfig`) |
| Weight tying | No | Yes (`wte` = `lm_head`) |
| `forward()` returns | logits | `(logits, loss)` tuple |
| KV cache | Not included | Included (`GPTKV`) |
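The separate-vs-fused attention projections in the first row are mathematically equivalent: the fused `c_attn` weight is just the three matrices stacked along the output dimension, split again after one matmul. A toy NumPy sketch (dimensions shrunk for illustration; the two checkpoint formats are still not interchangeable without remapping state-dict keys):

```python
import numpy as np

d = 8                                    # toy embedding dim (real model: 768)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d))          # 4 token embeddings

# Raschka style: three separate projection matrices
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

# nanoGPT style: one fused projection, split after the matmul
W_qkv = np.concatenate([W_q, W_k, W_v], axis=1)   # shape (d, 3d)
q2, k2, v2 = np.split(x @ W_qkv, 3, axis=1)

print(np.allclose(q, q2), np.allclose(k, k2), np.allclose(v, v2))  # True True True
```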

## Model Details

| Attribute | Value |
|---|---|
| Parameters | 163.2M |
| Architecture | Raschka `GPTModel` (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257-token vocabulary) |
| Training data | 133 English fiction novels (37.5M tokens) |
| Framework | PyTorch |
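As a sanity check, the stated dimensions roughly reproduce the parameter count. A back-of-envelope tally, assuming no QKV biases and an untied, bias-free output head (the remaining ~0.8M gap against the stated 163.2M presumably comes from bias/config details not listed here):

```python
V, L, d, T, d_ff = 50_257, 12, 768, 256, 4 * 768   # vocab, layers, dim, context, FF dim

tok_emb = V * d                          # token embedding table
pos_emb = T * d                          # learned positional embeddings
attn = 3 * d * d + (d * d + d)           # W_q/W_k/W_v (no bias) + output projection
mlp = (d * d_ff + d_ff) + (d_ff * d + d) # FeedForward: expand + contract, with biases
norms = 2 * 2 * d                        # two LayerNorms per block (scale + shift)
block = attn + mlp + norms

total = tok_emb + pos_emb + L * block + 2 * d + d * V  # + final norm + untied out head
print(f"{total/1e6:.1f}M")               # → 162.4M, within ~0.5% of the stated 163.2M
```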

## Files (Raschka variant)

| File | Description |
|---|---|
| `gpt_slm_best.pth` | Pretrained weights (Raschka `GPTModel`) |
| `gpt_slm_pretrained_inference.py` | Standalone inference script -- import and call `ask()` |
| `config_gpt_slm.json` | Raschka model configuration |

## `ask()` / `generate_text()` API Reference

```python
ask(prompt, max_tokens=200, temperature=0.8, top_k=40)
generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)  # alias
```

| Parameter | Default | Description |
|---|---|---|
| `prompt` | (required) | Text to continue from |
| `max_tokens` | 200 | Maximum number of tokens to generate |
| `temperature` | 0.8 | 0.01 = near-greedy, 0.8 = balanced, 1.5 = creative |
| `top_k` | 40 | Top-k filtering (`None` = no filtering) |
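How `temperature` and `top_k` interact can be sketched in plain Python; this illustrates the standard sampling technique, not the script's actual internals. Logits are divided by `temperature` (small values sharpen the distribution toward greedy decoding), and `top_k` masks everything outside the k highest logits before the softmax:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40):
    """Pick a next-token index from raw logits with temperature + top-k filtering."""
    if top_k is not None:
        # Keep only the top_k highest logits; mask the rest to -inf
        cutoff = sorted(logits, reverse=True)[min(top_k, len(logits)) - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # subtract max for numeric stability
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `sample_next([0.1, 5.0, 0.2], temperature=0.01)` is all but guaranteed to return index 1, while `temperature=1.5` spreads probability across the pool.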

## Related Models

| Variant | Architecture | Repo / File |
|---|---|---|
| Pretrained (nanoGPT) | nanoGPT `GPT` class | `nanogpt_slm_best.pth` in this repo |
| Instruction-tuned (SFT) | nanoGPT `GPT` class | `nishantup/nanogpt-slm-instruct` |