# GPT SLM (Raschka Architecture) -- 163.2M Parameters

A small language model trained from scratch using the Raschka-style GPTModel architecture. Pretrained on 133 classic English fiction novels from Project Gutenberg.

Note: This repo also contains `nanogpt_slm_best.pth` (nanoGPT architecture). The Raschka weights (`gpt_slm_best.pth`) use a different architecture, with separate `W_query`/`W_key`/`W_value` projections and no weight tying.
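If you are unsure which architecture a downloaded `.pth` file uses, the attention-projection key names in its state dict distinguish the two layouts. A minimal sketch, assuming the key names follow the two codebases as described here (`W_query`/`W_key`/`W_value` for Raschka, fused `c_attn` for nanoGPT):

```python
def which_architecture(state_dict_keys):
    """Guess which checkpoint layout a state_dict uses, based on key names."""
    if any("W_query" in k for k in state_dict_keys):
        return "raschka"   # separate W_query/W_key/W_value projections
    if any("c_attn" in k for k in state_dict_keys):
        return "nanogpt"   # fused QKV projection
    return "unknown"
```

With a real checkpoint you would pass `torch.load(path, map_location="cpu").keys()`.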

## Quick Start

### Option 1: Run directly (downloads model + runs examples)

```bash
pip install torch tiktoken huggingface_hub
python gpt_slm_pretrained_inference.py
```

### Option 2: Import and use `ask()` in your own code

```python
# Import loads the model automatically (one-time download from HuggingFace)
from gpt_slm_pretrained_inference import ask, generate_text

# Text completion
print(ask("Once upon a time there was"))
print()

# Control generation
print(ask(
    "The meaning of life is",
    temperature=1.0,    # higher = more creative
    top_k=100,          # wider sampling pool
    max_tokens=150      # longer output
))
print()

# generate_text is an alias for ask
print(generate_text("She opened the door and saw", max_tokens=200))
print()
```

### Option 3: Load weights manually

```python
import torch
from huggingface_hub import hf_hub_download

from gpt_slm_pretrained_inference import GPTModel, BASE_CONFIG

model_path = hf_hub_download(
    repo_id="nishantup/RaschkastyleGPT-pretrained-slm-163m",
    filename="gpt_slm_best.pth"
)

model = GPTModel(BASE_CONFIG)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Architecture Comparison

| Feature | `gpt_slm_best.pth` (Raschka) | `nanogpt_slm_best.pth` (nanoGPT) |
|---|---|---|
| Attention | Separate `W_query`, `W_key`, `W_value` | Combined `c_attn` |
| LayerNorm | `scale`/`shift` params | `weight`/`bias` params |
| MLP | `FeedForward` (Sequential) | `MLP` (`c_fc`/`c_proj`) |
| Config | Dict (`BASE_CONFIG`) | Dataclass (`GPTConfig`) |
| Weight tying | No | Yes (`wte` = `lm_head`) |
| `forward()` returns | logits | `(logits, loss)` tuple |
| KV cache | Not included | Included (`GPTKV`) |
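The separate-vs-fused attention projections in the first row are mathematically equivalent: the fused `c_attn` weight is just the three matrices stacked along the output dimension, split again after one matmul. A toy NumPy sketch (dimensions shrunk for illustration; the two checkpoint formats are still not interchangeable without remapping state-dict keys):

```python
import numpy as np

d = 8                                    # toy embedding dim (real model: 768)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d))          # 4 token embeddings

# Raschka style: three separate projection matrices
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

# nanoGPT style: one fused projection, split after the matmul
W_qkv = np.concatenate([W_q, W_k, W_v], axis=1)   # shape (d, 3d)
q2, k2, v2 = np.split(x @ W_qkv, 3, axis=1)

print(np.allclose(q, q2), np.allclose(k, k2), np.allclose(v, v2))  # True True True
```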

## Model Details

| Attribute | Value |
|---|---|
| Parameters | 163.2M |
| Architecture | Raschka `GPTModel` (12 layers, 12 heads, 768 dim) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257-token vocabulary) |
| Training data | 133 English fiction novels (37.5M tokens) |
| Framework | PyTorch |
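As a sanity check, the stated dimensions roughly reproduce the parameter count. A back-of-envelope tally, assuming no QKV biases and an untied, bias-free output head (the remaining ~0.8M gap against the stated 163.2M presumably comes from bias/config details not listed here):

```python
V, L, d, T, d_ff = 50_257, 12, 768, 256, 4 * 768   # vocab, layers, dim, context, FF dim

tok_emb = V * d                          # token embedding table
pos_emb = T * d                          # learned positional embeddings
attn = 3 * d * d + (d * d + d)           # W_q/W_k/W_v (no bias) + output projection
mlp = (d * d_ff + d_ff) + (d_ff * d + d) # FeedForward: expand + contract, with biases
norms = 2 * 2 * d                        # two LayerNorms per block (scale + shift)
block = attn + mlp + norms

total = tok_emb + pos_emb + L * block + 2 * d + d * V  # + final norm + untied out head
print(f"{total/1e6:.1f}M")               # → 162.4M, within ~0.5% of the stated 163.2M
```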

## Files (Raschka variant)

| File | Description |
|---|---|
| `gpt_slm_best.pth` | Pretrained weights (Raschka `GPTModel`) |
| `gpt_slm_pretrained_inference.py` | Standalone inference script -- import and call `ask()` |
| `config_gpt_slm.json` | Raschka model configuration |

## `ask()` / `generate_text()` API Reference

```python
ask(prompt, max_tokens=200, temperature=0.8, top_k=40)
generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)  # alias
```

| Parameter | Default | Description |
|---|---|---|
| `prompt` | (required) | Text to continue from |
| `max_tokens` | 200 | Maximum number of tokens to generate |
| `temperature` | 0.8 | 0.01 = near-greedy, 0.8 = balanced, 1.5 = creative |
| `top_k` | 40 | Top-k filtering (`None` = no filtering) |
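How `temperature` and `top_k` interact can be sketched in plain Python; this illustrates the standard sampling technique, not the script's actual internals. Logits are divided by `temperature` (small values sharpen the distribution toward greedy decoding), and `top_k` masks everything outside the k highest logits before the softmax:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40):
    """Pick a next-token index from raw logits with temperature + top-k filtering."""
    if top_k is not None:
        # Keep only the top_k highest logits; mask the rest to -inf
        cutoff = sorted(logits, reverse=True)[min(top_k, len(logits)) - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # subtract max for numeric stability
    probs = [e / sum(exps) for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, `sample_next([0.1, 5.0, 0.2], temperature=0.01)` is all but guaranteed to return index 1, while `temperature=1.5` spreads probability across the pool.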

## Related Models

| Variant | Architecture | Repo / File |
|---|---|---|
| Pretrained (nanoGPT) | nanoGPT `GPT` class | `nanogpt_slm_best.pth` in this repo |
| Instruction-tuned (SFT) | nanoGPT `GPT` class | `nishantup/nanogpt-slm-instruct` |