GreyMatter: A Transformer Language Model from Scratch
GreyMatter is a custom Transformer-based language model implemented from scratch in PyTorch for learning and research purposes. It includes pretraining on a large web dataset and supervised fine-tuning (SFT) on conversational data, similar to the pipeline used in modern LLM development.
Features
- Implemented from scratch in PyTorch (no Hugging Face Transformers).
- Byte-Pair Encoding (BPE) tokenizer trained on a 1.1B-token corpus.
- GPT-style decoder-only Transformer:
  - 12 layers, 768 hidden size, 8 heads, 123M parameters.
  - Rotary Positional Encoding (RoPE).
  - RMSNorm instead of LayerNorm.
  - Dropout + weight decay regularization.
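The decoder block described above (pre-norm residuals with RMSNorm, rotary-encoded attention) can be sketched roughly as follows. Class and function names here are illustrative, not the actual model.py API:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm: scale by root-mean-square only, no mean subtraction (unlike LayerNorm)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def rope(x, base=10000.0):
    """Apply rotary positional encoding to x of shape (batch, heads, seq, head_dim)."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate pairs of dimensions by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class DecoderBlock(nn.Module):
    """One pre-norm decoder block: RoPE causal self-attention + feed-forward."""
    def __init__(self, d_model=768, n_heads=8, d_ff=3072, dropout=0.1):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.norm1 = RMSNorm(d_model)
        self.norm2 = RMSNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        b, t, _ = x.shape
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = rope(q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2))
        k = rope(k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2))
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        a = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.drop(self.proj(a.transpose(1, 2).reshape(b, t, -1)))
        return x + self.drop(self.ff(self.norm2(x)))
```

RoPE is applied to queries and keys only (not values), so attention scores depend on relative positions while the residual stream stays position-free.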
Training
- Pretrained on a 5GB subset of Falcon RefinedWeb.
- Optimized with AdamW + gradient accumulation on a single RTX 3090.
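Gradient accumulation trades compute time for effective batch size on a single GPU: losses from several micro-batches are backpropagated (summed into the gradients) before one optimizer step. A minimal sketch of the idea; the actual train.py loop may differ, and the names here are illustrative:

```python
import torch

def train_steps(model, batches, optimizer, grad_acc_steps=8):
    """One optimizer update per `grad_acc_steps` micro-batches
    (effective batch = micro-batch size * grad_acc_steps)."""
    model.train()
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(batches):
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1))
        # Divide by the accumulation count so the summed gradients
        # equal the average over the effective batch.
        (loss / grad_acc_steps).backward()
        if (i + 1) % grad_acc_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With `batch_size = 8` and `grad_acc_step = 8` from the config below, this reproduces the stated effective batch size of 64.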
Supervised Fine-Tuning (SFT)
- Aligned with UltraChat-200k.
- Further reduced perplexity on both train and validation sets.
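A common detail when fine-tuning on conversational data like UltraChat is to compute the loss only on assistant tokens, masking the prompt with the cross-entropy `ignore_index`. This is shown purely as an illustration; sft.py may handle masking differently:

```python
import torch

def sft_loss(logits, labels, prompt_len):
    """Next-token cross-entropy over response tokens only.

    logits: (batch, seq, vocab); labels: (batch, seq).
    Prompt positions are set to -100 so they contribute no loss.
    """
    labels = labels.clone()
    labels[:, :prompt_len] = -100  # ignore_index: no loss on the prompt
    return torch.nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predict token t+1 from t
        labels[:, 1:].reshape(-1),
        ignore_index=-100)
```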
Repository Structure
.
├── dataset.py                  # BPE tokenizer + dataset loader
├── model.py                    # Transformer model (GreyMatter)
├── train.py                    # Pretraining script
├── sft.py                      # Supervised Fine-Tuning (SFT) script
├── inference.py                # Inference script (decoding strategies)
├── transformer_block_numpy.py  # NumPy implementation of Transformer block (without backprop)
└── README.md                   # Project documentation
Model Configuration
config = {
'vocab_size': 25000,
'seq_len': 1024,
'd_model': 768,
'n_heads': 8,
'n_layers': 12,
'd_ff': 3072,
'dropout': 0.1,
'batch_size': 8,
'grad_acc_step': 8, # effective batch size = 64
'learning_rate': 1e-4,
'weight_decay': 0.01,
'num_epochs': 3,
}
- Parameters: ~123M
- Pretraining tokens: 1.1B
- Effective batch size: 64 (with gradient accumulation)
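The ~123M figure can be sanity-checked from the config. Ignoring biases and norm weights, and assuming the input embedding and LM head are untied (the assumption under which the total comes out to ~123M):

```python
def count_params(cfg):
    """Rough parameter count for a GPT-style decoder with untied embeddings.

    Omits biases and norm weights, which are negligible at this scale.
    """
    d, l, v, f = cfg['d_model'], cfg['n_layers'], cfg['vocab_size'], cfg['d_ff']
    emb = v * d        # token embedding: 25000 * 768 = 19.2M
    attn = 4 * d * d   # Q, K, V, output projections per layer
    ffn = 2 * d * f    # up- and down-projection per layer
    head = v * d       # untied LM head
    return emb + l * (attn + ffn) + head

cfg = {'vocab_size': 25000, 'd_model': 768, 'n_layers': 12, 'd_ff': 3072}
print(count_params(cfg) / 1e6)  # ≈ 123.3, matching the stated ~123M
```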
Usage
Note: Prepare the dataset using train_tokenizer.py (download the parquet files first).
- Pretraining
python train.py
- Supervised Fine-Tuning (SFT)
python sft.py
- Inference
python inference.py
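inference.py implements decoding strategies; as an illustration of one common strategy (not necessarily the script's actual code), temperature plus top-k sampling draws the next token from only the k most likely candidates:

```python
import torch

@torch.no_grad()
def sample_next(logits, temperature=0.8, top_k=50):
    """Sample a token id from last-position logits of shape (vocab,).

    Lower temperature sharpens the distribution; top-k zeroes out
    everything outside the k highest-probability tokens.
    """
    logits = logits / temperature
    k = min(top_k, logits.size(-1))
    topk_vals, topk_idx = torch.topk(logits, k)
    probs = torch.softmax(topk_vals, dim=-1)
    return topk_idx[torch.multinomial(probs, 1)].item()
```

Greedy decoding is the `temperature -> 0` limit; `top_k = vocab_size` recovers plain temperature sampling.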
Results
Pretraining: Substantially reduced perplexity on the Falcon RefinedWeb subset.

Fine-tuning: Further decreased perplexity (33 -> 9) on UltraChat.
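Perplexity is the exponential of the mean per-token cross-entropy, so the drop from 33 to 9 means the model assigns far higher average probability to held-out tokens. A minimal evaluation sketch with illustrative names:

```python
import math
import torch

def perplexity(model, batches):
    """Perplexity = exp(mean token-level cross-entropy) over held-out batches."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for inputs, targets in batches:
            logits = model(inputs)
            # Sum (not mean) so batches of different sizes weight correctly.
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), targets.view(-1),
                reduction='sum')
            total_loss += loss.item()
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```

As a sanity check, a model that outputs a uniform distribution over a vocabulary of size V has perplexity exactly V.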
Scope and Goals
- Focus: AI/ML, LLMs, and NLP research.
- Goal: Learning the internals of LLMs by implementing everything from scratch.
License
MIT License.