# GPT-1900 D34 SFT (Dialogue)
3.29B parameter GPT-1900 model fine-tuned on synthetic period-style dialogue pairs extracted from pre-1900 texts.
## Checkpoints
- Step 56 (`model_000056.pt`): best validation bpb (1.05), ~5 epochs
- Step 443 (`model_000443.pt`): final checkpoint (22 epochs, val bpb 2.0); more overfit, but with cleaner turn-taking formatting
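The bpb (bits per byte) figures above normalize cross-entropy loss by text length in bytes, which makes them comparable across tokenizers. A minimal sketch of the conversion, assuming the loss is reported in nats per token; the average bytes-per-token value below is illustrative, not taken from this model's tokenizer:

```python
import math

def bits_per_byte(loss_nats_per_token: float, avg_bytes_per_token: float) -> float:
    """Convert cross-entropy loss (nats/token) to bits per byte.

    Divide by ln(2) to go from nats to bits, then by the average
    token length in bytes to normalize per byte of raw text.
    """
    return loss_nats_per_token / (math.log(2) * avg_bytes_per_token)

# Illustrative only: a loss of ~3.06 nats/token at ~4.2 bytes/token
# lands near the step-56 validation figure of 1.05 bpb.
print(round(bits_per_byte(3.057, 4.2), 2))
```

Because bpb is tokenizer-independent, it is the usual metric for comparing language models trained with different vocabularies.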
## Model Details
- Architecture: Custom GPT with RoPE, QK-norm, ReLU², value embeddings (ResFormer), per-layer residual/skip scalars
- Parameters: 3.29B
- Layers: 34
- Hidden dim: 2176
- Attention heads: 17 (query) / 17 (kv)
- Head dim: 128
- Context length: 2048 tokens
- Vocab size: 32,768 (BPE, GPT-4 style split pattern)
- Training data: 34,699 synthetic dialogue conversations from pre-1900 corpus
- Base model: gpt1900-d34-22btok
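The shape parameters above can be collected into a single config object for reference. A minimal sketch with hypothetical field names (the actual training code's config class is not shown here); note the internal consistency check that query heads times head dim recovers the hidden dim (17 × 128 = 2176):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPT1900Config:
    # Shape parameters from the model card; field names are hypothetical.
    n_layers: int = 34
    hidden_dim: int = 2176
    n_query_heads: int = 17
    n_kv_heads: int = 17
    head_dim: int = 128
    context_len: int = 2048
    vocab_size: int = 32_768

    def __post_init__(self) -> None:
        # Sanity check: heads * head_dim must equal the hidden dimension.
        assert self.n_query_heads * self.head_dim == self.hidden_dim

cfg = GPT1900Config()
print(cfg.n_query_heads * cfg.head_dim)  # 2176
```

With equal query and kv head counts, this is standard multi-head attention rather than grouped-query attention.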
## Model Family
- Base pretrained model
- SFT (period style)
- SFT (modern style)
- SFT (dialogue) (this model)
- RL post-training