# GPT-1900 D34 SFT (Dialogue)
3.29B parameter GPT-1900 model fine-tuned on synthetic period-style dialogue pairs extracted from pre-1900 texts.
## Checkpoints
- Step 56 (`model_000056.pt`): best validation bpb (1.05), ~5 epochs
- Step 443 (`model_000443.pt`): final checkpoint (22 epochs, val bpb 2.0); more overfit, but with cleaner turn-taking formatting
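The bpb (bits per byte) figures above normalize cross-entropy loss by text length in bytes, which makes them comparable across tokenizers. A minimal sketch of the conversion, assuming the loss is reported in nats per token; the average bytes-per-token value below is illustrative, not taken from this model's tokenizer:

```python
import math

def bits_per_byte(loss_nats_per_token: float, avg_bytes_per_token: float) -> float:
    """Convert cross-entropy loss (nats/token) to bits per byte.

    Divide by ln(2) to go from nats to bits, then by the average
    token length in bytes to normalize per byte of raw text.
    """
    return loss_nats_per_token / (math.log(2) * avg_bytes_per_token)

# Illustrative only: a loss of ~3.06 nats/token at ~4.2 bytes/token
# lands near the step-56 validation figure of 1.05 bpb.
print(round(bits_per_byte(3.057, 4.2), 2))
```

Because bpb is tokenizer-independent, it is the usual metric for comparing language models trained with different vocabularies.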
## Model Details
- Architecture: Custom GPT with RoPE, QK-norm, ReLU², value embeddings (ResFormer), per-layer residual/skip scalars
- Parameters: 3.29B
- Layers: 34
- Hidden dim: 2176
- Attention heads: 17 (query) / 17 (kv)
- Head dim: 128
- Context length: 2048 tokens
- Vocab size: 32,768 (BPE, GPT-4 style split pattern)
- Training data: 34,699 synthetic dialogue conversations from pre-1900 corpus
- Base model: gpt1900-d34-22btok
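The shape parameters above can be collected into a single config object for reference. A minimal sketch with hypothetical field names (the actual training code's config class is not shown here); note the internal consistency check that query heads times head dim recovers the hidden dim (17 × 128 = 2176):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPT1900Config:
    # Shape parameters from the model card; field names are hypothetical.
    n_layers: int = 34
    hidden_dim: int = 2176
    n_query_heads: int = 17
    n_kv_heads: int = 17
    head_dim: int = 128
    context_len: int = 2048
    vocab_size: int = 32_768

    def __post_init__(self) -> None:
        # Sanity check: heads * head_dim must equal the hidden dimension.
        assert self.n_query_heads * self.head_dim == self.hidden_dim

cfg = GPT1900Config()
print(cfg.n_query_heads * cfg.head_dim)  # 2176
```

With equal query and kv head counts, this is standard multi-head attention rather than grouped-query attention.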
## Model Family
- Base pretrained model
- SFT (period style)
- SFT (modern style)
- SFT (dialogue) (this model)
- RL post-training