TransformerLM (AR 512, 8K vocab) on SimpleStories
This checkpoint was produced by the code at https://github.com/triloy8/transformerlm: a minimal autoregressive Transformer LM trained on SimpleStories with a 512-token context and an 8K-vocab tokenizer.
Key Facts
- Model type: Autoregressive Transformer LM
- Dataset: SimpleStories
- Context length: 512 tokens
- Tokenizer vocab size: 8,000
- Layers: 12
- Heads: 8
- d_model: 512
- d_ff: 2,048
- Training setup: Single NVIDIA A40 48GB
- Runtime: ~20 hours
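For a rough sense of model scale, the facts above imply a parameter count in the low tens of millions. The sketch below is an estimate, not a figure from the repo; it assumes tied input/output embeddings, standard Transformer blocks with Q/K/V/output attention projections and a two-matrix feed-forward, and ignores norms and biases:

```python
# Approximate parameter count from the configuration listed above.
# Assumptions (not confirmed by the repo): tied embeddings, no biases,
# layer norms ignored (they contribute a negligible amount).
vocab, d_model, d_ff, layers = 8_000, 512, 2_048, 12

embed = vocab * d_model                  # token embedding (tied with LM head)
attn_per_layer = 4 * d_model * d_model   # Q, K, V, and output projections
ff_per_layer = 2 * d_model * d_ff        # feed-forward up- and down-projections
total = embed + layers * (attn_per_layer + ff_per_layer)
print(f"~{total / 1e6:.1f}M parameters")  # ~41.8M parameters
```

So this is roughly a 42M-parameter model, which is consistent with a ~20-hour single-A40 training run.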
What's Inside
- Checkpoint at step 60k of a 60k-step run, including:
- Optimizer state
- RNG state
- Safetensors weights
- Tokenizer config
- Run config
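The weights are stored in the safetensors format, whose header can be inspected without loading the tensors: the file begins with an 8-byte little-endian length followed by a JSON header mapping tensor names to dtype, shape, and byte offsets. A minimal stdlib-only sketch (the tensor name `wte.weight` below is hypothetical, not taken from this checkpoint):

```python
import json
import struct

def read_safetensors_header(path):
    # A safetensors file starts with an 8-byte little-endian header length,
    # followed by that many bytes of JSON describing each tensor.
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Build a tiny demo file to show the layout (hypothetical tensor name/shape).
header = {"wte.weight": {"dtype": "F32", "shape": [8000, 512], "data_offsets": [0, 0]}}
blob = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob)

meta = read_safetensors_header("demo.safetensors")
print(meta["wte.weight"]["shape"])  # [8000, 512]
```

In practice you would point `read_safetensors_header` at the checkpoint's own `.safetensors` file to list its tensor names and shapes.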
Reproducibility
To reproduce the run, use the exact commit that launched the training: https://github.com/triloy8/transformerlm/commit/06cb4831d47c04a18573bee8e28dc83b10086d06