TransformerLM (AR 512, 8K vocab) on SimpleStories
This checkpoint was produced by the code at https://github.com/triloy8/transformerlm: a minimal autoregressive Transformer LM trained on SimpleStories with a 512-token context and an 8K-vocab tokenizer.
Key Facts
- Model type: Autoregressive Transformer LM
- Dataset: SimpleStories
- Context length: 512 tokens
- Tokenizer vocab size: 8,000
- Layers: 12
- Heads: 8
- d_model: 512
- d_ff: 2,048
- Training setup: Single NVIDIA A40 48GB
- Runtime: ~20 hours
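For a rough sense of model scale, the facts above imply a parameter count in the low tens of millions. The sketch below is an estimate, not a figure from the repo; it assumes tied input/output embeddings, standard Transformer blocks with Q/K/V/output attention projections and a two-matrix feed-forward, and ignores norms and biases:

```python
# Approximate parameter count from the configuration listed above.
# Assumptions (not confirmed by the repo): tied embeddings, no biases,
# layer norms ignored (they contribute a negligible amount).
vocab, d_model, d_ff, layers = 8_000, 512, 2_048, 12

embed = vocab * d_model                  # token embedding (tied with LM head)
attn_per_layer = 4 * d_model * d_model   # Q, K, V, and output projections
ff_per_layer = 2 * d_model * d_ff        # feed-forward up- and down-projections
total = embed + layers * (attn_per_layer + ff_per_layer)
print(f"~{total / 1e6:.1f}M parameters")  # ~41.8M parameters
```

So this is roughly a 42M-parameter model, which is consistent with a ~20-hour single-A40 training run.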
What's Inside
- Checkpoint at step 60k of a 60k-step run, including:
- Optimizer state
- RNG state
- Safetensors weights
- Tokenizer config
- Run config
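The weights are stored in the safetensors format, whose header can be inspected without loading the tensors: the file begins with an 8-byte little-endian length followed by a JSON header mapping tensor names to dtype, shape, and byte offsets. A minimal stdlib-only sketch (the tensor name `wte.weight` below is hypothetical, not taken from this checkpoint):

```python
import json
import struct

def read_safetensors_header(path):
    # A safetensors file starts with an 8-byte little-endian header length,
    # followed by that many bytes of JSON describing each tensor.
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Build a tiny demo file to show the layout (hypothetical tensor name/shape).
header = {"wte.weight": {"dtype": "F32", "shape": [8000, 512], "data_offsets": [0, 0]}}
blob = json.dumps(header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob)

meta = read_safetensors_header("demo.safetensors")
print(meta["wte.weight"]["shape"])  # [8000, 512]
```

In practice you would point `read_safetensors_header` at the checkpoint's own `.safetensors` file to list its tensor names and shapes.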
Reproducibility
To reproduce the run, use the exact commit that launched the training: https://github.com/triloy8/transformerlm/commit/06cb4831d47c04a18573bee8e28dc83b10086d06