🧠✨ TransformerLM (AR 512, 8K vocab) on SimpleStories

This model was trained with the code at https://github.com/triloy8/transformerlm: a minimal autoregressive Transformer LM trained on SimpleStories with a 512-token context and an 8,000-token vocabulary. ✨

βœ… Key Facts

  • Model type: Autoregressive Transformer LM
  • Dataset: SimpleStories
  • Context length: 512 tokens
  • Tokenizer vocab size: 8,000
  • Layers: 12
  • Heads: 8
  • d_model: 512
  • d_ff: 2,048
  • Training setup: Single NVIDIA A40 48GB
  • Runtime: ~20 hours ⏱️
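
From the dimensions above, a rough parameter count can be sketched. Note this assumes tied input/output embeddings and ignores norms, biases, and positional parameters; these are assumptions, not details confirmed by the repo:

```python
# Rough parameter estimate from the card's dimensions.
# Assumes tied embeddings; ignores LayerNorm/bias terms (assumptions).
vocab, d_model, n_layers, d_ff = 8_000, 512, 12, 2_048

embedding = vocab * d_model              # token embedding (tied with output head)
attn_per_layer = 4 * d_model * d_model   # Q, K, V, and output projections
ff_per_layer = 2 * d_model * d_ff        # feed-forward up- and down-projections
total = embedding + n_layers * (attn_per_layer + ff_per_layer)

print(f"~{total / 1e6:.1f}M parameters")  # ~41.8M under these assumptions
```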

πŸ“¦ What's Inside

  • The final checkpoint (step 60k of a 60k-step run), including:
    • Optimizer state
    • RNG state
    • Safetensors weights
  • Tokenizer config
  • Run config
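
The weights ship in safetensors format, so they can be inspected without running the training code. A minimal sketch, assuming the file is named `model.safetensors` (the actual filename in the checkpoint is not confirmed here):

```python
# Sketch of loading the checkpoint weights with the safetensors library.
# The filename "model.safetensors" is an assumption, not from the repo docs.
import os

try:
    from safetensors.torch import load_file  # pip install safetensors torch
except ImportError:
    load_file = None

def load_weights(path="model.safetensors"):
    """Return the state dict from a safetensors file, or None if unavailable."""
    if load_file is None or not os.path.exists(path):
        return None
    return load_file(path)  # dict of parameter name -> torch.Tensor

state = load_weights()
print("loaded weights" if state is not None else "weights not found locally")
```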

πŸš€ Reproducibility

To reproduce the run, start from the exact commit that launched training:

https://github.com/triloy8/transformerlm/commit/06cb4831d47c04a18573bee8e28dc83b10086d06
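
Checking out that commit might look like the following; the training entrypoint and its flags are not documented here, so only the checkout step is shown:

```shell
# Clone the repo and pin it to the commit that launched the run.
# Failures (e.g. no network, already cloned) are tolerated for illustration.
COMMIT=06cb4831d47c04a18573bee8e28dc83b10086d06
git clone https://github.com/triloy8/transformerlm 2>/dev/null || true
cd transformerlm 2>/dev/null && git checkout "$COMMIT" 2>/dev/null || true
```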


πŸ“š Dataset used to train trixyL/transformerlm-ar-8k-simplestories

  • SimpleStories