tiny-gpt-lab-v0.3-bigger-16x768

Summary

This is the current strongest checkpoint in the tiny GPT home-lab project.

It keeps the proven light FineWeb broadening blend but scales the transformer up to 16 layers with a 768-dimensional embedding. The goal of this stage was to test whether the extra model capacity would translate into a real quality jump before pushing into broader data again.

Source

  • Source run: runs/tinystories-cosmopedia-fineweb-light-vocab-8192-16x768-gpu
  • Dataset blend: TinyStories + Cosmopedia + FineWeb-Edu (6:2:1)
  • Tokenizer: BPE, vocab 8192
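The 6:2:1 blend above can be read as per-document sampling weights. A minimal sketch of that mixing step, assuming weighted random sampling over the three corpora (the corpus contents and function name here are placeholders, not the project's actual pipeline):

```python
import random

# Placeholder corpora standing in for the real tokenized datasets.
corpora = {
    "tinystories": ["tinystories doc"] * 100,
    "cosmopedia": ["cosmopedia doc"] * 100,
    "fineweb_edu": ["fineweb-edu doc"] * 100,
}
# The 6:2:1 blend from the model card.
weights = {"tinystories": 6, "cosmopedia": 2, "fineweb_edu": 1}

def sample_blend(n, seed=0):
    """Draw n documents, choosing a corpus per draw in 6:2:1 proportion."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[k] for k in names], k=n)
    return [rng.choice(corpora[name]) for name in picks]

batch = sample_blend(9000)
```

Over a large number of draws, roughly two thirds of the documents come from TinyStories, matching the 6/(6+2+1) share.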

Model

  • n_layers=16
  • n_heads=8
  • n_embd=768
  • block_size=512

Training Result

  • Best validation loss: 1.3427 at step 59000
  • Final validation loss: 1.4286
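Assuming these are natural-log cross-entropy losses per token (the usual convention), they translate directly into validation perplexities:

```python
import math

# Perplexity = exp(cross-entropy loss per token).
best_ppl = math.exp(1.3427)   # best checkpoint, ~3.83
final_ppl = math.exp(1.4286)  # final checkpoint, ~4.17
```

The gap between best and final perplexity is the usual sign that the best checkpoint, not the last one, should be carried forward.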

Why This Matters

This run slightly beat the previous best broadened checkpoint while using a meaningfully larger transformer. That makes it the new leading candidate for future broadening and conversational shaping stages.

Known Limitations

  • Outputs can still drift into templated educational or expository language.
  • This remains an experimental local model, not a production assistant.