tiny-gpt-lab-v0.3-bigger-16x768
Summary
This is the current strongest checkpoint in the tiny GPT home-lab project.
It keeps the proven light FineWeb broadening blend but scales the transformer up to 16 layers and a 768-dimensional embedding. The goal of this stage was to test whether the extra model capacity would translate into a real quality jump before broadening the data further.
Source
- Source run: runs/tinystories-cosmopedia-fineweb-light-vocab-8192-16x768-gpu
- Dataset blend: TinyStories + Cosmopedia + FineWeb-Edu (6:2:1)
- Tokenizer: BPE, vocab 8192
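The 6:2:1 blend can be read as per-document sampling weights. A minimal sketch of such a sampler follows; the source names and the sampling function are illustrative assumptions, not the project's actual data pipeline:

```python
import random

# Hypothetical blend sampler. Source names and weights mirror the
# TinyStories : Cosmopedia : FineWeb-Edu = 6:2:1 ratio from the run name;
# the real pipeline may mix data differently (e.g. at the token level).
SOURCES = ["tinystories", "cosmopedia", "fineweb_edu"]
WEIGHTS = [6, 2, 1]

def sample_source(rng: random.Random) -> str:
    """Pick the source for the next training document, proportional to the blend."""
    return rng.choices(SOURCES, weights=WEIGHTS, k=1)[0]

# Sanity check: over many draws the counts approach the 6:2:1 ratio.
rng = random.Random(0)
counts = {s: 0 for s in SOURCES}
for _ in range(9_000):
    counts[sample_source(rng)] += 1
# counts is roughly {"tinystories": 6000, "cosmopedia": 2000, "fineweb_edu": 1000}
```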
Model
- n_layers=16
- n_heads=8
- n_embd=768
- block_size=512
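For scale, a back-of-the-envelope parameter count from this config, using the standard GPT-2-style estimate (12 · n_layer · n_embd² for the transformer blocks, plus embedding tables). The exact total depends on implementation details such as biases and weight tying, so treat this as a rough sketch:

```python
# Rough parameter estimate for the 16x768 config; implementation details
# (biases, tied output head, LayerNorm params) shift the real total slightly.
n_layer, n_head, n_embd, block_size, vocab = 16, 8, 768, 512, 8192

block_params = 12 * n_layer * n_embd ** 2            # attention + MLP weights
embed_params = vocab * n_embd + block_size * n_embd  # token + position tables

total = block_params + embed_params
print(f"~{total / 1e6:.0f}M parameters")  # → ~120M parameters
```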
Training Result
- Best validation loss: 1.3427 at step 59000
- Final validation loss: 1.4286
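Assuming these losses are mean cross-entropy in nats (the usual convention for GPT training loops), they map to the perplexities below. This is an interpretation aid, not a number logged by the run itself:

```python
import math

# Perplexity = exp(mean cross-entropy loss in nats). Assumes the run
# reports loss in nats; if it used bits, the base would be 2 instead.
best_val_loss = 1.3427
final_val_loss = 1.4286

best_ppl = math.exp(best_val_loss)    # ≈ 3.83
final_ppl = math.exp(final_val_loss)  # ≈ 4.17
```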
Why This Matters
This run slightly beat the previous best broadened checkpoint while using a meaningfully larger transformer. That makes it the new leading candidate for future broadening and conversational shaping stages.
Known Limitations
- Outputs can still drift into templated educational or expository language.
- This remains an experimental local model, not a production assistant.