tiny-gpt-lab-v0.3-bigger-16x768
Summary
This is the current strongest checkpoint in the tiny GPT home-lab project.
It keeps the proven light FineWeb broadening blend but scales the transformer up to 16 layers and a 768-dimensional embedding. The goal of this stage was to test whether the extra model capacity would translate into a real quality jump before broadening the data further.
Source
- Source run: runs/tinystories-cosmopedia-fineweb-light-vocab-8192-16x768-gpu
- Dataset blend: TinyStories + Cosmopedia + FineWeb-Edu (6:2:1)
- Tokenizer: BPE, vocab 8192
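The 6:2:1 blend can be read as per-document sampling weights. A minimal sketch of such a sampler follows; the source names and the sampling function are illustrative assumptions, not the project's actual data pipeline:

```python
import random

# Hypothetical blend sampler. Source names and weights mirror the
# TinyStories : Cosmopedia : FineWeb-Edu = 6:2:1 ratio from the run name;
# the real pipeline may mix data differently (e.g. at the token level).
SOURCES = ["tinystories", "cosmopedia", "fineweb_edu"]
WEIGHTS = [6, 2, 1]

def sample_source(rng: random.Random) -> str:
    """Pick the source for the next training document, proportional to the blend."""
    return rng.choices(SOURCES, weights=WEIGHTS, k=1)[0]

# Sanity check: over many draws the counts approach the 6:2:1 ratio.
rng = random.Random(0)
counts = {s: 0 for s in SOURCES}
for _ in range(9_000):
    counts[sample_source(rng)] += 1
# counts is roughly {"tinystories": 6000, "cosmopedia": 2000, "fineweb_edu": 1000}
```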
Model
- n_layers=16
- n_heads=8
- n_embd=768
- block_size=512
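For scale, a back-of-the-envelope parameter count from this config, using the standard GPT-2-style estimate (12 · n_layer · n_embd² for the transformer blocks, plus embedding tables). The exact total depends on implementation details such as biases and weight tying, so treat this as a rough sketch:

```python
# Rough parameter estimate for the 16x768 config; implementation details
# (biases, tied output head, LayerNorm params) shift the real total slightly.
n_layer, n_head, n_embd, block_size, vocab = 16, 8, 768, 512, 8192

block_params = 12 * n_layer * n_embd ** 2            # attention + MLP weights
embed_params = vocab * n_embd + block_size * n_embd  # token + position tables

total = block_params + embed_params
print(f"~{total / 1e6:.0f}M parameters")  # → ~120M parameters
```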
Training Result
- Best validation loss: 1.3427 at step 59000
- Final validation loss: 1.4286
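Assuming these losses are mean cross-entropy in nats (the usual convention for GPT training loops), they map to the perplexities below. This is an interpretation aid, not a number logged by the run itself:

```python
import math

# Perplexity = exp(mean cross-entropy loss in nats). Assumes the run
# reports loss in nats; if it used bits, the base would be 2 instead.
best_val_loss = 1.3427
final_val_loss = 1.4286

best_ppl = math.exp(best_val_loss)    # ≈ 3.83
final_ppl = math.exp(final_val_loss)  # ≈ 4.17
```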
Why This Matters
This run slightly beat the previous best broadened checkpoint while using a meaningfully larger transformer. That makes it the new leading candidate for future broadening and conversational shaping stages.
Known Limitations
- Outputs can still drift into templated educational or expository language.
- This remains an experimental local model, not a production assistant.