Commit ff7c5fb (verified) by LH-Tech-AI · 1 parent: ba1bd8f

Update README.md

Files changed (1): README.md (+3 -0)
@@ -63,6 +63,9 @@ We trained Supra Mini 0.1M on a single T4 GPU in ~45 minutes for 2 epochs.<br>
63
  The full training code can be found in this repo as `run.sh` (runs the complete pipeline), `train_tokenizer.py` (trains a custom BPE tokenizer with a vocab size of 250), `train.py` (trains the model) and `inference.py` (tests the model).<br>
64
  The model was trained on the first 500 million tokens of the sample-10BT subset of FineWeb-Edu using streaming tokenization.
65
 
66
+ ## Overtraining
67
+ Yes, this model is heavily overtrained: it saw roughly 212x the Chinchilla-optimal amount of data (about 20 tokens per parameter is Chinchilla-optimal; we used ~4250).
68
+
69
  ## Final thoughts
70
  As the newly founded organization **SupraLabs**, we are proud to introduce our first Tiny-LLM to prove that our pipeline is running.<br>
71
  More models will be released soon...
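
As a quick sanity check of the overtraining figures added in this commit, the ratio can be recomputed from the numbers the README itself states. This is only a minimal sketch: the function and constant names are illustrative and not part of the repo, and the implied parameter count is a back-of-the-envelope derivation that assumes the 500 million tokens are counted once rather than once per epoch.

```python
# Figures quoted in the README text of this commit.
CHINCHILLA_TOKENS_PER_PARAM = 20   # rule-of-thumb optimum cited in the README
USED_TOKENS_PER_PARAM = 4250       # approximate value stated in the README
TRAINING_TOKENS = 500_000_000      # "first 500 million tokens"


def overtraining_factor(used_tokens_per_param: float,
                        optimal_tokens_per_param: float) -> float:
    """Return how many times the optimal token budget was used."""
    return used_tokens_per_param / optimal_tokens_per_param


def implied_parameter_count(training_tokens: int,
                            tokens_per_param: float) -> float:
    """Back out the parameter count implied by a tokens-per-parameter ratio.

    Assumption (not stated in the README): tokens are counted once,
    not once per epoch.
    """
    return training_tokens / tokens_per_param


if __name__ == "__main__":
    factor = overtraining_factor(USED_TOKENS_PER_PARAM,
                                 CHINCHILLA_TOKENS_PER_PARAM)
    params = implied_parameter_count(TRAINING_TOKENS, USED_TOKENS_PER_PARAM)
    print(f"Overtraining factor: {factor:.1f}x")
    print(f"Implied parameter count: {params:,.0f}")
```

Run as a script, this reports a factor of 212.5x, which matches the ~212x quoted above, and an implied size of roughly 118K parameters, in line with the "0.1M" in the model name.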