LH-Tech-AI committed on
Commit efe8b1a · verified · 1 Parent(s): 537b7d7

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -19,6 +19,19 @@ tags:
  # 🦅 Supra Mini 0.1M
  Supra Mini 0.1M is a very tiny base model trained on 500 million tokens of Fineweb-Edu for 2 epochs to prove how well very tiny models can perform on world knowledge.
 
+ # Model Config
+
+ - Parameters: 117,648 (0.1M)
+ - Architecture: Llama
+ - Vocab size with custom BPE tokenizer: 250
+ - Hidden Size: 48
+ - Intermediate Size: 96
+ - Hidden Layers: 4
+ - Attention Heads: 4
+ - Max Position Embeddings: 256
+ - Learning rate: 6e-4
+ - Weight Decay: 0.01
+
  ## Benchmarks
 
  All benchmarks were executed using `lm-eval`.
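
Review note: the "Model Config" values added in this commit can be sanity-checked with a rough parameter count. The sketch below is not the author's code. `TinyLlamaConfig` and `estimate_llama_params` are hypothetical names, and the formula assumes a standard Llama-style block: bias-free q/k/v/o projections, as many key/value heads as attention heads, a gated MLP with three weight matrices, two RMSNorm weights per layer plus a final one, and rotary position embeddings with no learned parameters. Whether the input and output embeddings are tied is not stated in the README, so it is left as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyLlamaConfig:
    """Hypothetical container mirroring the values listed under "Model Config"."""
    vocab_size: int = 250
    hidden_size: int = 48
    intermediate_size: int = 96
    num_hidden_layers: int = 4
    num_attention_heads: int = 4
    max_position_embeddings: int = 256  # rotary positions add no learned weights


def estimate_llama_params(cfg: TinyLlamaConfig, tie_embeddings: bool = False) -> int:
    """Rough parameter count for a Llama-style decoder (assumptions in the note above)."""
    h, i, v = cfg.hidden_size, cfg.intermediate_size, cfg.vocab_size
    embed = v * h                # token embedding matrix
    attn = 4 * h * h             # q, k, v, o projections, no biases
    mlp = 3 * h * i              # gate, up, and down projections
    norms = 2 * h                # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    final_norm = h               # RMSNorm before the output head
    lm_head = 0 if tie_embeddings else v * h
    return embed + cfg.num_hidden_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    cfg = TinyLlamaConfig()
    print("untied:", estimate_llama_params(cfg))
    print("tied:  ", estimate_llama_params(cfg, tie_embeddings=True))
```

With the listed values this estimate gives 116,592 parameters with untied embeddings and 104,592 with tied embeddings. The untied figure is close to, but not identical to, the 117,648 stated in the README. The exact count depends on details the README does not spell out, so the sketch is only a plausibility check.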