Commit 05ea558 (verified), committed by CLIWorks
1 Parent(s): 02f20fc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -25,3 +25,11 @@ MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 TRITON_COMPILE=1 DENSE_CKP
  Tokenized FineWeb-Edu sample-10BT — raw uint32 LE tokens
  - train_tokens.bin: 7.7B tokens, 29GB
  - metadata.json
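
The README describes `train_tokens.bin` as raw little-endian uint32 tokens. A minimal sketch of how such a file could be read without loading all of it into RAM is shown here. The helper names `open_tokens` and `sample_batch` are hypothetical and are not taken from the repository, and the sketch assumes the file has no header, which the README does not confirm.

```python
import numpy as np


def open_tokens(path):
    """Memory-map a flat file of little-endian uint32 token IDs, read-only.

    Assumption: the file is a headerless stream of 4-byte tokens.
    """
    return np.memmap(path, dtype="<u4", mode="r")


def sample_batch(tokens, batch_size, seq_len, rng):
    """Draw random windows and split them into inputs and next-token targets.

    Each window holds seq_len + 1 tokens, so targets are inputs shifted by one.
    """
    if len(tokens) <= seq_len:
        raise ValueError("token file is shorter than one training window")
    # Exclusive upper bound keeps every window fully inside the file.
    starts = rng.integers(0, len(tokens) - seq_len, size=batch_size)
    windows = np.stack([tokens[s : s + seq_len + 1] for s in starts])
    windows = windows.astype(np.int64)  # most embedding lookups expect int64 indices
    return windows[:, :-1], windows[:, 1:]
```

With the values from the hunk header, a call might look like `sample_batch(open_tokens("train_tokens.bin"), 28, 2048, np.random.default_rng(0))`. Because the file is memory-mapped, only the sampled windows are read from disk.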
+ ## Current Training (1B MoE)
+
+ Config: 16 experts | top-1 routing | intermediate=1024 | 6 layers | n_loops=1
+ Params: 997M (18% Engram / 82% MoE)
+ VRAM: 43GB | Throughput: 40K tok/s
+
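
The figures in this diff can be cross-checked with some quick arithmetic. The sketch below combines the `TARGET_TOKENS`, `MICRO_BATCH`, and `SEQ_LEN` values from the hunk header with the dataset size and throughput quoted in the README. It assumes that 40K tok/s is the sustained end-to-end rate and that "7.7B" means roughly 7.7e9 tokens. The function name `run_budget` is hypothetical.

```python
TARGET_TOKENS = 12_400_000_000   # from the hunk header
DATASET_TOKENS = 7_700_000_000   # "7.7B tokens" (approximate)
THROUGHPUT_TOK_S = 40_000        # "40K tok/s" (assumed sustained)
MICRO_BATCH = 28
SEQ_LEN = 2048
BYTES_PER_TOKEN = 4              # uint32


def run_budget(target=TARGET_TOKENS, dataset=DATASET_TOKENS,
               throughput=THROUGHPUT_TOK_S, micro_batch=MICRO_BATCH,
               seq_len=SEQ_LEN, bytes_per_token=BYTES_PER_TOKEN):
    """Return rough size, epoch, and wall-clock estimates for the run."""
    return {
        # On-disk size of the token file in GiB (2**30 bytes).
        "file_gib": dataset * bytes_per_token / 2**30,
        # Number of passes over the dataset needed to reach the target.
        "epochs": target / dataset,
        # Wall-clock hours at the quoted throughput.
        "hours": target / throughput / 3600,
        # Tokens consumed by one micro-batch.
        "tokens_per_micro_batch": micro_batch * seq_len,
    }


if __name__ == "__main__":
    for key, value in run_budget().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the token file comes to about 28.7 GiB, which matches the quoted 29GB if that figure is in binary units. The target corresponds to roughly 1.6 passes over the data and about 86 hours of training, with 57,344 tokens per micro-batch.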
+ ### Run
+