Upload README.md with huggingface_hub
README.md
CHANGED
@@ -25,3 +25,11 @@ MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 TRITON_COMPILE=1 DENSE_CKP
 Tokenized FineWeb-Edu sample-10BT — raw uint32 LE tokens
 - train_tokens.bin: 7.7B tokens, 29GB
 - metadata.json
+## Current Training (1B MoE)
+
+Config: 16 experts | top-1 routing | intermediate=1024 | 6 layers | n_loops=1
+Params: 997M (18% Engram / 82% MoE)
+VRAM: 43GB | Throughput: 40K tok/s
+
+### Run
+
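The figures in this hunk can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the repository: `read_window` and `run_budget` are hypothetical helper names. It assumes `train_tokens.bin` is a headerless stream of little-endian uint32 values (as the README line describes), that the 40K tok/s throughput holds for the whole run, and that `TARGET_TOKENS=12400000000` from the hunk header is the total training budget.

```python
import struct

BYTES_PER_TOKEN = 4  # uint32, per the README's "raw uint32 LE tokens"


def read_window(path, start, count):
    """Read up to `count` tokens starting at token index `start`.

    Seeks to the byte offset rather than loading the file, so it also
    works on a multi-gigabyte token dump. Hypothetical helper.
    """
    with open(path, "rb") as f:
        f.seek(start * BYTES_PER_TOKEN)
        raw = f.read(count * BYTES_PER_TOKEN)
    n = len(raw) // BYTES_PER_TOKEN  # fewer tokens come back near end of file
    return list(struct.unpack(f"<{n}I", raw[: n * BYTES_PER_TOKEN]))


def run_budget(dataset_tokens, target_tokens, tokens_per_sec):
    """Return (file size in GiB, passes over the data, wall-clock hours)."""
    size_gib = dataset_tokens * BYTES_PER_TOKEN / 2**30
    epochs = target_tokens / dataset_tokens
    hours = target_tokens / tokens_per_sec / 3600
    return size_gib, epochs, hours


if __name__ == "__main__":
    # Dataset size and throughput from the README; target from the hunk header.
    size_gib, epochs, hours = run_budget(7.7e9, 12_400_000_000, 40_000)
    print(f"{size_gib:.1f} GiB, {epochs:.2f} epochs, {hours:.0f} h")
```

Under these assumptions, 7.7B tokens at 4 bytes each come to about 28.7 GiB, which is consistent with the stated 29GB if that figure is in binary gigabytes. The 12.4B-token target would be roughly 1.6 passes over the data and about 86 hours at 40K tok/s.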