# SpiderPortal v5
Recurrent Depth Transformer with MLA attention, Engram memory, and MoE.
## Architecture
- Dense: 250M params — 2 prelude + 6 recurrent + 2 coda
- MoE: 5.3B params — 32 experts, top-2, 1 shared expert/layer
- MLA (DeepSeek-V2 style, 10.7x KV compression)
- Engram memory @ layers 1,4
- LTI + ACT + LoRA
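As a rough illustration of the top-2 routing listed above, the sketch below picks the two highest-scoring experts for one token and renormalizes their weights. It is not the repository's router: the function name `top_k_route`, the softmax-then-renormalize weighting, and the omission of the shared expert are all assumptions made for the example.

```python
import math

def top_k_route(logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Hypothetical helper: softmax over router logits, keep the top k,
    then renormalize the kept weights so they sum to 1.
    """
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 32 experts, with experts 5 and 17 scoring highest for this token.
logits = [0.0] * 32
logits[5], logits[17] = 3.0, 2.0
routes = top_k_route(logits, k=2)
```

In a layer with a shared expert, that expert would typically process every token in addition to the routed ones; how this model combines the two outputs is not stated here.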
## Training
### Dense
```
MICRO_BATCH=42 SEQ_LEN=2048 TARGET_TOKENS=12400000000 python mythos-fineweb-dense.py
```
### MoE (from dense checkpoint)
```
MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 TRITON_COMPILE=1 DENSE_CKPT=... python mythos-fineweb-moe.py
```
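To get a feel for what these settings imply, the sketch below converts a token budget into a step count. It assumes each step consumes exactly `MICRO_BATCH * SEQ_LEN` tokens (one device, no gradient accumulation); the training scripts may count steps differently, and `steps_for_budget` is a hypothetical helper, not part of them.

```python
def steps_for_budget(micro_batch, seq_len, target_tokens):
    """Number of steps needed to see at least `target_tokens` tokens.

    Assumption: one step processes micro_batch * seq_len tokens.
    """
    tokens_per_step = micro_batch * seq_len
    # Ceiling division so the last partial step is counted.
    return -(-target_tokens // tokens_per_step)

dense_steps = steps_for_budget(42, 2048, 12_400_000_000)  # 144160
moe_steps = steps_for_budget(28, 2048, 12_400_000_000)    # 216239
```

With multiple devices or gradient accumulation, divide by the corresponding factor.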
## Dataset
Tokenized FineWeb-Edu sample-10BT, stored as raw little-endian uint32 tokens:
- train_tokens.bin: 7.7B tokens, 29GB
- metadata.json
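Because the token file is a flat array of little-endian uint32 values, any slice can be read by seeking to `index * 4` bytes. The following is a minimal standard-library sketch; `read_tokens` and `size_gib` are hypothetical helpers, and the layout of `metadata.json` is not assumed.

```python
import struct

BYTES_PER_TOKEN = 4  # uint32

def read_tokens(path, start, count):
    """Read up to `count` token ids beginning at token index `start`."""
    with open(path, "rb") as f:
        f.seek(start * BYTES_PER_TOKEN)
        data = f.read(count * BYTES_PER_TOKEN)
    n = len(data) // BYTES_PER_TOKEN  # fewer than `count` near end of file
    # "<" = little-endian, "I" = unsigned 32-bit integer
    return list(struct.unpack(f"<{n}I", data[: n * BYTES_PER_TOKEN]))

def size_gib(num_tokens):
    """Expected file size in GiB for a given token count."""
    return num_tokens * BYTES_PER_TOKEN / 2**30
```

As a sanity check, `size_gib(7_700_000_000)` is about 28.7, which agrees with the listed file size if it is read as GiB. For training, a memory-mapped array (for example `numpy.memmap` with dtype `"<u4"`) is usually a better fit than reading slices into Python lists.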
## Current Training (1B MoE)
- Config: 16 experts | top-1 routing | intermediate=1024 | 6 layers | n_loops=1
- Params: 997M (18% Engram / 82% MoE)
- VRAM: 43GB | Throughput: 40K tok/s
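The figures above can be turned into rough absolute numbers. The sketch below splits the parameter count by the stated percentages and estimates the wall-clock time for one pass over the 7.7B-token dataset at the reported throughput. It assumes throughput stays constant and that "one pass" is the quantity of interest; the actual token budget for this run is not stated. Both helper names are hypothetical.

```python
def split_params(total, fractions):
    """Split a parameter count by named fractions."""
    return {name: total * frac for name, frac in fractions.items()}

def hours_for(tokens, tok_per_s):
    """Hours needed to process `tokens` at a constant `tok_per_s`."""
    return tokens / tok_per_s / 3600

# Roughly 179M Engram and 818M MoE parameters.
parts = split_params(997e6, {"engram": 0.18, "moe": 0.82})

# One pass over 7.7B tokens at 40K tok/s: about 53.5 hours.
epoch_hours = hours_for(7.7e9, 40_000)
```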
### Run