CLIWorks/spiderportal-v5


CLIWorks committed 22 days ago

Commit 2d6303e (verified) · 1 parent: c35e255

Upload README.md with huggingface_hub


README.md +58 -0


	@@ -0,0 +1,58 @@

+# SpiderPortal v5
+Recurrent Depth Transformer with MLA, Engram conditional memory, and MoE.
+## Architecture
+- **Dense**: 250M params — 2 prelude + 6 recurrent + 2 coda layers
+- **MoE**: 5.3B params — 32 experts, top-2 routing, 1 shared expert per layer
+- **MLA**: Multi-head Latent Attention (DeepSeek-V2 style, 10.7x KV cache compression)
+- **Engram**: Conditional memory at layers 1,4 (hash-based ngram lookup + conv1d gate)
+- **LTI Injection** + **ACT Halting** + **LoRA Adapter**
+- 32k context (extendable to 256k via YaRN)
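For readers unfamiliar with top-2 routing over 32 experts, the following is a minimal, generic sketch of the idea for a single token. It is not taken from this repository: the function name `top2_route` is hypothetical, and renormalizing the two selected weights so they sum to 1 is an assumption (some implementations use the raw softmax probabilities instead). The shared expert mentioned above would typically be applied to every token in addition to the two routed experts.

```python
import math


def top2_route(logits):
    """Pick the two highest-scoring experts for one token.

    `logits` holds one router score per expert (e.g. 32 values).
    Returns [(expert_index, weight), (expert_index, weight)], where the
    weights are renormalized to sum to 1 (an assumption of this sketch).
    """
    # Numerically stable softmax over the router scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the two most probable experts, best first.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]

    # Renormalize over the selected pair.
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


if __name__ == "__main__":
    scores = [0.0] * 32
    scores[3], scores[17] = 2.0, 1.0
    print(top2_route(scores))
```

The token's MoE output would then be the weighted sum of the two chosen experts' outputs, plus the shared expert's output under the layout described above.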
+## Training
+### Dense (Phase 1)
+```bash
+env MICRO_BATCH=42 SEQ_LEN=2048 TARGET_TOKENS=12400000000 CKPT_EVERY=5000 \
+    python mythos-fineweb-dense.py
+```
+### MoE (Phase 2, from dense checkpoint)
+```bash
+env MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 CKPT_EVERY=5000 \
+    TRITON_COMPILE=1 DENSE_CKPT=checkpoints-dense/spiderportal-v5-dense-ep1-step5000.pt \
+    python mythos-fineweb-moe.py
+```
+### MoE (from scratch)
+```bash
+env MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 CKPT_EVERY=5000 \
+    TRITON_COMPILE=1 \
+    python mythos-fineweb-moe.py
+```
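To get a feel for what the environment variables above imply, here is a rough step-count estimate. It assumes that one optimizer step consumes `MICRO_BATCH * SEQ_LEN` tokens per device, with a single device and no gradient accumulation unless stated; the training scripts may count tokens differently, and the helper name `steps_for_target` is hypothetical.

```python
def steps_for_target(target_tokens, micro_batch, seq_len, world_size=1, grad_accum=1):
    """Return (tokens_per_step, steps needed to reach target_tokens).

    world_size and grad_accum are assumptions of this sketch; they are
    not variables documented in the README.
    """
    tokens_per_step = micro_batch * seq_len * world_size * grad_accum
    steps = -(-target_tokens // tokens_per_step)  # ceiling division
    return tokens_per_step, steps


if __name__ == "__main__":
    target = 12_400_000_000  # TARGET_TOKENS
    for name, micro_batch in (("dense", 42), ("moe", 28)):
        per_step, steps = steps_for_target(target, micro_batch, 2048)
        ckpts = steps // 5000  # CKPT_EVERY=5000
        print(f"{name}: {per_step} tokens/step, {steps} steps, {ckpts} periodic checkpoints")
```

Under these assumptions the dense run takes 86,016 tokens per step and about 144,160 steps, and the MoE run takes 57,344 tokens per step and about 216,239 steps.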
+## VRAM Usage
+| Config | Batch | VRAM | Tok/s |
+|--------|:-----:|:----:|:-----:|
+| Dense bf16 | 44 | 48.7 GB | 42K |
+| Dense MXFP8 | 42 | 46.6 GB | 40K |
+| MoE bf16 + compile | 28 | 40.6 GB | 27K |
+## Dataset
+Tokenized FineWeb-Edu sample-10BT. Format: raw uint32 little-endian tokens.
+- `data/train_tokens.bin` — 7.7B tokens, 29GB
+- `data/metadata.json` — tokenization metadata
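A minimal sketch of reading the token file described above, using only the standard library. It assumes the file is a headerless stream of 4-byte little-endian unsigned integers, which is how "raw uint32 little-endian" is read here; the helper names `token_count` and `read_tokens` are hypothetical and not part of the repository. As a sanity check on the listed sizes, 7.7B tokens at 4 bytes each is about 30.8e9 bytes, or roughly 28.7 GiB, which matches the stated 29GB if that figure is in binary units.

```python
import os
import struct

BYTES_PER_TOKEN = 4  # one uint32 per token


def token_count(path):
    """Number of tokens in the file, derived from its size on disk."""
    size = os.path.getsize(path)
    if size % BYTES_PER_TOKEN:
        raise ValueError(f"{path}: size {size} is not a multiple of {BYTES_PER_TOKEN}")
    return size // BYTES_PER_TOKEN


def read_tokens(path, start, count):
    """Read up to `count` token ids beginning at token offset `start`."""
    with open(path, "rb") as f:
        f.seek(start * BYTES_PER_TOKEN)
        raw = f.read(count * BYTES_PER_TOKEN)
    n = len(raw) // BYTES_PER_TOKEN  # may be short near end of file
    # "<" forces little-endian, "I" is an unsigned 32-bit integer.
    return list(struct.unpack(f"<{n}I", raw[: n * BYTES_PER_TOKEN]))


if __name__ == "__main__":
    path = "data/train_tokens.bin"
    if os.path.exists(path):
        print(token_count(path), "tokens; first 8:", read_tokens(path, 0, 8))
```

For training-scale access, mapping the file with `numpy.memmap(path, dtype="<u4", mode="r")` avoids loading it into memory; the pure-Python version is only meant for inspecting small windows.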
+## Requirements
+- Python 3.10+
+- PyTorch 2.x with CUDA 12.0+
+- `torchtitan` (for MoE routing/experts)
+- `torchao` (optional, for MXFP8)
+- `transformers`, `datasets`, `loguru`
+- `triton`, `numba` (for custom kernels)