SpiderPortal v5

Recurrent Depth Transformer with MLA attention, Engram memory, and MoE.

Architecture

  • Dense: 250M params (2 prelude + 6 recurrent + 2 coda)
  • MoE: 5.3B params (32 experts, top-2, 1 shared expert/layer)
  • MLA (DeepSeek-V2 style, 10.7x KV compression)
  • Engram memory @ layers 1,4
  • LTI + ACT + LoRA
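The MoE bullet above (32 experts, top-2 routing, one shared expert per layer) can be illustrated with a minimal pure-Python sketch. This is not the repo's implementation: `top_k_route`, `moe_forward`, the softmax-then-renormalize weighting, and the scalar toy experts are all assumptions chosen for illustration.

```python
import math


def top_k_route(logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Assumption: weights are softmax probabilities renormalized over the
    selected experts so they sum to 1. The actual router may differ.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, shared_expert, k=2):
    """Combine the always-on shared expert with the k routed experts."""
    out = shared_expert(x)
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out


if __name__ == "__main__":
    n_experts = 32
    # Toy experts: expert i just scales its input by i (hypothetical).
    experts = [lambda x, i=i: i * x for i in range(n_experts)]
    logits = [0.0] * n_experts
    logits[3], logits[7] = 2.0, 1.0
    print(top_k_route(logits))
    print(moe_forward(1.0, logits, experts, shared_expert=lambda x: x))
```

With top-2 routing only two of the 32 routed experts run per token, which is why the MoE variant can hold far more parameters than the dense model while the shared expert still contributes to every token.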

Training

Dense

MICRO_BATCH=42 SEQ_LEN=2048 TARGET_TOKENS=12400000000 python mythos-fineweb-dense.py

MoE (from dense checkpoint)

MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 TRITON_COMPILE=1 DENSE_CKPT=... python mythos-fineweb-moe.py
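As a rough sanity check on the two commands above, the token budget can be converted into a step count. The sketch below assumes a single device with no gradient accumulation, so each step consumes `MICRO_BATCH * SEQ_LEN` tokens; the scripts may count steps differently, and `steps_for_budget` is a hypothetical helper, not something the repo provides.

```python
import math


def tokens_per_step(micro_batch, seq_len):
    """Tokens consumed by one step (assumes no accumulation, one device)."""
    return micro_batch * seq_len


def steps_for_budget(target_tokens, micro_batch, seq_len):
    """Smallest number of steps whose total tokens reach target_tokens."""
    return math.ceil(target_tokens / tokens_per_step(micro_batch, seq_len))


if __name__ == "__main__":
    target = 12_400_000_000
    for name, micro_batch in (("dense", 42), ("moe", 28)):
        per_step = tokens_per_step(micro_batch, 2048)
        steps = steps_for_budget(target, micro_batch, 2048)
        print(f"{name}: {per_step} tokens/step, {steps} steps")
```

Under these assumptions the dense run processes 86,016 tokens per step and the MoE run 57,344, so the MoE run needs 1.5 times as many steps for the same budget.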

Dataset

Tokenized FineWeb-Edu sample-10BT, stored as raw little-endian uint32 tokens

  • train_tokens.bin: 7.7B tokens, 29GB
  • metadata.json
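Because the token file is described as raw little-endian uint32 values, it can be read without any special loader. The following is a minimal standard-library sketch; it assumes the file has no header, and the helper names `count_tokens` and `read_tokens` are hypothetical. The layout of `metadata.json` is not documented here, so it is not parsed.

```python
import os
import struct

TOKEN_BYTES = 4  # one uint32 per token


def count_tokens(path):
    """Number of tokens implied by the file size (assumes no header)."""
    size = os.path.getsize(path)
    if size % TOKEN_BYTES:
        raise ValueError(f"{path}: size {size} is not a multiple of 4 bytes")
    return size // TOKEN_BYTES


def read_tokens(path, start, count):
    """Read up to `count` token IDs beginning at token index `start`."""
    with open(path, "rb") as f:
        f.seek(start * TOKEN_BYTES)
        raw = f.read(count * TOKEN_BYTES)
    n = len(raw) // TOKEN_BYTES  # fewer than `count` near end of file
    # "<" forces little-endian, "I" is a 4-byte unsigned int in that mode.
    return list(struct.unpack(f"<{n}I", raw[: n * TOKEN_BYTES]))


if __name__ == "__main__":
    path = "train_tokens.bin"
    if os.path.exists(path):
        print(count_tokens(path), "tokens")
        print(read_tokens(path, start=0, count=16))
```

At 4 bytes per token, 7.7B tokens come to about 30.8 GB (roughly 28.7 GiB), which is consistent with the 29GB listed above. For a file this size, a memory-mapped array such as `numpy.memmap(path, dtype="<u4", mode="r")` is a common alternative to reading slices by hand.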

Current Training (1B MoE)

  • Config: 16 experts | top-1 routing | intermediate=1024 | 6 layers | n_loops=1
  • Params: 997M (18% Engram / 82% MoE)
  • VRAM: 43GB | Throughput: 40K tok/s
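The figures above allow two quick back-of-the-envelope checks: the parameter split and the wall-clock time for a given token budget. The token budget of the current run is not stated, so the sketch below reuses the `TARGET_TOKENS` value from the Training section purely as an assumption; `split_params` and `eta_hours` are hypothetical helpers.

```python
def split_params(total_params, fractions):
    """Apportion a parameter count by the given fractional shares."""
    return {name: total_params * frac for name, frac in fractions.items()}


def eta_hours(target_tokens, tokens_per_second):
    """Hours needed to process target_tokens at a constant throughput."""
    return target_tokens / tokens_per_second / 3600


if __name__ == "__main__":
    shares = split_params(997e6, {"engram": 0.18, "moe": 0.82})
    for name, count in shares.items():
        print(f"{name}: {count / 1e6:.0f}M params")

    # Assumption: same 12.4B-token budget as the runs in the Training section.
    hours = eta_hours(12_400_000_000, 40_000)
    print(f"{hours:.1f} hours ({hours / 24:.1f} days)")
```

At a steady 40K tok/s, a 12.4B-token budget would take about 86 hours; actual time depends on the real budget and on throughput staying constant.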

Run
