CLIWorks/spiderportal-v5

Model card Files Files and versions

spiderportal-v5 / README.md

CLIWorks's picture

Upload README.md with huggingface_hub

05ea558 verified 21 days ago

|

history blame contribute delete

913 Bytes

# SpiderPortal v5

A Recurrent-Depth Transformer with Multi-head Latent Attention (MLA), Engram memory, and Mixture-of-Experts (MoE) layers.

## Architecture
- Dense: 250M parameters (2 prelude + 6 recurrent + 2 coda layers)
- MoE: 5.3B parameters (32 experts, top-2 routing, 1 shared expert per layer)
- MLA (DeepSeek-V2 style, 10.7× KV-cache compression)
- Engram memory at layers 1 and 4
- LTI + ACT + LoRA
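The prelude/recurrent/coda split can be illustrated with a minimal sketch, assuming each layer is a plain callable on a hidden state. `recurrent_depth_forward`, `effective_depth`, `add_one`, and `n_loops=4` are hypothetical and chosen for illustration; the model's input injection, LTI, ACT halting, and LoRA adapters are all omitted here.

```python
from typing import Callable, Sequence, TypeVar

H = TypeVar("H")  # hidden-state type (a tensor in the real model)


def recurrent_depth_forward(
    h: H,
    prelude: Sequence[Callable[[H], H]],
    recurrent: Sequence[Callable[[H], H]],
    coda: Sequence[Callable[[H], H]],
    n_loops: int,
) -> H:
    """Run the prelude once, the shared recurrent block n_loops times, then the coda once."""
    for layer in prelude:
        h = layer(h)
    for _ in range(n_loops):
        # The same recurrent layers (same weights) are reused on every loop.
        for layer in recurrent:
            h = layer(h)
    for layer in coda:
        h = layer(h)
    return h


def effective_depth(n_prelude: int, n_recurrent: int, n_coda: int, n_loops: int) -> int:
    """Layer applications per forward pass; stored layers stay n_prelude + n_recurrent + n_coda."""
    return n_prelude + n_recurrent * n_loops + n_coda


def add_one(h: int) -> int:
    # Toy "layer" used only to count how many layer applications happen.
    return h + 1


print(recurrent_depth_forward(0, [add_one] * 2, [add_one] * 6, [add_one] * 2, n_loops=4))  # → 28
print(effective_depth(2, 6, 2, n_loops=4))  # → 28
```

In this sketch, the 2 + 6 + 2 dense layout with `n_loops=4` performs 28 layer applications while storing only 10 layers' worth of weights; that trade of compute depth for parameter count is the point of a recurrent-depth design.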

## Training

### Dense
```
MICRO_BATCH=42 SEQ_LEN=2048 TARGET_TOKENS=12400000000 python mythos-fineweb-dense.py
```
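For a sense of scale, the token budget in these commands converts into micro-batch counts as follows. This assumes each micro-batch is `MICRO_BATCH` sequences of `SEQ_LEN` tokens on a single device. Gradient accumulation, multi-GPU data parallelism, and exactly how the scripts consume `TARGET_TOKENS` are not stated in this README. `micro_steps_for_budget` and `epochs_over` are hypothetical helpers.

```python
def micro_steps_for_budget(micro_batch: int, seq_len: int, target_tokens: int) -> int:
    """Micro-batches needed to see target_tokens, assuming one device and no gradient accumulation."""
    tokens_per_micro_batch = micro_batch * seq_len
    # Integer ceiling division avoids float rounding on large token counts.
    return -(-target_tokens // tokens_per_micro_batch)


def epochs_over(target_tokens: int, dataset_tokens: int) -> float:
    """Passes over the data implied by the budget (assumes the loader cycles through the file)."""
    return target_tokens / dataset_tokens


TARGET_TOKENS = 12_400_000_000
DATASET_TOKENS = 7_700_000_000  # train_tokens.bin, as reported (rounded)

dense_steps = micro_steps_for_budget(42, 2048, TARGET_TOKENS)  # 86,016 tokens per micro-batch
moe_steps = micro_steps_for_budget(28, 2048, TARGET_TOKENS)  # 57,344 tokens per micro-batch
print(dense_steps, moe_steps, round(epochs_over(TARGET_TOKENS, DATASET_TOKENS), 2))  # → 144160 216239 1.61
```

Under these assumptions, the dense run takes about 144K micro-batches and the MoE run about 216K. A 12.4B-token budget is roughly 1.6 passes over the 7.7B-token training file.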

### MoE (from dense checkpoint)
```
MICRO_BATCH=28 SEQ_LEN=2048 TARGET_TOKENS=12400000000 TRITON_COMPILE=1 DENSE_CKPT=... python mythos-fineweb-moe.py
```
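The README does not say how the `DENSE_CKPT` weights become MoE weights. One common technique is sparse upcycling: every expert starts as a copy of the trained dense FFN, and only the router is freshly initialized. The sketch below is hypothetical and not taken from `mythos-fineweb-moe.py`. `upcycle_ffn`, the weight-dict layout, and the router init scale `init_std=0.02` are all assumptions.

```python
import copy
import random
from typing import Any, Dict


def upcycle_ffn(
    dense_ffn: Dict[str, Any],
    n_experts: int,
    hidden_size: int,
    n_shared: int = 1,
    init_std: float = 0.02,
    seed: int = 0,
) -> Dict[str, Any]:
    """Hypothetical sparse upcycling of one dense FFN into an MoE layer."""
    # Each routed expert and each shared expert starts as an independent copy of the dense FFN.
    experts = [copy.deepcopy(dense_ffn) for _ in range(n_experts)]
    shared = [copy.deepcopy(dense_ffn) for _ in range(n_shared)]
    # The router has no dense counterpart, so it gets a fresh small random init
    # (one row of logit weights per expert).
    rng = random.Random(seed)
    router = [[rng.gauss(0.0, init_std) for _ in range(hidden_size)] for _ in range(n_experts)]
    return {"experts": experts, "shared_experts": shared, "router": router}


# Usage with a toy "checkpoint" of nested lists standing in for weight tensors.
dense = {"w_in": [[0.1, 0.2]], "w_out": [[0.3], [0.4]]}
moe = upcycle_ffn(dense, n_experts=32, hidden_size=2)
print(len(moe["experts"]), len(moe["shared_experts"]), len(moe["router"]))  # → 32 1 32
```

Deep copies matter here. Each expert must own its weights so that experts can diverge during MoE training.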

## Dataset
Tokenized FineWeb-Edu `sample-10BT`, stored as raw little-endian uint32 token IDs:
- `train_tokens.bin`: 7.7B tokens, 29 GB
- `metadata.json`
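The file is a flat array of 4-byte little-endian integers, so a token's byte offset is simply `4 * index`. This means random training windows can be read without loading the whole file. The following standard-library sketch illustrates this. `num_tokens`, `read_tokens`, and `sample_window` are hypothetical helpers, and the shifted-by-one (inputs, targets) pairing is an assumption about how the training scripts use the file. The contents of `metadata.json` are not described here and are not used.

```python
import os
import random
import struct
from typing import List, Tuple

TOKEN_BYTES = 4  # each token is a uint32


def num_tokens(path: str) -> int:
    """Token count of a raw uint32 file (file size / 4 bytes)."""
    return os.path.getsize(path) // TOKEN_BYTES


def read_tokens(path: str, start: int, count: int) -> List[int]:
    """Read `count` little-endian uint32 tokens starting at token index `start`."""
    with open(path, "rb") as f:
        f.seek(start * TOKEN_BYTES)
        data = f.read(count * TOKEN_BYTES)
    if len(data) != count * TOKEN_BYTES:
        raise ValueError("read past end of token file")
    return list(struct.unpack(f"<{count}I", data))  # "<" = little-endian, "I" = uint32


def sample_window(path: str, seq_len: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Hypothetical LM sample: seq_len input tokens and the same window shifted by one as targets."""
    start = rng.randrange(num_tokens(path) - seq_len)  # leaves room for seq_len + 1 tokens
    window = read_tokens(path, start, seq_len + 1)
    return window[:-1], window[1:]


# Usage (file name from this section; seq_len matching SEQ_LEN=2048):
# inputs, targets = sample_window("train_tokens.bin", 2048, random.Random(0))
```

As a sanity check, 7.7B tokens × 4 bytes ≈ 30.8 × 10⁹ bytes ≈ 28.7 GiB. This matches the reported ~29 GB if that figure is in GiB.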

## Current Training (1B MoE)

- Config: 16 experts, top-1 routing, intermediate size 1024, 6 layers, `n_loops=1`
- Parameters: 997M (18% Engram / 82% MoE)
- VRAM: 43 GB
- Throughput: 40K tokens/s
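Top-k routing (top-2 over 32 experts in the architecture above, top-1 over 16 in this run) can be illustrated for a single token with a minimal pure-Python sketch. It assumes softmax gating renormalized over the selected experts and ungated shared experts. `moe_forward`, passing precomputed `router_logits` (in a real layer these come from a learned projection of `x`), and the renormalization choice are assumptions. Load-balancing losses and expert capacity limits are omitted.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: List[float],
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    top_k: int,
    shared_experts: Sequence[Expert] = (),
) -> List[float]:
    """Route one token to its top_k experts, weight by renormalized gate probs, add shared experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        gate = probs[i] / norm  # gates of the selected experts sum to 1
        out = [o + gate * v for o, v in zip(out, experts[i](x))]
    for shared in shared_experts:  # shared experts see every token, ungated
        out = [o + v for o, v in zip(out, shared(x))]
    return out


experts = [lambda v, k=k: [k * t for t in v] for k in range(16)]  # toy expert k scales by k
print(moe_forward([1.0, 2.0], [0.0] * 15 + [3.0], experts, top_k=1))  # → [15.0, 30.0]
```

With `top_k=1` the renormalized gate is always 1.0, so in this sketch each token's output is exactly one expert's output (plus any shared experts). Only that expert's FFN is computed per token.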

### Run