kshitijthakkar/deepseek-v4-mini-300M-init
Text Generation • 0.3B • Updated • 18
Small-scale faithful replicas of the DeepSeek-V4 architecture for ablation and weight-transfer research.
Note Randomly initialized 317M / 170M-active scaffold; reference architecture for V4 weight-transfer experiments.
Note Random-init 1.02B / 0.46B-active DeepSeek-V4 architecture scaffold.
Note Random-init 3.17B / 1.10B-active DeepSeek-V4 architecture scaffold.
Note Random-init 8.06B / 2.21B-active DeepSeek-V4 architecture scaffold.
Note Sliced from deepseek-ai/DeepSeek-V4-Flash. frozen_keys.json lists the tensors that were copied verbatim.
Note Partially sliced from deepseek-ai/DeepSeek-V4-Flash; only 16 of the 27 chosen-layer shards were applied. Re-run the slicing on a Linux GPU machine for a complete slice.
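
The notes on the four random-init scaffolds quote a total and an active parameter count for each. As a rough illustration of how the active share changes with scale, the sketch below computes that ratio from the quoted figures. The dictionary labels and the helper name active_fraction are illustrative only and do not come from any of the repositories.

```python
# (total, active) parameter counts in billions, copied from the notes above.
# The keys are illustrative labels, not repository names.
SCAFFOLDS = {
    "317M": (0.317, 0.170),
    "1.02B": (1.02, 0.46),
    "3.17B": (3.17, 1.10),
    "8.06B": (8.06, 2.21),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active per token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for label, (total, active) in SCAFFOLDS.items():
        print(f"{label}: {active_fraction(total, active):.1%} active")
```

With the quoted counts, the active share falls as the scaffolds grow, from a little over half at 317M to a little over a quarter at 8.06B.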