based-distill56l-dclm10b-s4096-step9-mamba_hy-2.9b-A0-hd64-ng6-msdim320-me1-bs1024-gpus8-sl4096

This is a model uploaded from /mnt/nanjingcephfs/project_wx-rec-alg-bdc-exp/bwzheng/yulan/hyw/pretrain-linear-moe-dev/megatron_lm_workspace/checkpoint/based-distill56l-dclm10b-s4096-step9-mamba_hybrid-2.9b-112layers-q30-kv6-hybrid0.0625-pattern_A0-mheaddim64-mnumgroups6-mstatedim320-mexpand1-freeze_false-ep1-mp2-pp1-cp1-lr2e-5-minlr7e-7-bs1024-gpus8-seqlen4096-loadyulan_attn_mamba.
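The checkpoint directory name in the path above encodes most of the training configuration as `key<number>` tokens separated by hyphens. The snippet below is a minimal sketch of how those tokens could be read back into a dictionary. The helper `read_field` and the `KEYS` list are hypothetical and written only for this illustration; they are not part of any released tooling. What each token means (for example, that `bs` is a batch size or `seqlen` a sequence length) is a guess from common naming conventions, not something this card states.

```python
import re

# Checkpoint directory name, copied verbatim from the upload path above.
CHECKPOINT_NAME = (
    "based-distill56l-dclm10b-s4096-step9-mamba_hybrid-2.9b-112layers-q30-kv6-"
    "hybrid0.0625-pattern_A0-mheaddim64-mnumgroups6-mstatedim320-mexpand1-"
    "freeze_false-ep1-mp2-pp1-cp1-lr2e-5-minlr7e-7-bs1024-gpus8-seqlen4096-"
    "loadyulan_attn_mamba"
)


def read_field(name, key):
    """Hypothetical helper: return the number that follows `key` in `name`, or None."""
    # The token must start at a hyphen boundary and end at one, so "lr" does not
    # match inside "minlr", and "lr2e-5" is read as one value despite its hyphen.
    pattern = r"(?:^|-)" + re.escape(key) + r"(\d+(?:\.\d+)?(?:e-?\d+)?)(?=-|$)"
    match = re.search(pattern, name)
    if match is None:
        return None
    text = match.group(1)
    return int(text) if text.isdigit() else float(text)


# Token prefixes taken from the name itself; their interpretation is assumed.
KEYS = [
    "q", "kv", "hybrid", "mheaddim", "mnumgroups", "mstatedim", "mexpand",
    "ep", "mp", "pp", "cp", "lr", "minlr", "bs", "gpus", "seqlen",
]

fields = {key: read_field(CHECKPOINT_NAME, key) for key in KEYS}

for key, value in fields.items():
    print(f"{key:>10} = {value}")
```

Tokens that do not follow the `key<number>` shape, such as `112layers`, `pattern_A0`, and `freeze_false`, are deliberately left out of this sketch.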
