caiovicentino1/Nemotron-Cascade-2-30B-A3B-HLWQ-Q5
Text Generation • 20B
30B MoE · 7.6 GB VRAM · 15 tok/s on RTX 4090 · expert offloading + HLWQ Q5
Note: PolarQuant Q5 + expert offloading (7.6 GB VRAM, 15 tok/s)
Note: Base model (30B MoE, hybrid Mamba + MoE + Attention)