These are GGUF quantizations of the model Nemotron-Cascade-2-30B-A3B.

  • Download the latest llama.cpp to use these quantizations.

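If you would rather fetch a file programmatically, here is a minimal sketch using huggingface_hub. The exact .gguf filename is an assumption; check the repository's file list for the variant you want (MXFP4_MOE, BF16, or F16).

```python
from huggingface_hub import hf_hub_download

# Repo id is taken from this model card; the filename below is an
# illustrative assumption -- pick the actual file from the "Files" tab.
path = hf_hub_download(
    repo_id="noctrex/Nemotron-Cascade-2-30B-A3B-MXFP4_MOE-GGUF",
    filename="Nemotron-Cascade-2-30B-A3B-MXFP4_MOE.gguf",
)
print(path)  # local cache path of the downloaded GGUF
```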
The mainline llama.cpp standard is to use MXFP4 for the MoE tensors and Q8_0 for the rest.
So I created two new variants, in which the non-MoE tensors are stored as BF16 or F16 instead of Q8_0.
The order of preference is BF16, then F16.
On some architectures BF16 will be slower, but it is the highest quality: essentially, the original tensors from the model are copied over unquantized.

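Once you have a file, a quick way to sanity-check it is through the llama-cpp-python bindings over llama.cpp. This is a minimal sketch; the model path and settings below are illustrative assumptions, not part of this repo.

```python
from llama_cpp import Llama

# Load the quantized GGUF. Path and parameters are assumptions;
# adjust n_ctx / n_gpu_layers for your hardware.
llm = Llama(
    model_path="Nemotron-Cascade-2-30B-A3B-MXFP4_MOE.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm("Explain mixture-of-experts models in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```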
Model details:

  • Model size: 32B params
  • Architecture: nemotron_h_moe
  • Quantization: 4-bit (MXFP4)
