This is an MXFP4 quantization of the model Nemotron-Cascade-14B-Thinking.
It uses RoPE scaling with the YaRN method to extend the original 32k context size to 128k.
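As a rough illustration of the context extension, the YaRN scaling factor is just the ratio of the target context to the original one. The sketch below assumes that "32k" and "128k" mean 32768 and 131072 tokens. The dictionary layout mirrors the `rope_scaling` entry commonly seen in Hugging Face `config.json` files and is shown only as an illustration, not as the exact metadata stored in this GGUF.

```python
# Assumption: "32k" and "128k" are read as powers of two.
ORIGINAL_CTX = 32 * 1024   # 32768 tokens
EXTENDED_CTX = 128 * 1024  # 131072 tokens

# YaRN scales RoPE by the ratio of extended to original context length.
yarn_factor = EXTENDED_CTX / ORIGINAL_CTX

# Illustrative config fragment (hypothetical for this model).
rope_scaling = {
    "type": "yarn",
    "factor": yarn_factor,
    "original_max_position_embeddings": ORIGINAL_CTX,
}

print(rope_scaling)
```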
Yes, this is a dense model, and yes, I use MXFP4 despite this.
It's an experiment to see whether MXFP4 quantization is usable on this kind of model.
Furthermore, the model's tensors are quantized at different precisions, depending on their importance according to the importance matrix from bartowski.
From most to least important tensors: BF16 > Q8_0 > MXFP4
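For readers unfamiliar with the format, here is a minimal sketch of how one MXFP4 block can be quantized. It assumes the OCP Microscaling layout: blocks of 32 values that share one power-of-two scale, with each element stored as a 4-bit E2M1 float. The function names are hypothetical, and the rounding is simplified, so this is not the exact routine used to produce these files.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32
E2M1_EMAX = 2  # exponent of the largest E2M1 value: 6 = 1.5 * 2**2


def quantize_block(block):
    """Return (shared_exp, elements) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen from the largest magnitude in the block.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for x in block:
        # Saturate at the largest representable magnitude, then snap to the
        # nearest E2M1 value (ties go to the smaller magnitude in this sketch).
        mag = min(abs(x) / scale, E2M1_VALUES[-1])
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        elements.append(math.copysign(nearest, x))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from the shared exponent and elements."""
    return [q * 2.0 ** shared_exp for q in elements]


block = [0.1 * i for i in range(-16, 16)]  # 32 sample weights
shared_exp, elements = quantize_block(block)
restored = dequantize_block(shared_exp, elements)
```

With only eight magnitudes per element, the reconstruction error can be large for individual weights, which is why the more important tensors are kept at higher precision.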
Quantization distribution
BF16 : 6 tensors ( 20.0% of size)
Q8_0 : 39 tensors ( 21.1% of size)
MXFP4 : 237 tensors ( 58.9% of size)
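The size shares above can be turned into an approximate average bits-per-weight figure. This sketch assumes that "% of size" means share of tensor bytes, and it assumes the usual ggml block layouts: 16 bits per weight for BF16, 34 bytes per 32 weights for Q8_0, and 17 bytes per 32 weights for MXFP4. Treat the result as an estimate, not a measured value.

```python
# Assumed storage cost in bits per weight for each tensor type.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 34 * 8 / 32,   # 8.5 (32 int8 values + one fp16 scale)
    "MXFP4": 17 * 8 / 32,  # 4.25 (32 4-bit values + one 8-bit scale)
}

# Share of total file size per type, taken from the distribution above.
SIZE_SHARE = {"BF16": 0.200, "Q8_0": 0.211, "MXFP4": 0.589}

# The share of weights in each type is proportional to size share divided by
# bits per weight, so the average is a harmonic-style mean over size shares.
weights_per_bit = sum(SIZE_SHARE[t] / BITS_PER_WEIGHT[t] for t in SIZE_SHARE)
average_bpw = sum(SIZE_SHARE.values()) / weights_per_bit

print(f"Estimated average bits per weight: {average_bpw:.2f}")
```

Under these assumptions the mix works out to about 5.7 bits per weight, noticeably above pure MXFP4 because of the BF16 and Q8_0 tensors.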
Model tree for noctrex/Nemotron-Cascade-14B-Thinking-MXFP4-GGUF
Base model: nvidia/Nemotron-Cascade-14B-Thinking
