Nemotron-3-Nano-30B-A3B

GGUF quant of https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with full precision applied to the attention and SSM-related layers for better performance.
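If you want to reproduce a similar mixed-precision quant, here is a minimal sketch using llama.cpp's llama-quantize. It assumes a recent llama.cpp build with per-tensor `--tensor-type` overrides; the file names, base quant type, and tensor-name patterns below are illustrative assumptions, not the exact recipe used for this quant.

```python
# Minimal sketch: mixed-precision GGUF quant with llama.cpp's llama-quantize,
# keeping attention and Mamba/SSM tensors at full precision and quantizing the
# rest. Assumes a build that supports --tensor-type overrides; check
# `llama-quantize --help` on your build, as the exact syntax may differ.
import subprocess

SRC = "Nemotron-3-Nano-30B-A3B-BF16.gguf"  # full-precision source (illustrative path)
DST = "Nemotron-3-Nano-30B-A3B-XXL.gguf"   # mixed-precision output (illustrative path)
BASE_TYPE = "q4_k_m"                       # base quant type for everything else (assumption)

# Tensor-name patterns are assumptions based on common GGUF naming for hybrid
# attention/Mamba models (attn_*, ssm_*); verify them against your GGUF dump.
overrides: list[str] = []
for pattern in ("attn_q", "attn_k", "attn_v", "attn_output",
                "ssm_in", "ssm_out", "ssm_conv1d", "ssm_dt",
                "ssm_a", "ssm_d"):
    overrides += ["--tensor-type", f"{pattern}=f32"]

# llama-quantize usage: llama-quantize [options] input.gguf output.gguf type
subprocess.run(["llama-quantize", *overrides, SRC, DST, BASE_TYPE], check=True)
```

The size cost of pinning these tensors should be modest for this architecture, since the MoE expert FFN weights dominate the parameter count.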

I spent soooo much time quantizing this model. I was unsure whether it was really that broken or I was doing something wrong. It looks like all Mamba-related layers should be kept in full precision. The model also has problems with hallucinations and tool calling, which is likely the result of not good enough datasets used for training and overtraining on select topics. This is the only tool-calling-capable model that has problems choosing the opened (viewed) file in the RooCode test. It can also inject facts from a previous conversation into its last reply if the topic is similar, so there are some context-understanding issues as well. Regarding hallucinations, give it a text about Donald Trump. Is he a previous president? Is Karol Nawrocki Andrzej Duda? To be fair, this is the first good model from Nvidia, but it still misses here and there.

Maybe I could make this quant even better, but I already spent sooooo much time on it and wanted to keep it below 30 GB. I give up; this one is quite OK and better than the other quants I tested.

Format: GGUF
Model size: 32B params
Architecture: nemotron_h_moe
