Nemotron-3-Nano-30B-A3B

GGUF quant of https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with full precision applied to the attention and SSM-related layers for better performance.
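If you want to reproduce a similar mixed-precision quant, here is a minimal sketch using llama.cpp's llama-quantize. It assumes a recent llama.cpp build with per-tensor `--tensor-type` overrides; the file names, base quant type, and tensor-name patterns below are illustrative assumptions, not the exact recipe used for this quant.

```python
# Minimal sketch: mixed-precision GGUF quant with llama.cpp's llama-quantize,
# keeping attention and Mamba/SSM tensors at full precision and quantizing the
# rest. Assumes a build that supports --tensor-type overrides; check
# `llama-quantize --help` on your build, as the exact syntax may differ.
import subprocess

SRC = "Nemotron-3-Nano-30B-A3B-BF16.gguf"  # full-precision source (illustrative path)
DST = "Nemotron-3-Nano-30B-A3B-XXL.gguf"   # mixed-precision output (illustrative path)
BASE_TYPE = "q4_k_m"                       # base quant type for everything else (assumption)

# Tensor-name patterns are assumptions based on common GGUF naming for hybrid
# attention/Mamba models (attn_*, ssm_*); verify them against your GGUF dump.
overrides: list[str] = []
for pattern in ("attn_q", "attn_k", "attn_v", "attn_output",
                "ssm_in", "ssm_out", "ssm_conv1d", "ssm_dt",
                "ssm_a", "ssm_d"):
    overrides += ["--tensor-type", f"{pattern}=f32"]

# llama-quantize usage: llama-quantize [options] input.gguf output.gguf type
subprocess.run(["llama-quantize", *overrides, SRC, DST, BASE_TYPE], check=True)
```

The size cost of pinning these tensors should be modest for this architecture, since the MoE expert FFN weights dominate the parameter count.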

I spent soooo much time quantizing this model. I was unsure whether it was really that broken or I was doing something wrong. It looks like all Mamba-related layers should be kept in full precision. The model also has problems with hallucinations and tool calling, which is likely the result of not good enough datasets used for training and overtraining on select topics. This is the only tool-calling-capable model that has problems choosing the opened (viewed) file in the RooCode test. It can also inject facts from a previous conversation into its last reply if the topic is similar, so there are some context-understanding issues as well. Regarding hallucinations, give it a text about Donald Trump. Is he a previous president? Is Karol Nawrocki Andrzej Duda? To be fair, this is the first good model from Nvidia, but it still misses here and there.

Maybe I could make this quant even better, but I already spent sooooo much time on it and wanted to keep it below 30 GB. I give up; this one is quite OK and better than the other quants I tested.

Format: GGUF
Model size: 32B params
Architecture: nemotron_h_moe
