This model is a gem for agentic workflows.
I recently switched from the initial unsloth quants to yours, as I wanted to squeeze more out of my system and move up from my initial IQ4_XS to Q4_K_M.
I just wanted to make a little post for people who might want to check out this model; it has performed great in my personal use. It's able to correctly use skills via agentic scripts like nanobot, and even in opencode, for local, lightweight code creation and analysis/debugging, it correctly solved issues in Python and even Xcode.
This model's responses are more direct and to the point... I see that as a plus in most cases, unless you're looking for a more human-feeling agentic companion. That's in contrast to the Qwen3.5 series, which seemed to reason too much and, in the final answer, wrap the requested response in too much "slop".
My setup allows for full GPU offload: 96 GB VRAM across 1x 4090 and 3x 3090. The model generates at around 50 t/s, which is fine. I also appreciate that this model doesn't peg my GPUs at 100% power while generating, reducing the noise and heat output of the rig compared to something like vLLM or another more aggressive backend.
Overall I'm pleased with this model, and will use it for the time being until the next new thing comes along.
Sidenote: to get tool calling within nanobot to play nicely with thinking, I had to add the two special end tokens as stop sequences in my llama.cpp launch script... I'll paste it below in case it helps anyone else.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
/home/phone/llama.cpp/build/bin/llama-server \
--model /home/phone/Downloads/LocalModels/AesSedai/Nemotron_3_Super/NVIDIA-Nemotron-3-Super-120B-A12B-BF16-Q4_K_M-00001-of-00003.gguf \
--alias AesSedai/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF \
--ctx-size 256000 \
-ngl 99 \
--host 0.0.0.0 \
--port 8081 \
--jinja \
-r "<|im_end|>" \
-r "</s>" \
--special \
--temp 0.6 \
--top-p 0.95 \
--min-p 0.00
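If anyone wants a quick sanity check that the server (and those stop sequences) are behaving, llama-server exposes an OpenAI-compatible endpoint you can hit with curl. This is just a minimal smoke-test sketch, assuming the host/port from my launch script above; adjust to your setup.

```shell
# Minimal smoke test against the llama-server instance launched above.
# Assumes the server is reachable on localhost:8081 (the --port from the script).
curl -s http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    "temperature": 0.6,
    "top_p": 0.95
  }'
```

If the stop sequences are being honored, the returned message content should end cleanly rather than trailing off with raw `<|im_end|>` or `</s>` tokens.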
Thank you for the feedback! I just tweaked some of the quants late yesterday / today, now that there's a fix for IQ4_NL in @bartowski's llama.cpp PR. Glad it's working well for you!