nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

Text Generation

8-bit precision

tool call leaks

#27 opened 1 day ago by

--reasoning-config breaks Nemotron v3 reasoning parser (content always null, thinking unbounded)

#23 opened 16 days ago by

"This will lead to incorrect tokenization" warning

#22 opened 17 days ago by

Jetson Thor Official Container for vLLM 0.16 fails to load nemotron-3-super -- says mixed-precision quant config is unsupported in vLLM 0.16 container

#20 opened 23 days ago by

FP4 quantization for inference optimization

#19 opened 26 days ago by

Spark not using NVFP4?

#18 opened 28 days ago by

VLLM + MTP + NVFP4 doesn't work

#16 opened about 1 month ago by

Searching for a new Tool Parser

#15 opened about 1 month ago by

Run on DGX Spark

#14 opened about 1 month ago by

All this talk about NVFP4 - why is it dog slow?

#13 opened about 1 month ago by

NVFP4 cannot be loaded in SGLang

#12 opened about 1 month ago by

vLLM MTP unusable on RTX 6000 Pro, as spec decoding consumes 20GB+ VRAM at start-up, causing OOM

#9 opened about 1 month ago by

Doesn't work with latest vllm, even tried to recompile vLLM and transformers from git

#8 opened about 1 month ago by

RTX Pro 6000 support

#7 opened about 1 month ago by

CUDA Version -- Min requirement?

#6 opened about 1 month ago by raymondlo84-nvidia