tool call leaks
#27 opened 1 day ago by kristianpaul
--reasoning-config breaks Nemotron v3 reasoning parser (content always null, thinking unbounded)
1
#23 opened 16 days ago by rhxsec
"This will lead to incorrect tokenization" warning
2
#22 opened 17 days ago by DanTup
Jetson Thor official container for vLLM 0.16 fails to load nemotron-3-super -- says mixed-precision quant config is unsupported
1
#20 opened 23 days ago by mrjbj
FP4 quantization for inference optimization
#19 opened 26 days ago by O96a
Spark not using NVFP4?
1
#18 opened 28 days ago by D-Lynch
vLLM + MTP + NVFP4 doesn't work
2
#16 opened about 1 month ago by catplusplus
Searching for a new Tool Parser
3
#15 opened about 1 month ago by LucasMM14
Run on DGX Spark
16
#14 opened about 1 month ago by LimeemiL
All this talk about NVFP4 - why is it dog slow?
14
#13 opened about 1 month ago by josephbreda
NVFP4 cannot be loaded in SGLang
4
#12 opened about 1 month ago by mratsim
vLLM MTP unusable on RTX 6000 Pro, as spec decoding consumes 20GB+ VRAM at start-up, causing OOM
5
#9 opened about 1 month ago by lsmc
Doesn't work with the latest vLLM, even after recompiling vLLM and transformers from git
4
#8 opened about 1 month ago by catplusplus
RTX Pro 6000 support
3
#7 opened about 1 month ago by justinjja
CUDA Version -- Min requirement?
1
#6 opened about 1 month ago by raymondlo84-nvidia