
Searching for a new Tool Parser

#15
by LucasMM14

Does anyone know if there's another Tool Parser for this model?

It's recommended to use `--tool-call-parser qwen3_coder`, but I'm unable to use it at the moment.

Why are you unable to use it?

This is the command that I use, and it works perfectly:

vllm serve /media/data/models/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
  --async-scheduling \
  --served-model-name jarvis-thinker \
  --dtype auto \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 2 \
  --swap-space 0 \
  --trust-remote-code \
  --gpu-memory-utilization 0.75 \
  --max-model-len 262144 \
  --enable-chunked-prefill \
  --max-num-seqs 512 \
  --host 0.0.0.0 \
  --port 10002 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser nemotron_v3 \
  --enforce-eager \
  --max-cudagraph-capture-size 128 \
  --mamba-ssm-cache-dtype float16
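Once the server is up, tool calling can be exercised through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch, assuming the server above is listening on port 10002 with served model name `jarvis-thinker`; the `get_weather` tool is a hypothetical example, not part of the model:

```shell
# Build an OpenAI-style chat request that advertises one (hypothetical) tool.
cat > payload.json <<'EOF'
{
  "model": "jarvis-thinker",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
EOF

# Sanity-check that the payload is valid JSON before sending it.
python3 -m json.tool payload.json > /dev/null && echo "payload OK"

# Send it to the running vLLM server (requires the serve command above):
# curl -s http://localhost:10002/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @payload.json
```

If the parser is working, the response's `choices[0].message.tool_calls` should contain a structured `get_weather` call instead of the call being emitted as raw text in `content`.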

Hi mtcl,

There's a new policy prohibiting the use of sovereign models (and complements) in generative AI (GenAI).

I don't think I understand that comment ...
