# Qwen3.6-35B-A3B-NVFP4

An NVFP4-quantized version of Qwen/Qwen3.6-35B-A3B, produced and maintained by vrfai.

Following the Qwen3.6 series, this model prioritizes stability and real-world utility, offering upgrades in Agentic Coding and Thinking Preservation.

## Quantization Details

This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.

| Property | Value |
| --- | --- |
| Base model | `Qwen/Qwen3.6-35B-A3B` |
| Quant method | NVIDIA ModelOpt (NVFP4) |
| Weights | 4-bit float (`group_size: 16`) |
| Input activations | 4-bit float (`group_size: 16`) |
| Excluded layers | `lm_head`, `conv1d`, `shared_expert_gate` |
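To give an intuition for what block-wise 4-bit float quantization does, here is a minimal pure-Python sketch. It is an illustration of the general NVFP4 scheme (per-block scales over groups of 16 values, values snapped to the FP4 E2M1 grid), not the actual ModelOpt implementation; the function names and the choice of a simple absmax scale are assumptions for clarity.

```python
# Illustrative sketch of NVFP4-style block quantization.
# NOT the ModelOpt implementation: helper names and the plain absmax
# scaling rule are simplifying assumptions.

# Magnitudes representable in FP4 E2M1 (plus a sign bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (up to 16 floats): one scale + FP4 codes."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the E2M1 max (6.0).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        codes.append(-q if x < 0 else q)
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate values from the scale and FP4 codes."""
    return [scale * c for c in codes]

# One block of 16 weights, as in the group_size: 16 config above.
weights = [0.03, -0.11, 0.25, 0.5, -0.02, 0.07, 0.18, -0.33,
           0.01, 0.09, -0.27, 0.41, 0.12, -0.05, 0.3, -0.46]
scale, codes = quantize_block(weights)
approx = dequantize_block(scale, codes)
```

In the real format the codes are packed two-per-byte (hence the U8 tensors in the checkpoint) and each block scale is itself stored in FP8 E4M3, which is where the F8_E4M3 tensors come from.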

## Quickstart

You can deploy this model efficiently using SGLang with the modelopt_fp4 quantization backend.

### Serving with SGLang

Ensure SGLang is installed, then launch the server with:

```shell
sglang serve \
  --model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \
  --reasoning-parser qwen3 \
  --tensor-parallel-size 1 \
  --tool-call-parser qwen3_coder \
  --trust-remote-code \
  --quantization modelopt_fp4
```
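Once the server is up, it can be queried through SGLang's OpenAI-compatible HTTP API. The sketch below builds such a request with only the standard library; the host, port (30000 is SGLang's usual default), and prompt are illustrative assumptions, and the request is constructed but not sent, since that requires a running server.

```python
# Sketch of an OpenAI-compatible chat request for the SGLang server.
# Assumptions: default localhost:30000 endpoint; illustrative prompt.
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:30000"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("vrfai/Qwen3.6-35B-A3B-NVFP4",
                         "Write a hello-world program in Rust.")
# To actually send it (requires the server from the command above):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```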
The checkpoint is distributed in safetensors format (34B params) with BF16, F8_E4M3, and U8 tensors.