NVFP4 quantized version of Qwen/Qwen3.6-35B-A3B. Produced and maintained by vrfai.
Following the Qwen3.6 series, this model prioritizes stability and real-world utility, offering upgrades in Agentic Coding and Thinking Preservation.
This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.6-35B-A3B |
| Quant method | NVIDIA ModelOpt (NVFP4) |
| Weights | 4-bit float (group_size: 16) |
| Input activation | 4-bit float (group_size: 16) |
| Excluded layers | lm_head, conv1d, shared_expert_gate |
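To make the scheme above concrete, here is a minimal NumPy sketch of block-wise 4-bit float (E2M1) fake-quantization with block size 16. This is illustrative only: real NVFP4 as applied by ModelOpt stores per-block FP8 (E4M3) scales plus a per-tensor scale, while this sketch uses a plain floating-point scale per block.

```python
# Illustrative block-wise FP4 (E2M1) fake-quantization, block size 16.
# NOT the actual ModelOpt implementation; per-block scales are plain floats here.
import numpy as np

# The 8 non-negative magnitudes representable in E2M1 (4-bit float)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Fake-quantize a 1-D tensor: round each block of 16 values to scaled E2M1."""
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        amax = np.abs(block).max()
        scale = amax / 6.0 if amax > 0 else 1.0  # map block max onto E2M1 max (6.0)
        # Round each magnitude to the nearest grid point, then restore the sign
        idx = np.abs(np.abs(block[:, None]) / scale - E2M1_GRID).argmin(axis=1)
        out[start:start + block_size] = np.sign(block) * E2M1_GRID[idx] * scale
    return out
```

Each block of 16 values shares one scale, so quantization error depends on the local dynamic range rather than the whole tensor's, which is why small block sizes like 16 preserve accuracy better than per-tensor 4-bit schemes.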
You can deploy this model efficiently with SGLang using the `modelopt_fp4` quantization backend. With SGLang installed, launch the server with the following command:
```shell
sglang serve \
  --model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \
  --reasoning-parser qwen3 \
  --tensor-parallel-size 1 \
  --tool-call-parser qwen3_coder \
  --trust-remote-code \
  --quantization modelopt_fp4
```
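Once the server is up, it exposes an OpenAI-compatible API. A minimal stdlib-only client sketch, assuming SGLang's default port 30000 (the prompt and sampling parameters are placeholders):

```python
# Minimal sketch of querying the launched server via its OpenAI-compatible
# /v1/chat/completions endpoint. Assumes the default SGLang port 30000.
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "vrfai/Qwen3.6-35B-A3B-NVFP4") -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,   # illustrative sampling settings
        "max_tokens": 512,
    }

if __name__ == "__main__":
    body = json.dumps(build_chat_request("Write a binary search in Python.")).encode()
    req = urllib.request.Request(
        "http://localhost:30000/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (for example the `openai` Python SDK pointed at `http://localhost:30000/v1`) works the same way.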