# Qwen3.5-35B-A3B-Uncensored-Aggressive-NVFP4
NVFP4 quantized version of Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-safetensors, which is the "Aggressive" uncensoring variant of Qwen/Qwen3.5-35B-A3B. Includes full vision encoder weights (BF16, unquantized).
## Quantization Details
| Detail | Value |
|---|---|
| Method | NVFP4 (compressed-tensors) |
| Model size | ~23 GB (vs 67 GB BF16) |
| Language model | NVFP4 quantized |
| Visual encoder | BF16 (unquantized, 333 tensors, 0.89 GB) |
| Excluded from quantization | lm_head, MoE gates, shared expert gates, linear attention layers, visual encoder |
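For intuition on the method: NVFP4 stores each weight as a 4-bit FP4 (E2M1) value and shares one scale across each small block of 16 elements (in the real format the scale is itself stored in FP8 E4M3, and two 4-bit codes are packed per byte). A minimal pure-Python sketch of the block-scaling idea, ignoring the scale encoding and bit packing:

```python
# Sketch of NVFP4-style block quantization. Assumptions: the FP4 E2M1 value
# grid below and per-16-element float scales; the real format additionally
# encodes scales in FP8 E4M3 and packs two 4-bit codes per byte.

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted(FP4_GRID + [-v for v in FP4_GRID if v != 0.0])

def quantize_block(block):
    """Quantize one 16-element block: choose the scale so the block's amax
    maps to 6.0 (the largest FP4 magnitude), then snap each scaled value to
    the nearest grid point."""
    assert len(block) == 16
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * 16, 1.0
    scale = amax / 6.0
    codes = [min(FP4_GRID, key=lambda g: abs(x / scale - g)) for x in block]
    return codes, scale

def dequantize_block(codes, scale):
    """Reconstruct approximate values from 4-bit codes and the block scale."""
    return [c * scale for c in codes]

block = [0.1 * i - 0.8 for i in range(16)]
codes, scale = quantize_block(block)
recon = dequantize_block(codes, scale)
```

Because the grid's largest gap (between 4 and 6) is 2 units, the per-element reconstruction error is bounded by one scale unit; the BF16 exclusions in the table above exist precisely because some layers (router gates, the vision encoder) tolerate this error poorly.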
## Calibration
- Samples: 512 (256 from each dataset)
- Datasets:
  - HuggingFaceH4/ultrachat_200k (`train_sft` split) — length-filtered (longest conversations selected)
  - nvidia/Nemotron-Post-Training-Dataset-v2 (`chat` split)
- Max sequence length: 1024
- All experts calibrated: `moe_calibrate_all_experts=True`
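A recipe along these lines could be reproduced with llmcompressor, which emits the compressed-tensors format. The sketch below is illustrative, not the exact script used for this model: the `ignore` regex patterns and the dataset wiring are assumptions matching the exclusion list and calibration notes above.

```python
# Illustrative NVFP4 oneshot recipe with llmcompressor (compressed-tensors).
# NOT the exact script used for this model; the ignore patterns are
# assumptions reflecting the exclusion list in the quantization table.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=[
        "lm_head",                    # output head kept high precision
        "re:.*mlp\\.gate$",           # MoE router gates (assumed pattern)
        "re:.*shared_expert_gate$",   # shared expert gates (assumed pattern)
        "re:visual.*",                # vision encoder stays BF16 (assumed pattern)
    ],
)

oneshot(
    model="Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-safetensors",
    dataset=...,  # 512 samples: 256 ultrachat_200k + 256 Nemotron-v2, per the notes above
    recipe=recipe,
    max_seq_length=1024,
    num_calibration_samples=512,
    # moe_calibrate_all_experts=True, per the calibration notes above
)
```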
## Usage with vLLM
A nightly build of vLLM is required; stable releases may not yet support this quantized model.

```shell
vllm serve Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-NVFP4 \
  --kv-cache-dtype fp8 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```
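Once the server is up, it can be queried through vLLM's OpenAI-compatible endpoint. A minimal example, assuming the default port 8000 and the prompt shown here:

```shell
# Query the running vLLM server (assumes default host/port; the "model"
# field must match the served model ID).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-NVFP4",
    "messages": [{"role": "user", "content": "Briefly introduce yourself."}]
  }'
```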
## Specs
Same architecture and capabilities as Qwen/Qwen3.5-35B-A3B, including vision (image/video understanding).
## Credits
## Model tree for Li101/Qwen3.5-35B-A3B-Uncensored-Aggressive-NVFP4
- Base model: Qwen/Qwen3.5-35B-A3B-Base
- Finetuned: Qwen/Qwen3.5-35B-A3B