Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4
This repository contains a ModelOpt-quantized checkpoint derived from
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.
Quantization Summary
- Quantization tool: NVIDIA TensorRT Model Optimizer
- Weight quantization: NVFP4
- KV cache quantization: FP8
- Calibration size: 1024 samples
- Calibration sequence length: 4096
- Calibration batch size: 1
- Calibration source mix:
  - nohurry/Opus-4.6-Reasoning-3000x-filtered: 596 samples
  - Jackrong/Qwen3.5-reasoning-700x: 178 samples
  - TeichAI/claude-4.5-opus-high-reasoning-250x: 250 samples
The calibration set was converted into a single JSONL file with the source model's chat template applied before PTQ, so that the calibration activation distribution better matches the reasoning format this distilled checkpoint was trained on.
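A minimal sketch of that preparation step. The placeholder `apply_chat_template` function below stands in for the source model tokenizer's real `tokenizer.apply_chat_template(..., tokenize=False)`; the template tokens and the `{"text": ...}` JSONL field name are assumptions, not taken from this repository.

```python
import json

def apply_chat_template(messages):
    # Placeholder for the source model's chat template (assumption);
    # in practice use tokenizer.apply_chat_template(messages, tokenize=False).
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(parts) + "\n<|assistant|>\n"

# One record per calibration sample, drawn from the three source datasets.
records = [
    {"messages": [{"role": "user", "content": "Example reasoning prompt"}]},
]

lines = []
for rec in records:
    text = apply_chat_template(rec["messages"])
    lines.append(json.dumps({"text": text}))

# Single JSONL file consumed by the PTQ calibration loop.
calib_jsonl = "\n".join(lines)
print(calib_jsonl)
```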
Runtime Notes
- Intended runtime target: SGLang with ModelOpt-compatible HF checkpoint loading
- Quantization format: ModelOpt HF export, not compressed-tensors
- A few unsupported or intentionally skipped modules may remain excluded by ModelOpt during export; see hf_quant_config.json for the final exclusion list
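A launch sketch for the intended SGLang target. The exact flags vary by SGLang version; `--quantization modelopt` and the port are assumptions, so verify against `python -m sglang.launch_server --help` for your installed version.

```shell
# Hypothetical invocation; flag names/values are assumptions, not from this card.
python -m sglang.launch_server \
  --model-path rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4 \
  --quantization modelopt \
  --port 30000
```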
Files
- model.safetensors: quantized weights
- hf_quant_config.json: final quantization metadata
- tokenizer and processor files inherited from the source checkpoint
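To check what actually shipped, you can inspect hf_quant_config.json after download. The key names below ("quantization", "quant_algo", "kv_cache_quant_algo", "exclude_modules") follow the common ModelOpt HF-export layout but are assumptions here; the in-memory `example` dict stands in for the real file.

```python
import json

# Stand-in for json.load(open("hf_quant_config.json")); structure is assumed.
example = {
    "quantization": {
        "quant_algo": "NVFP4",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": ["lm_head"],
    }
}

cfg = example["quantization"]
print(cfg["quant_algo"], cfg["kv_cache_quant_algo"])
print("excluded:", cfg["exclude_modules"])
```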
Base model
- Qwen/Qwen3.5-27B