Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4

This repository contains a ModelOpt-quantized checkpoint derived from Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.

Quantization Summary

  • Quantization tool: NVIDIA TensorRT Model Optimizer (a PTQ sketch follows this list)
  • Weight quantization: NVFP4
  • KV cache quantization: FP8
  • Calibration size: 1024 samples
  • Calibration sequence length: 4096
  • Calibration batch size: 1
  • Calibration source mix:
    • nohurry/Opus-4.6-Reasoning-3000x-filtered: 596 samples
    • Jackrong/Qwen3.5-reasoning-700x: 178 samples
    • TeichAI/claude-4.5-opus-high-reasoning-250x: 250 samples
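
For context, a minimal PTQ sketch matching these settings is shown below. It assumes recent ModelOpt naming (`NVFP4_DEFAULT_CFG`, `export_hf_checkpoint`, and the `*output_quantizer` KV-cache pattern from ModelOpt's llm_ptq example); exact config names vary across releases, and this is a sketch rather than the exact script used for this checkpoint.

```python
# Minimal NVFP4 + FP8-KV PTQ sketch. Config/helper names
# (NVFP4_DEFAULT_CFG, export_hf_checkpoint, "*output_quantizer")
# follow recent ModelOpt releases and may differ in yours.
import copy
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

MODEL_ID = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 1024 chat-templated calibration texts (built in the JSONL sketch below).
with open("calib.jsonl") as f:
    calib_texts = [json.loads(line)["text"] for line in f]

quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
# FP8 (E4M3) KV cache, mirroring the pattern in ModelOpt's llm_ptq example.
quant_cfg["quant_cfg"]["*output_quantizer"] = {
    "num_bits": (4, 3), "axis": None, "enable": True
}

def forward_loop(model):
    # Batch size 1, sequences capped at 4096 tokens, as documented above.
    with torch.no_grad():
        for text in calib_texts:
            ids = tokenizer(
                text, return_tensors="pt", truncation=True, max_length=4096
            ).input_ids.to(model.device)
            model(ids)

model = mtq.quantize(model, quant_cfg, forward_loop)
export_hf_checkpoint(model, export_dir="Qwen3.5-27B-ModelOpt-NVFP4")
```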

The calibration set was converted into a single JSONL file with the source model's chat template applied before PTQ, so the activation statistics observed during calibration more closely match the reasoning format used by this distilled checkpoint.
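
A minimal sketch of that preparation step, assuming each source dataset exposes an OpenAI-style `messages` column on its `train` split (the column name, split, shuffle seed, and `calib.jsonl` filename are all assumptions):

```python
# Merge the three calibration sources into one chat-templated JSONL.
# The "messages" column and "train" split are assumptions about the
# datasets' schemas; sample counts match the mix documented above.
import json

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled"
)

MIX = [
    ("nohurry/Opus-4.6-Reasoning-3000x-filtered", 596),
    ("Jackrong/Qwen3.5-reasoning-700x", 178),
    ("TeichAI/claude-4.5-opus-high-reasoning-250x", 250),
]

with open("calib.jsonl", "w") as f:
    for repo, n in MIX:
        ds = load_dataset(repo, split="train").shuffle(seed=0).select(range(n))
        for row in ds:
            text = tokenizer.apply_chat_template(
                row["messages"], tokenize=False, add_generation_prompt=False
            )
            f.write(json.dumps({"text": text}) + "\n")
```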

Runtime Notes

  • Intended runtime target: SGLang with ModelOpt-compatible HF checkpoint loading (a launch sketch follows this list)
  • Quantization format: ModelOpt HF export, not compressed-tensors
  • Some modules are excluded from quantization during export, either because ModelOpt does not support them or because they were intentionally skipped; see hf_quant_config.json for the final exclusion list
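
A minimal SGLang loading sketch using the offline Engine API; the explicit quantization value is an assumption (some builds detect the format from hf_quant_config.json), and NVFP4 kernels require a recent SGLang build on NVFP4-capable GPUs:

```python
# Load the ModelOpt HF export with SGLang's offline engine.
# Assumption: quantization="modelopt_fp4" is the right flag for this
# build; newer SGLang versions may auto-detect the format from
# hf_quant_config.json, in which case the flag can be dropped.
import sglang as sgl

engine = sgl.Engine(
    model_path="rhoninseiei/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-ModelOpt-NVFP4",
    quantization="modelopt_fp4",
)

out = engine.generate(
    "Briefly explain what NVFP4 weight quantization changes at inference time.",
    {"max_new_tokens": 256, "temperature": 0.6},
)
print(out["text"])
engine.shutdown()
```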

Files

  • model.safetensors: quantized weights
  • hf_quant_config.json: final quantization metadata (inspected in the sketch after this list)
  • tokenizer and processor files inherited from the source checkpoint
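
To check which modules ended up excluded, hf_quant_config.json can be inspected directly. The key names below follow typical ModelOpt HF exports and are assumptions:

```python
# Print the quantization algorithms and the final exclusion list from
# hf_quant_config.json. Key names ("quantization", "quant_algo",
# "kv_cache_quant_algo", "exclude_modules") follow common ModelOpt
# exports and may differ in this checkpoint.
import json

with open("hf_quant_config.json") as f:
    cfg = json.load(f)

q = cfg.get("quantization", {})
print("weights:", q.get("quant_algo"))
print("kv cache:", q.get("kv_cache_quant_algo"))
print("excluded modules:", q.get("exclude_modules"))
```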