Qwen3.5-27B-MXFP4

MXFP4 (E2M1) weight-only quantization of Qwen/Qwen3.5-27B.

Quantization Details

  • Format: MXFP4 (E2M1 float4), two values packed per uint8
  • Scales: E8M0 (power-of-two), one per group of 32 elements
  • Method: 3-candidate MSE-optimal quantization (best-fit scale per block)
  • Config: compressed-tensors with the mxfp4-pack-quantized format
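The per-block scheme above can be sketched as follows. This is a minimal illustration, not qstream's implementation: the choice of which power-of-two exponents to try as the three candidates is an assumption.

```python
import numpy as np

# The 8 non-negative E2M1 magnitudes; a sign bit doubles this to 16 codes.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_mxfp4(block, n_candidates=3):
    """Quantize one 32-element block to E2M1 with an E8M0 (power-of-two) scale.

    Sketch of a per-block MSE-optimal scale search: try a few power-of-two
    scales starting from the naive choice and keep the one with the lowest
    reconstruction error. Returns (dequantized block, scale exponent).
    """
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block), 0  # all-zero block
    # Naive E8M0 exponent: map the block max onto the largest grid value (6.0).
    base_exp = int(np.floor(np.log2(amax / 6.0)))
    best_err, best = np.inf, None
    for exp in range(base_exp, base_exp + n_candidates):  # assumed candidate set
        scale = 2.0 ** exp
        scaled = block / scale
        # Round each element to the nearest signed E2M1 code.
        idx = np.argmin(np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]), axis=1)
        deq = np.sign(scaled) * E2M1_GRID[idx] * scale
        err = np.sum((block - deq) ** 2)
        if err < best_err:
            best_err, best = err, (deq, exp)
    return best

block = np.array([0.1, -0.5, 2.0, 6.0] * 8, dtype=np.float32)
deq, exp = quantize_block_mxfp4(block)  # every |deq| / 2**exp lies on the grid
```

Packing then stores the 4-bit sign+code pairs two per uint8, with the scale exponents kept separately as one E8M0 byte per 32-element group.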

Layers NOT quantized

Attention, embeddings, the LM head, router gates, linear attention, vision, and MTP modules are kept in their original precision:

re:.*self_attn.*, re:.*.mlp.gate$, re:.*lm_head.*,
re:.*embed_tokens.*, re:.*linear_attn.*, re:.*shared_expert_gate.*,
re:.*visual.*, re:.*mtp.*
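The `re:` prefix marks each entry as a regular expression over module names, following the compressed-tensors ignore-list convention. A sketch of how a layer name can be checked against these patterns (full-string matching is an assumption here):

```python
import re

# Ignore patterns from the config above, with the "re:" prefix stripped.
IGNORE = [
    r".*self_attn.*", r".*.mlp.gate$", r".*lm_head.*",
    r".*embed_tokens.*", r".*linear_attn.*", r".*shared_expert_gate.*",
    r".*visual.*", r".*mtp.*",
]

def is_quantized(layer_name):
    """True if the layer is MXFP4-quantized, i.e. matches no ignore pattern."""
    return not any(re.fullmatch(p, layer_name) for p in IGNORE)

# MoE expert projections are quantized; attention projections are not.
print(is_quantized("model.layers.0.mlp.experts.0.gate_proj"))  # → True
print(is_quantized("model.layers.0.self_attn.q_proj"))         # → False
```

Note that `.*.mlp.gate$` is anchored, so it excludes only the MoE router gate itself, not the `gate_proj` weights inside each expert.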

Evaluation

Metric                    Value
Perplexity (WikiText-2)   6.87
Context                   max_len=2048, stride=512
Tokens scored             297,471
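With max_len=2048 and stride=512, perplexity is computed over overlapping windows. A sketch of the windowing, assuming the common strided scheme in which each window scores only the tokens not covered by the previous one (the exact scheme used for the number above is an assumption):

```python
def strided_windows(n_tokens, max_len=2048, stride=512):
    """Yield (start, end, n_scored) for strided perplexity evaluation.

    Each window spans up to max_len tokens; only the tokens not already
    scored by an earlier window contribute to the loss, so every later
    token is conditioned on at least max_len - stride tokens of context.
    """
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + max_len, n_tokens)
        n_scored = end - prev_end  # newly scored tokens in this window
        yield start, end, n_scored
        prev_end = end
        if end == n_tokens:
            break

# Every token is scored exactly once across all windows.
total = sum(n for _, _, n in strided_windows(10_000))  # → 10000
```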

Usage with vLLM

from vllm import LLM, SamplingParams

# vLLM reads the compressed-tensors quantization config from the repo,
# so no extra quantization arguments are needed.
llm = LLM(model="olka-fi/Qwen3.5-27B-MXFP4")
outputs = llm.generate("Hello, world!", SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)

Quantization Tool

Quantized with qstream, an MXFP4 quantization toolkit for vLLM-compatible models.

Acknowledgments

Base model by Qwen Team.
