Qwen3.5-27B-MXFP4

MXFP4 (E2M1) weight-only quantization of Qwen/Qwen3.5-27B.

Quantization Details

  • Format: MXFP4 (E2M1 float4), two values packed per uint8
  • Scales: E8M0 (power-of-two), one per group of 32 elements
  • Method: 3-candidate MSE-optimal quantization (best-fit scale per block)
  • Config: compressed-tensors with the mxfp4-pack-quantized format
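The per-block scheme above can be sketched as follows. This is a minimal illustration, not qstream's implementation: the choice of which power-of-two exponents to try as the three candidates is an assumption.

```python
import numpy as np

# The 8 non-negative E2M1 magnitudes; a sign bit doubles this to 16 codes.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_mxfp4(block, n_candidates=3):
    """Quantize one 32-element block to E2M1 with an E8M0 (power-of-two) scale.

    Sketch of a per-block MSE-optimal scale search: try a few power-of-two
    scales starting from the naive choice and keep the one with the lowest
    reconstruction error. Returns (dequantized block, scale exponent).
    """
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block), 0  # all-zero block
    # Naive E8M0 exponent: map the block max onto the largest grid value (6.0).
    base_exp = int(np.floor(np.log2(amax / 6.0)))
    best_err, best = np.inf, None
    for exp in range(base_exp, base_exp + n_candidates):  # assumed candidate set
        scale = 2.0 ** exp
        scaled = block / scale
        # Round each element to the nearest signed E2M1 code.
        idx = np.argmin(np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]), axis=1)
        deq = np.sign(scaled) * E2M1_GRID[idx] * scale
        err = np.sum((block - deq) ** 2)
        if err < best_err:
            best_err, best = err, (deq, exp)
    return best

block = np.array([0.1, -0.5, 2.0, 6.0] * 8, dtype=np.float32)
deq, exp = quantize_block_mxfp4(block)  # every |deq| / 2**exp lies on the grid
```

Packing then stores the 4-bit sign+code pairs two per uint8, with the scale exponents kept separately as one E8M0 byte per 32-element group.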

Layers NOT quantized

Attention, embeddings, the LM head, router gates, linear attention, vision, and MTP modules are kept in their original precision:

re:.*self_attn.*, re:.*.mlp.gate$, re:.*lm_head.*,
re:.*embed_tokens.*, re:.*linear_attn.*, re:.*shared_expert_gate.*,
re:.*visual.*, re:.*mtp.*
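The `re:` prefix marks each entry as a regular expression over module names, following the compressed-tensors ignore-list convention. A sketch of how a layer name can be checked against these patterns (full-string matching is an assumption here):

```python
import re

# Ignore patterns from the config above, with the "re:" prefix stripped.
IGNORE = [
    r".*self_attn.*", r".*.mlp.gate$", r".*lm_head.*",
    r".*embed_tokens.*", r".*linear_attn.*", r".*shared_expert_gate.*",
    r".*visual.*", r".*mtp.*",
]

def is_quantized(layer_name):
    """True if the layer is MXFP4-quantized, i.e. matches no ignore pattern."""
    return not any(re.fullmatch(p, layer_name) for p in IGNORE)

# MoE expert projections are quantized; attention projections are not.
print(is_quantized("model.layers.0.mlp.experts.0.gate_proj"))  # → True
print(is_quantized("model.layers.0.self_attn.q_proj"))         # → False
```

Note that `.*.mlp.gate$` is anchored, so it excludes only the MoE router gate itself, not the `gate_proj` weights inside each expert.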

Evaluation

Metric                    Value
Perplexity (WikiText-2)   6.87
Context                   max_len=2048, stride=512
Tokens scored             297,471
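With max_len=2048 and stride=512, perplexity is computed over overlapping windows. A sketch of the windowing, assuming the common strided scheme in which each window scores only the tokens not covered by the previous one (the exact scheme used for the number above is an assumption):

```python
def strided_windows(n_tokens, max_len=2048, stride=512):
    """Yield (start, end, n_scored) for strided perplexity evaluation.

    Each window spans up to max_len tokens; only the tokens not already
    scored by an earlier window contribute to the loss, so every later
    token is conditioned on at least max_len - stride tokens of context.
    """
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + max_len, n_tokens)
        n_scored = end - prev_end  # newly scored tokens in this window
        yield start, end, n_scored
        prev_end = end
        if end == n_tokens:
            break

# Every token is scored exactly once across all windows.
total = sum(n for _, _, n in strided_windows(10_000))  # → 10000
```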

Usage with vLLM

from vllm import LLM, SamplingParams

# vLLM reads the compressed-tensors quantization config from the repo,
# so no extra quantization arguments are needed.
llm = LLM(model="olka-fi/Qwen3.5-27B-MXFP4")
outputs = llm.generate("Hello, world!", SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)

Quantization Tool

Quantized with qstream, an MXFP4 quantization toolkit for vLLM-compatible models.

Acknowledgments

Base model by Qwen Team.
