# Qwen3.5-27B-MXFP4
MXFP4 (E2M1) weight-only quantization of Qwen/Qwen3.5-27B.
## Quantization Details
- Format: MXFP4 (E2M1 float4), packed 2 values per uint8 byte
- Scales: E8M0 per group of 32 elements
- Method: 3-candidate MSE-optimal quantization (per-block best-fit)
- Config: `compressed-tensors` with the `mxfp4-pack-quantized` format
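The scheme above can be sketched in a few lines. This is a minimal illustration, not the toolkit's actual code: it assumes "3-candidate MSE-optimal" means trying three power-of-two scale exponents per group and keeping the one with the lowest reconstruction error; the real search may differ.

```python
import numpy as np

# Magnitudes representable by an E2M1 float4 (1 sign, 2 exponent, 1 mantissa bit)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_group(w):
    """Quantize one group of 32 weights with a shared E8M0 (power-of-two)
    scale. Three candidate scale exponents are tried; the one minimizing
    reconstruction MSE wins. Returns (dequantized weights, scale)."""
    w = np.asarray(w, dtype=np.float64)
    amax = np.abs(w).max()
    if amax == 0.0:
        return np.zeros_like(w), 1.0
    # Baseline exponent maps the group max onto the top of the E2M1 grid (6.0).
    base = int(np.ceil(np.log2(amax / E2M1_GRID[-1])))
    best = None
    for exp in (base - 1, base, base + 1):
        scale = 2.0 ** exp
        # Nearest-grid rounding of |w| / scale; values beyond 6.0 clip to 6.0.
        idx = np.abs(np.abs(w / scale)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        deq = np.sign(w) * E2M1_GRID[idx] * scale
        mse = float(np.mean((deq - w) ** 2))
        if best is None or mse < best[0]:
            best = (mse, deq, scale)
    return best[1], best[2]
```

On disk, each weight keeps only its 4-bit code (sign plus 3-bit grid index), two codes packed per `uint8`, and each group of 32 shares one 8-bit scale exponent.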
## Layers Not Quantized

Attention, embeddings, the LM head, MoE gates, linear attention, vision, and MTP modules are kept in their original precision:

```
re:.*self_attn.*, re:.*.mlp.gate$, re:.*lm_head.*,
re:.*embed_tokens.*, re:.*linear_attn.*, re:.*shared_expert_gate.*,
re:.*visual.*, re:.*mtp.*
```
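These are compressed-tensors-style ignore patterns: the `re:` prefix marks each entry as a regex tested against module names. A minimal sketch of the matching logic, assuming patterns are anchored at the start of the name as with `re.match`:

```python
import re

# Ignore patterns from the config above, "re:" prefix stripped.
IGNORE = [
    r".*self_attn.*", r".*.mlp.gate$", r".*lm_head.*",
    r".*embed_tokens.*", r".*linear_attn.*", r".*shared_expert_gate.*",
    r".*visual.*", r".*mtp.*",
]

def is_quantized(module_name):
    """A weight matrix is MXFP4-quantized only if no ignore pattern matches."""
    return not any(re.match(p, module_name) for p in IGNORE)
```

In practice this leaves the expert MLP projections (the bulk of a 27B MoE's parameters) in MXFP4 while the precision-sensitive attention and routing paths stay full precision.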
## Evaluation
| Metric | Value |
|---|---|
| Perplexity (WikiText-2) | 6.87 |
| Context | max_len=2048, stride=512 |
| Tokens scored | 297,471 |
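The context row describes sliding-window perplexity: the text is scored in windows of `max_len` tokens advanced by `stride`, and each window contributes only the tokens not already scored. A sketch of the windowing and aggregation, assuming the usual Hugging Face stride recipe (the model forward passes are omitted):

```python
import math

def stride_windows(n_tokens, max_len=2048, stride=512):
    """Enumerate (begin, end, n_scored) windows so every token is scored
    exactly once: each window scores only tokens past the previous end."""
    windows, prev_end = [], 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_len, n_tokens)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return windows

def perplexity(window_nlls):
    """Combine (mean_nll, n_scored) pairs into exp(total NLL / total tokens)."""
    total = sum(n for _, n in window_nlls)
    return math.exp(sum(nll * n for nll, n in window_nlls) / total)
```

Each token after the first window is conditioned on up to 1,536 tokens of overlap context, which is why strided perplexity is lower than scoring disjoint 2,048-token chunks.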
## Usage with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(model="olka-fi/Qwen3.5-27B-MXFP4")
output = llm.generate("Hello, world!", SamplingParams(max_tokens=128))
print(output[0].outputs[0].text)
```
## Quantization Tool

Quantized with qstream, an MXFP4 quantization toolkit for vLLM-compatible models.
## Acknowledgments

Base model by the Qwen Team.