---
library_name: mlx
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
tags:
- mlx
- oQe
- enhanced
- mixed-precision
- hybrid-attention
- agentic
track_downloads: true
---
# Qwen3.6-27B — oQe Series

This repository contains Enhanced (oQe) MLX quants for Qwen3.6-27B. While standard quants are often made in a single "streaming" pass, the oQe series uses a multi-stage optimization path to make sure the model doesn't lose its "intelligence" at lower bit widths.
## 🚀 The oQe (Enhanced) Build Path
We move away from simple rounding and use a more deliberate process to protect the model's logic:
- **Sensitivity Mapping:** We don't guess which layers are important. We run a calibration pass to measure exactly how much precision each layer needs to keep its output stable.
- **Hessian-Based Tuning:** Every time we round a weight to a lower bit width, we adjust the surrounding weights to compensate for the error. This keeps the model's internal math from "drifting" as it gets smaller.
- **Unified Batching:** We use layer-wide batching to keep the dense architecture's quantized weight distribution consistent with the original weights.
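The Hessian-based tuning step is the core of GPTQ-style quantization. The toy sketch below illustrates the idea on a single weight row; the function name, the uniform symmetric grid, and the identity inverse Hessian are all illustrative assumptions, not the oMLX implementation:

```python
def quantize_row(w, h_inv, bits=4):
    """Toy GPTQ-style rounding with error feedback.

    w:     one row of a weight matrix (list of floats)
    h_inv: inverse Hessian of the layer inputs (list of lists)

    Each weight is rounded to a uniform symmetric grid; the rounding
    error is then pushed onto the not-yet-quantized weights via the
    inverse Hessian, keeping the layer output close to the original.
    """
    w = list(w)
    scale = max(abs(x) for x in w) / (2 ** (bits - 1) - 1)
    q = [0.0] * len(w)
    for i in range(len(w)):
        q[i] = round(w[i] / scale) * scale      # quantize weight i
        err = (w[i] - q[i]) / h_inv[i][i]
        for j in range(i + 1, len(w)):          # compensate the rest
            w[j] -= err * h_inv[i][j]
    return q

# With an identity inverse Hessian the compensation term vanishes and
# the result reduces to plain round-to-nearest on the grid.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(quantize_row([0.5, -0.25, 0.75, 1.0], identity))
```

With a real (non-diagonal) inverse Hessian, the compensation is what keeps the quantized layer's output from drifting as more weights are rounded.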
## 📋 oQ Build Performance Matrix
| Tier | Target bpw | Actual bpw | Size | Precision Boosts | Hybrid Plan / Strategy |
|---|---|---|---|---|---|
| oQ8e | 8.0 | 8.00 | 27.5 GB | 0 | Full 8-bit Static |
| oQ6e | 6.0 | 6.64 | 21.5 GB | 1 | 8bit×1 |
| oQ5e | 5.0 | 5.70 | 18.6 GB | 18 | 8bit×1, 6bit×17 |
| oQ4e | 4.0 | 4.70 | 15.4 GB | 46 | 6bit×26, 5bit×20 |
| oQ3.5e | 3.5 | 4.00 | 13.2 GB | 44 | 8bit×1, 6bit×25, 5bit×18 |
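To see how the precision boosts push the actual bpw above the target, here is a back-of-the-envelope estimate under the simplifying assumption that every layer holds the same number of weights. The real figures in the table (e.g. 4.70 for oQ4e) differ because layer sizes vary and per-group quantization scales add overhead:

```python
def naive_avg_bpw(total_layers, base_bits, boosts):
    """Equal-layer-size estimate of average bits per weight.

    boosts maps a bit width to the number of layers promoted above
    base_bits (this ignores real layer sizes and scale overhead).
    """
    boosted = sum(boosts.values())
    total = (total_layers - boosted) * base_bits
    total += sum(bits * n for bits, n in boosts.items())
    return total / total_layers

# oQ4e row: 64 layers, 4-bit base, 26 boosted to 6-bit, 20 to 5-bit
print(naive_avg_bpw(64, 4, {6: 26, 5: 20}))  # 5.125 under this assumption
```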
## 🛠 Technical Build Audit
- **Calibration:** Uses a 600-sample dataset spanning code, reasoning, and multi-turn conversations.
- **Sensitivity Proxy:** Qwen3.6-27B-oQ8e (internal baseline).
- **Optimization Floor:** We always lock `lm_head` and critical early blocks at 8-bit to ensure the model's basic "senses" remain intact.
## Model Highlights
- **Thinking Preservation:** Specifically tuned to maintain logic-chain stability in the 64-layer hybrid architecture.
- **Agentic Coding:** Native proficiency in repository-level reasoning and frontend generation.
**Acknowledgments:** These quants were built using the oMLX framework. The weight-optimization process is based on the GPTQ algorithm by Frantar et al.

Verified via Splats Lab Vault v2.8. These models are standard mlx-lm compatible and work with any app supporting MLX safetensors.
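For a quick start, a typical mlx-lm invocation looks like the following. The model id is a placeholder for whichever oQe tier you pull from this repository, and running the model requires Apple silicon:

```shell
pip install mlx-lm

# one-off generation via the CLI that ships with mlx-lm
mlx_lm.generate --model <repo-id-of-this-quant> --prompt "Hello" --max-tokens 64
```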