---
library_name: mlx
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
tags:
- mlx
- oQe
- enhanced
- mixed-precision
- hybrid-attention
- agentic
track_downloads: true
---

# Qwen3.6-27B — oQe Series

This repository contains **Enhanced (oQe)** MLX quants for **Qwen3.6-27B**. While standard quants are often made in a single "streaming" pass, the **oQe** series uses a multi-stage optimization path to make sure the model doesn't lose its "intelligence" at lower bit rates.

### 🚀 The oQe (Enhanced) Build Path

We move away from simple rounding and use a more deliberate process to protect the model's logic:

1. **Sensitivity Mapping:** We don't guess which layers are important. We run a calibration pass to measure exactly how much precision each layer needs to keep its output stable.
2. **Hessian-Based Tuning:** Every time we round a weight to a lower bit width, we adjust the surrounding weights to compensate for the rounding error. This keeps the model's internal math from "drifting" as it gets smaller. (A toy sketch of this GPTQ-style update appears at the end of this card.)
3. **Unified Batching:** We use layer-wide batching to keep the quantized weight distribution of the dense architecture consistent with the original weights.

### 📋 oQe Build Performance Matrix

| Tier | Target bpw | Actual bpw | Size | Precision Boosts | Hybrid Plan / Strategy |
|---|---|---|---|---|---|
| [**oQ8e**](https://huggingface.co/splats/Qwen3.6-27B-oQ8e) | 8.0 | **8.00** | 27.5 GB | 0 | Full 8-bit static |
| [**oQ6e**](https://huggingface.co/splats/Qwen3.6-27B-oQ6e) | 6.0 | **6.64** | 21.5 GB | 1 | 8-bit ×1 |
| [**oQ5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ5e) | 5.0 | **5.70** | 18.6 GB | 18 | 8-bit ×1, 6-bit ×17 |
| [**oQ4e**](https://huggingface.co/splats/Qwen3.6-27B-oQ4e) | 4.0 | **4.70** | 15.4 GB | 46 | 6-bit ×26, 5-bit ×20 |
| [**oQ3.5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ3.5e) | 3.5 | **4.00** | 13.2 GB | 44 | 8-bit ×1, 6-bit ×25, 5-bit ×18 |

### 🛠 Technical Build Audit

* **Calibration:** A 600-sample dataset spanning code, reasoning, and multi-turn conversations.
* **Sensitivity Proxy:** Qwen3.6-27B-oQ8e (internal baseline).
* **Optimization Floor:** We always lock the `lm_head` and critical early blocks at 8-bit to ensure the model's basic "senses" remain intact. (A sketch of this mixed-precision assignment also appears at the end of this card.)

## Model Highlights

* **Thinking Preservation:** Specifically tuned to maintain logic-chain stability across the 64-layer hybrid architecture.
* **Agentic Coding:** Native proficiency in repository-level reasoning and frontend generation.

---

**Acknowledgments:** These quants were built using the [oMLX](https://github.com/jundot/omlx) framework. The weight optimization process is based on the **GPTQ** algorithm by Frantar et al.

*Verified via Splats Lab Vault v2.8. These models are standard mlx-lm compatible and work with any app supporting MLX safetensors.*
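
### 💬 Quick Start

Since these quants are standard mlx-lm compatible, they load with the stock `mlx_lm` API. A minimal sketch (the oQ5e repo id is taken from the matrix above; any tier works the same way):

```python
# Minimal mlx-lm usage sketch. Requires: pip install mlx-lm
from mlx_lm import load, generate

# Repo id taken from the performance matrix above; swap in any tier.
model, tokenizer = load("splats/Qwen3.6-27B-oQ5e")

prompt = "Write a Python function that checks if a number is prime."

# Apply the model's chat template if one ships with the tokenizer.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```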
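### 📎 Appendix: Method Sketches

For readers curious about step 2 of the build path, below is a toy NumPy sketch of the GPTQ-style error-feedback loop credited in the acknowledgments: each column is rounded, and its rounding error is pushed into the not-yet-quantized columns via the inverse Hessian of the layer's reconstruction objective. This is illustrative only, not the oMLX implementation, and it omits the blocking and grouping tricks of the production algorithm.

```python
import numpy as np

def gptq_sketch(W, X, bits=4, percdamp=0.01):
    """Toy GPTQ-style quantizer (illustrative, not the oMLX code).

    W: (rows, cols) layer weights.
    X: (cols, nsamples) calibration activations for this layer.
    Returns the dequantized quantized weights Q.
    """
    rows, cols = W.shape
    W = W.astype(np.float64).copy()
    Q = np.zeros_like(W)

    # Symmetric per-output-row quantization grid.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1) / qmax + 1e-12

    # Hessian of ||W X - Q X||^2, dampened for numerical stability.
    H = 2.0 * X @ X.T
    H += percdamp * np.mean(np.diag(H)) * np.eye(cols)
    # Upper Cholesky factor of H^{-1}, as in the GPTQ reference code.
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T

    for j in range(cols):
        w = W[:, j]
        q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        Q[:, j] = q
        # Feed this column's rounding error into the remaining columns.
        err = (w - q) / Hinv[j, j]
        W[:, j:] -= np.outer(err, Hinv[j, j:])
    return Q
```

The production algorithm processes columns in blocks with periodic global updates for speed; the math per column is the same.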
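The hybrid plans in the matrix map naturally onto the per-layer quantization hook that recent mlx-lm versions expose via `convert(..., quant_predicate=...)`. Below is a hypothetical sketch of how a plan in the spirit of the oQ5e row could be expressed. The layer indices, group size, and output path are invented for illustration; the real plan comes from the calibration-driven sensitivity map, not from this file.

```python
from mlx_lm import convert

# HYPOTHETICAL: boosted layer indices chosen for illustration only.
BOOSTED_LAYERS = {3, 7, 12, 19, 25, 31, 40, 52}

def quant_predicate(path, module, config):
    """Return per-layer quantization parameters for mlx_lm convert."""
    if not hasattr(module, "to_quantized"):
        return False  # this layer type cannot be quantized
    if "lm_head" in path or ".layers.0." in path:
        return {"group_size": 64, "bits": 8}  # optimization floor
    if any(f".layers.{i}." in path for i in BOOSTED_LAYERS):
        return {"group_size": 64, "bits": 6}  # precision boost
    return {"group_size": 64, "bits": 5}      # base tier

convert(
    "Qwen/Qwen3.6-27B",
    mlx_path="qwen3.6-27b-oq5e-repro",  # hypothetical output path
    quantize=True,
    quant_predicate=quant_predicate,
)
```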