Qwen3.6-27B-oQ4e / README.md
splats's picture
Update README.md
4b65b0d verified
---
library_name: mlx
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
tags:
- mlx
- oQe
- enhanced
- mixed-precision
- hybrid-attention
- agentic
track_downloads: true
---
# Qwen3.6-27B — oQe Series
This repository contains **Enhanced (oQe)** MLX quants for **Qwen3.6-27B**. While standard quants are often made in one "streaming" pass, the **oQe** series uses a multi-stage optimization path to make sure the model doesn't lose its "intelligence" at lower bitrates.
### 🚀 The oQe (Enhanced) Build Path
We move away from simple rounding and use a more deliberate process to protect the model's logic:
1. **Sensitivity Mapping:** We don't guess which layers are important. We run a calibration pass to measure exactly how much precision each layer needs to keep its output stable.
2. **Hessian-Based Tuning:** Every time we round a weight to a lower bit, we adjust the surrounding weights to compensate for the error. This keeps the model's internal math from "drifting" as it gets smaller.
3. **Unified Batching:** We use layer-wide batching to keep the dense architecture's weight distribution consistent with the original weights.
### 📋 oQ Build Performance Matrix
| Tier | Target bpw | Actual bpw | Size | Precision Boosts | Hybrid Plan / Strategy |
|---|---|---|---|---|---|
| [**oQ8e**](https://huggingface.co/splats/Qwen3.6-27B-oQ8e) | 8.0 | **8.00** | 27.5 GB | 0 | Full 8-bit Static |
| [**oQ6e**](https://huggingface.co/splats/Qwen3.6-27B-oQ6e) | 6.0 | **6.64** | 21.5 GB | 1 | 8bit×1 |
| [**oQ5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ5e) | 5.0 | **5.70** | 18.6 GB | 18 | 8bit×1, 6bit×17 |
| [**oQ4e**](https://huggingface.co/splats/Qwen3.6-27B-oQ4e) | 4.0 | **4.70** | 15.4 GB | 46 | 6bit×26, 5bit×20 |
| [**oQ3.5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ3.5e) | 3.5 | **4.00** | 13.2 GB | 44 | 8bit×1, 6bit×25, 5bit×18 |
### 🛠 Technical Build Audit
* **Calibration:** Uses a 600-sample dataset across code, reasoning, and multi-turn conversations.
* **Sensitivity Proxy:** Qwen3.6-27B-oQ8e (Internal Baseline).
* **Optimization Floor:** We always lock the `lm_head` and critical early blocks at 8-bit to ensure the model's basic "senses" remain intact.
## Model Highlights
* **Thinking Preservation:** Specifically tuned to maintain logic-chain stability in the 64-layer hybrid architecture.
* **Agentic Coding:** Native proficiency in repository-level reasoning and frontend generation.
---
**Acknowledgments:** These quants were built using the [oMLX](https://github.com/jundot/omlx) framework. The weight optimization process is based on the **GPTQ** algorithm by Frantar et al.
*Verified via Splats Lab Vault v2.8. These models are standard mlx-lm compatible and work with any app supporting MLX safetensors.*