---
library_name: mlx
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
tags:
- mlx
- oQe
- enhanced
- mixed-precision
- hybrid-attention
- agentic
track_downloads: true
---
# Qwen3.6-27B: oQe Series

This repository contains **Enhanced (oQe)** MLX quants of **Qwen3.6-27B**. Standard quants are often produced in a single streaming pass; the **oQe** series instead uses a multi-stage optimization path intended to limit quality loss at lower bit widths.
### 🚀 The oQe (Enhanced) Build Path

Instead of simple round-to-nearest quantization, the build follows a more deliberate process to protect the model's reasoning quality:

1. **Sensitivity Mapping:** Rather than guessing which layers matter, we run a calibration pass to measure how much precision each layer needs to keep its output stable.
2. **Hessian-Based Tuning:** Each time a weight is rounded to a lower bit width, the remaining not-yet-quantized weights are adjusted to compensate for the rounding error. This keeps layer outputs from drifting as the model gets smaller.
3. **Unified Batching:** We use layer-wide batching to keep the dense architecture's weight distribution consistent with the original weights.
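
To make step 2 concrete, here is a minimal pure-Python sketch of GPTQ-style error compensation on a single weight row. It is not the oMLX implementation: the function names (`invert`, `round_to_nearest`, `quantize_row`, `output_loss`), the fixed grid `scale`, and the toy Hessian are assumptions chosen for illustration, and production GPTQ code works block-wise with a Cholesky factorization for speed and numerical stability.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix (illustrative helper)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def round_to_nearest(w, scale):
    """Baseline: snap every weight to the grid independently."""
    return [round(x / scale) * scale for x in w]


def quantize_row(w, hessian, scale):
    """Quantize weights left to right, pushing each rounding error
    onto the weights that have not been quantized yet."""
    w = list(w)              # work on a copy
    hinv = invert(hessian)   # inverse Hessian of the layer inputs
    n = len(w)
    q = [0.0] * n
    for i in range(n):
        q[i] = round(w[i] / scale) * scale
        err = (w[i] - q[i]) / hinv[i][i]
        # Compensate: shift the remaining weights to absorb the error.
        for j in range(i + 1, n):
            w[j] -= err * hinv[i][j]
        # Remove weight i from the inverse Hessian (rank-1 downdate).
        d = hinv[i][i]
        col = [hinv[r][i] for r in range(n)]
        row = list(hinv[i])
        for r in range(n):
            for c in range(n):
                hinv[r][c] -= col[r] * row[c] / d
    return q


def output_loss(w, q, hessian):
    """Quadratic proxy for the layer output error: d^T H d with d = w - q."""
    d = [a - b for a, b in zip(w, q)]
    return sum(d[i] * hessian[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
```

With `hessian = [[1.0, 0.9], [0.9, 1.0]]`, `w = [0.4, 0.4]` and `scale = 1.0`, plain rounding gives `[0.0, 0.0]`, while `quantize_row` gives `[0.0, 1.0]` and a lower `output_loss`, because the second weight absorbs the first weight's rounding error.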
### 📋 oQ Build Performance Matrix

| Tier | Target bpw | Actual bpw | Size | Precision Boosts | Hybrid Plan / Strategy |
|---|---|---|---|---|---|
| [**oQ8e**](https://huggingface.co/splats/Qwen3.6-27B-oQ8e) | 8.0 | **8.00** | 27.5 GB | 0 | Full 8-bit Static |
| [**oQ6e**](https://huggingface.co/splats/Qwen3.6-27B-oQ6e) | 6.0 | **6.64** | 21.5 GB | 1 | 8bit×1 |
| [**oQ5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ5e) | 5.0 | **5.70** | 18.6 GB | 18 | 8bit×1, 6bit×17 |
| [**oQ4e**](https://huggingface.co/splats/Qwen3.6-27B-oQ4e) | 4.0 | **4.70** | 15.4 GB | 46 | 6bit×26, 5bit×20 |
| [**oQ3.5e**](https://huggingface.co/splats/Qwen3.6-27B-oQ3.5e) | 3.5 | **4.00** | 13.2 GB | 44 | 8bit×1, 6bit×25, 5bit×18 |
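
In the table above, the Precision Boosts column appears to be the sum of the counts in the Hybrid Plan column. The short check below parses the plan strings and confirms that reading. It assumes each `Nbit×M` entry means M units (tensors or layers) raised to N bits, which the card does not state explicitly; `parse_plan` and `boost_count` are illustrative names, not part of any library.

```python
import re

# Rows copied from the table: tier -> (precision boosts, hybrid plan)
TIERS = {
    "oQ8e": (0, "Full 8-bit Static"),
    "oQ6e": (1, "8bit×1"),
    "oQ5e": (18, "8bit×1, 6bit×17"),
    "oQ4e": (46, "6bit×26, 5bit×20"),
    "oQ3.5e": (44, "8bit×1, 6bit×25, 5bit×18"),
}


def parse_plan(plan):
    """Return {bit_width: count} for entries such as '6bit×26'."""
    return {int(bits): int(count) for bits, count in re.findall(r"(\d+)bit×(\d+)", plan)}


def boost_count(plan):
    """Total number of boosted units listed in a plan string."""
    return sum(parse_plan(plan).values())


for tier, (boosts, plan) in TIERS.items():
    status = "matches" if boost_count(plan) == boosts else "differs"
    print(f"{tier}: listed {boosts}, plan sums to {boost_count(plan)} ({status})")
```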
### 🛠 Technical Build Audit

* **Calibration:** Uses a 600-sample dataset across code, reasoning, and multi-turn conversations.
* **Sensitivity Proxy:** Qwen3.6-27B-oQ8e (Internal Baseline).
* **Optimization Floor:** We always lock the `lm_head` and critical early blocks at 8-bit to ensure the model's basic "senses" remain intact.
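
The optimization floor can be pictured as a per-module rule that overrides whatever bit width the plan would otherwise assign. The sketch below is hypothetical: the module paths follow the usual `model.layers.N...` naming, `bits_for` is an illustrative helper rather than an oMLX or mlx-lm function, and `LOCKED_EARLY_BLOCKS = 2` is a placeholder because the card does not say how many blocks count as "early".

```python
import re

FLOOR_BITS = 8            # from the card: floor modules stay at 8-bit
LOCKED_EARLY_BLOCKS = 2   # hypothetical value; not stated in the card

# Matches the block index in paths like "model.layers.12.mlp.up_proj".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def bits_for(path, planned_bits):
    """Return the bit width for a module path after applying the floor."""
    if path == "lm_head" or path.endswith(".lm_head"):
        return max(planned_bits, FLOOR_BITS)
    match = _LAYER_RE.search(path)
    if match and int(match.group(1)) < LOCKED_EARLY_BLOCKS:
        return max(planned_bits, FLOOR_BITS)
    return planned_bits
```

For example, `bits_for("lm_head", 4)` and `bits_for("model.layers.0.mlp.down_proj", 4)` both return 8, while `bits_for("model.layers.10.mlp.up_proj", 4)` keeps the planned 4 bits.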
## Model Highlights

* **Thinking Preservation:** Specifically tuned to maintain logic-chain stability in the 64-layer hybrid architecture.
* **Agentic Coding:** Native proficiency in repository-level reasoning and frontend generation.

---

**Acknowledgments:** These quants were built using the [oMLX](https://github.com/jundot/omlx) framework. The weight optimization process is based on the **GPTQ** algorithm by Frantar et al.

*Verified via Splats Lab Vault v2.8. These models are compatible with standard mlx-lm and work with any app that supports MLX safetensors.*