# Qwen3.6-35B-A3B — oQe Series
This repository contains Enhanced (oQe) MLX quants of the original Qwen3.6-35B-A3B. Where standard quants are typically produced in a single "streaming" pass, the oQe series uses a multi-stage optimization path to keep the model from losing its "intelligence" at lower bitrates.
## 🚀 The oQe (Enhanced) Build Path
We move away from simple rounding and use a more deliberate process to protect the model's logic:
- Sensitivity Mapping: We don't guess which layers are important. We run a calibration pass to measure exactly how much precision each layer needs to keep its output stable.
- Hessian-Based Tuning: Every time we round a weight to a lower bit, we adjust the surrounding weights to compensate for the error. This keeps the model's internal math from "drifting" as it gets smaller.
- Unified Batching: For this MoE architecture, we process all 256 experts together so the 8+1 routing path stays stable and precise.
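The Hessian-based tuning step above can be sketched in a few lines. This is an illustrative NumPy reimplementation of the GPTQ-style error-feedback loop, not the actual oMLX code path; the damping constant and per-column scaling scheme are assumptions:

```python
import numpy as np

def quantize_with_compensation(W, X, bits=4):
    """Quantize W column by column; after rounding each column, fold the
    rounding error into the not-yet-quantized columns using the inverse of
    a damped Hessian proxy H = X^T X (GPTQ-style error feedback)."""
    W = W.astype(np.float64).copy()
    Q = np.zeros_like(W)
    H = X.T @ X + 1e-6 * np.eye(W.shape[1])  # damped second-order proxy
    Hinv = np.linalg.inv(H)
    for j in range(W.shape[1]):
        # symmetric per-column scale for the chosen bit-width
        scale = max(np.abs(W[:, j]).max(), 1e-12) / (2 ** (bits - 1) - 1)
        Q[:, j] = np.round(W[:, j] / scale) * scale
        # distribute this column's rounding error over the remaining columns
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q
```

Given a calibration batch `X`, comparing `‖XW − XQ‖` against plain nearest rounding shows why the compensation matters: the error of early columns is absorbed by later ones instead of accumulating.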
## 📋 oQe Build Performance Matrix
| Tier | Target bpw | Actual bpw | Size | Precision Boosts | Hybrid Plan / Strategy |
|---|---|---|---|---|---|
| oQ8e | 8.0 | 8.00 | 35.1 GB | 0 | Full 8-bit Static |
| oQ6e | 6.0 | 6.60 | 27.4 GB | 162 | 8bit×162 (DeltaNet Anchor) |
| oQ5e | 5.0 | 5.67 | 23.6 GB | 352 | 8bit×162, 6bit×190 |
| oQ4e | 4.0 | 4.70 | 19.7 GB | 317 | 8bit×162, 6bit×47, 5bit×108 |
| oQ3.5e | 3.5 | 4.00 | 16.9 GB | 60 | 8bit×10, 5bit×10, 4bit×40 |
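The "Actual bpw" column is simply the parameter-weighted average of the per-tensor bit-widths from the hybrid plan. A minimal sketch (the parameter counts below are hypothetical, not the real tensor sizes):

```python
def effective_bpw(plan):
    """plan: iterable of (bits, n_params) pairs; returns bits per weight,
    i.e. total stored bits divided by total parameter count."""
    total_bits = sum(bits * n for bits, n in plan)
    total_params = sum(n for _, n in plan)
    return total_bits / total_params

# e.g. half the parameters boosted to 8-bit on top of a 4-bit base
print(effective_bpw([(8, 1_000), (4, 1_000)]))  # → 6.0
```

This is why a "4.0 target" build lands at 4.70 bpw: the 8-bit and 6-bit boosted tensors pull the average above the base bit-width.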
## 🛠 Technical Build Audit
- Calibration: Uses a 600-sample dataset across code, reasoning, and multi-turn conversations.
- Sensitivity Proxy: Qwen3.6-35B-A3B-oQ8e (Internal Baseline).
- Optimization Floor: Mandatory 8-bit protection is applied to the lm_head, DeltaNet heads, and the first 6 layers to keep the "Agentic Coding" features intact.
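The protection floor amounts to a per-tensor bit-width rule. A minimal sketch of such a rule — the parameter path patterns (`lm_head`, `deltanet`, `layers.N.`) are assumptions about the checkpoint's naming, and the real oMLX selection logic may differ:

```python
import re

def bits_for_layer(path, default_bits=4):
    """Bit-width for a weight tensor, applying the mandatory 8-bit floor:
    lm_head, DeltaNet heads, and the first 6 decoder layers stay at 8-bit;
    everything else gets the base bit-width for the tier."""
    lowered = path.lower()
    if "lm_head" in lowered or "deltanet" in lowered:
        return 8
    m = re.search(r"layers\.(\d+)\.", lowered)
    if m and int(m.group(1)) < 6:
        return 8
    return default_bits
```

A predicate like this can then drive a mixed-precision quantizer (e.g. via the `class_predicate` hook of `mlx.nn.quantize`) so protected tensors are never pushed below 8-bit.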
## Model Highlights
- Native MoE Scaling: Optimized for the 256-expert architecture (8+1 activated).
- Thinking Preservation: Retains reasoning context in historical messages via DeltaNet stability.
Acknowledgments: These quants were built using the oMLX framework. The weight optimization process is based on the GPTQ algorithm by Frantar et al.
Verified via Splats Lab Vault v2.8. These models are standard mlx-lm compatible and work with any app supporting MLX safetensors.
Base model: Qwen/Qwen3.6-35B-A3B