splats
/

Qwen3.6-27B-oQ4e

mixed-precision

hybrid-attention

4-bit precision

Model card Files Files and versions

Qwen3.6-27B-oQ4e / README.md

splats's picture

Update README.md

4b65b0d verified 20 days ago

|

history blame contribute delete

2.81 kB

	---
	library_name: mlx
	license: apache-2.0
	base_model: Qwen/Qwen3.6-27B
	base_model_relation: quantized
	tags:
	- mlx
	- oQe
	- enhanced
	- mixed-precision
	- hybrid-attention
	- agentic
	track_downloads: true
	---

	# Qwen3.6-27B — oQe Series

	This repository contains Enhanced (oQe) MLX quants for Qwen3.6-27B. While standard quants are often made in one "streaming" pass, the oQe series uses a multi-stage optimization path to make sure the model doesn't lose its "intelligence" at lower bitrates.

	### 🚀 The oQe (Enhanced) Build Path
	We move away from simple rounding and use a more deliberate process to protect the model's logic:

	1. Sensitivity Mapping: We don't guess which layers are important. We run a calibration pass to measure exactly how much precision each layer needs to keep its output stable.
	2. Hessian-Based Tuning: Every time we round a weight to a lower bit, we adjust the surrounding weights to compensate for the error. This keeps the model's internal math from "drifting" as it gets smaller.
	3. Unified Batching: We use layer-wide batching to keep the dense architecture's weight distribution consistent with the original weights.

	### 📋 oQ Build Performance Matrix

	\| Tier \| Target bpw \| Actual bpw \| Size \| Precision Boosts \| Hybrid Plan / Strategy \|
	\|---\|---\|---\|---\|---\|---\|
	\| [oQ8e](https://huggingface.co/splats/Qwen3.6-27B-oQ8e) \| 8.0 \| 8.00 \| 27.5 GB \| 0 \| Full 8-bit Static \|
	\| [oQ6e](https://huggingface.co/splats/Qwen3.6-27B-oQ6e) \| 6.0 \| 6.64 \| 21.5 GB \| 1 \| 8bit×1 \|
	\| [oQ5e](https://huggingface.co/splats/Qwen3.6-27B-oQ5e) \| 5.0 \| 5.70 \| 18.6 GB \| 18 \| 8bit×1, 6bit×17 \|
	\| [oQ4e](https://huggingface.co/splats/Qwen3.6-27B-oQ4e) \| 4.0 \| 4.70 \| 15.4 GB \| 46 \| 6bit×26, 5bit×20 \|
	\| [oQ3.5e](https://huggingface.co/splats/Qwen3.6-27B-oQ3.5e) \| 3.5 \| 4.00 \| 13.2 GB \| 44 \| 8bit×1, 6bit×25, 5bit×18 \|

	### 🛠 Technical Build Audit
	* Calibration: Uses a 600-sample dataset across code, reasoning, and multi-turn conversations.
	* Sensitivity Proxy: Qwen3.6-27B-oQ8e (Internal Baseline).
	* Optimization Floor: We always lock the `lm_head` and critical early blocks at 8-bit to ensure the model's basic "senses" remain intact.

	## Model Highlights
	* Thinking Preservation: Specifically tuned to maintain logic-chain stability in the 64-layer hybrid architecture.
	* Agentic Coding: Native proficiency in repository-level reasoning and frontend generation.

	---
	Acknowledgments: These quants were built using the [oMLX](https://github.com/jundot/omlx) framework. The weight optimization process is based on the GPTQ algorithm by Frantar et al.

	Verified via Splats Lab Vault v2.8. These models are standard mlx-lm compatible and work with any app supporting MLX safetensors.