# Qwen 3.5 35B-A3B — JANG_2S (Mixed-Precision, 2-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

Osaurus natively supports JANG models. Download at [osaurus.ai](https://osaurus.ai).
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen 3.5 VL 35B-A3B |
| Architecture | MoE Transformer + Vision |
| Total Parameters | 35B (3B active per token) |
| Profile | JANG_2S |
| Avg Bits/Weight | 2.17 |
| Bit Widths Used | 2, 4, 6 |
| Model Size | 9 GB |
| Vision | Yes |
| Format | JANG v2 (MLX-native safetensors) |
## Benchmarks
200-question MMLU subset (20 questions per subject × 10 subjects). Thinking OFF (`enable_thinking=False`), greedy decoding (temperature 0.0).
| Model | MMLU | Size |
|---|---|---|
| JANG_2S (this) | 65.5% | 9 GB |
| MLX 2-bit | ~20% | 10 GB |
| MLX 4-bit | 75.5% | 18 GB |
JANG_2S roughly triples the MMLU score of standard MLX 2-bit quantization on this MoE model while using less memory. At 9 GB, it is the smallest quantization of this 35B model that remains coherent.
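For reference, the sketch below mirrors the eval setup described above (greedy decoding, thinking disabled) against a locally served copy of the model. It assumes Osaurus exposes an OpenAI-compatible `/chat/completions` endpoint; the base URL, port, and the `chat_template_kwargs` passthrough are assumptions to verify against your Osaurus version, not confirmed behavior.

```python
# Sketch of the benchmark setup: greedy decoding (temperature 0.0) with
# thinking disabled, against a local OpenAI-compatible endpoint.
# ASSUMPTIONS: the base URL/port and the chat_template_kwargs field are
# illustrative and may differ in your Osaurus setup.
import requests

BASE_URL = "http://localhost:1337/v1"  # assumed local endpoint; adjust as needed

def ask(question: str, choices: list[str]) -> str:
    """Pose one multiple-choice question and return the model's raw answer."""
    prompt = (
        question
        + "\n"
        + "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
        + "\nAnswer with a single letter."
    )
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "OsaurusAI/Qwen3.5-35B-A3B-JANG_2S",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,  # greedy decoding, as in the benchmark
            "max_tokens": 8,
            # Hypothetical passthrough for disabling thinking; verify support.
            "chat_template_kwargs": {"enable_thinking": False},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()
```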
## JANG_2S Profile
JANG_2S is an aggressive 2-bit mixed-precision profile that protects critical layers (attention, routing, embeddings) at higher precision while compressing expert MLP weights to 2-bit. Ideal for fitting large MoE models into limited memory.
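For intuition, the toy calculation below shows how a mixed-precision budget like this one averages out to roughly 2.17 bits/weight: the 2-bit expert MLPs dominate the parameter count, so the higher-precision layers barely move the average. The layer groups, parameter counts, and bit assignments are made up for illustration; they are not the actual JANG_2S grading table.

```python
# Toy illustration of a JANG-style mixed-precision bit budget.
# All numbers below are illustrative, not the real JANG_2S assignments.
bit_map = {
    # name: (parameter count, bits per weight)
    "expert_mlp": (32.0e9, 2),  # compressed aggressively; dominates the total
    "attention":  (1.9e9, 4),   # protected at higher precision
    "router":     (0.1e9, 6),   # tiny but critical, kept near-lossless
    "embeddings": (1.0e9, 4),
}

total_params = sum(n for n, _ in bit_map.values())
total_bits = sum(n * b for n, b in bit_map.values())
print(f"avg bits/weight: {total_bits / total_params:.2f}")
# With these toy numbers the average lands near 2.18, which is how a
# "2-bit" profile can report an average of 2.17 bits/weight.
```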
## Usage

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Qwen3.5-35B-A3B-JANG_2S
```
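Once the server is running, the model can be queried like any local OpenAI-compatible endpoint. Below is a minimal sketch using the official `openai` Python client; the base URL and port are assumptions, so check your Osaurus configuration.

```python
# Minimal chat example against a locally served JANG model.
# ASSUMPTION: Osaurus exposes an OpenAI-compatible API at this address;
# the actual default host/port may differ.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="OsaurusAI/Qwen3.5-35B-A3B-JANG_2S",
    messages=[
        {"role": "user", "content": "Summarize mixed-precision quantization in one sentence."}
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
```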
## Requirements
- Apple Silicon Mac with 16+ GB unified memory
- MLX framework with Qwen 3.5 MoE support
*Quantized by Osaurus AI using JANG.*