Osaurus AI

Qwen 3.5 35B-A3B — JANG_2S (Mixed-Precision, 2-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon



Osaurus natively supports JANG models. Download at osaurus.ai.


Model Details

| Property | Value |
|---|---|
| Base Model | Qwen 3.5 VL 35B-A3B |
| Architecture | MoE Transformer + Vision |
| Total Parameters | 35B (3B active per token) |
| Profile | JANG_2S |
| Avg Bits/Weight | 2.17 |
| Bit Widths Used | 2, 4, 6 |
| Model Size | 9 GB |
| Vision | Yes |
| Format | JANG v2 (MLX-native safetensors) |
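As a back-of-envelope sanity check on the table, the reported size follows from the average bit width (ignoring overhead for quantization scales and metadata):

```python
total_params = 35e9  # total parameters, from the table above
avg_bits = 2.17      # average bits per weight

size_bytes = total_params * avg_bits / 8
print(f"{size_bytes / 1e9:.2f} GB")    # ≈ 9.49 GB (decimal)
print(f"{size_bytes / 2**30:.2f} GiB") # ≈ 8.84 GiB, i.e. the ~9 GB reported
```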

Benchmarks

200-question MMLU subset (20 questions per subject × 10 subjects). Thinking off (`enable_thinking=False`), greedy decoding (temperature 0.0).

| Model | MMLU | Size |
|---|---|---|
| **JANG_2S (this model)** | 65.5% | 9 GB |
| MLX 2-bit | ~20% | 10 GB |
| MLX 4-bit | 75.5% | 18 GB |
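The evaluation setup above (greedy decoding, thinking disabled, answer picked from the model's reply) could be scored with a loop like this sketch. `generate` is a stand-in for whatever generation call you use, not a real Osaurus or MLX API:

```python
def score_mmlu(questions, generate):
    """questions: list of (prompt, correct_letter) pairs.
    generate: callable returning the model's text reply; hypothetical."""
    correct = 0
    for prompt, answer in questions:
        # greedy decoding with thinking off, matching the benchmark setup
        reply = generate(prompt, temperature=0.0, enable_thinking=False)
        # take the first A-D letter in the reply as the model's choice
        choice = next((c for c in reply if c in "ABCD"), None)
        correct += (choice == answer)
    return correct / len(questions)

# toy check with a fake model that always answers "B"
qs = [("Q1 ...", "B"), ("Q2 ...", "C")]
print(score_mmlu(qs, lambda p, **kw: "B"))  # 0.5
```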

JANG_2S more than triples MLX 2-bit's MMLU score on MoE models. At 9 GB, this is the smallest quantization here that keeps the 35B model coherent.

JANG_2S Profile

JANG_2S is an aggressive 2-bit mixed-precision profile that protects critical layers (attention, routing, embeddings) at higher precision while compressing expert MLP weights to 2-bit. Ideal for fitting large MoE models into limited memory.
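To make the idea concrete, here is a hypothetical layer mix showing how an average of roughly 2.17 bits/weight can arise from the scheme described above. The fractions and bit assignments are illustrative assumptions, not the published JANG_2S recipe:

```python
# Hypothetical weight distribution: bulk expert MLPs at 2-bit,
# protected layers (attention, routing, embeddings) at 4/6-bit.
layers = {
    # name: (fraction_of_total_weights, bits)
    "expert_mlp":       (0.93, 2),  # bulk of MoE weights -> 2-bit
    "attention":        (0.05, 4),  # protected at 4-bit
    "router+embedding": (0.02, 6),  # most sensitive -> 6-bit
}

avg_bits = sum(frac * bits for frac, bits in layers.values())
print(round(avg_bits, 2))  # 2.18, close to the 2.17 reported for JANG_2S
```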

Usage

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Qwen3.5-35B-A3B-JANG_2S
```
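Once the server is running, it can be queried over HTTP, assuming Osaurus exposes an OpenAI-compatible chat endpoint (the base URL and port below are placeholder assumptions). This sketch only builds the request body, so it runs without a live server:

```python
import json

# Assumption: Osaurus serves an OpenAI-compatible chat completions
# endpoint once `osaurus serve` is running; URL/port are placeholders.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def chat_body(prompt: str, temperature: float = 0.0) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "OsaurusAI/Qwen3.5-35B-A3B-JANG_2S",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # 0.0 = greedy, as in the benchmarks
    }

print(json.dumps(chat_body("What bit widths does JANG_2S use?"), indent=2))
```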

Requirements

  • Apple Silicon Mac with 16+ GB unified memory
  • MLX framework with Qwen 3.5 MoE support

Quantized by Osaurus AI using JANG
