Qwen3.5-35B-A3B-Freed0m EXL3 6.0bpw

EXL3 quantization (6.00 bpw) of unsloth/Qwen3.5-35B-A3B-Freed0m. Optimized for the exllamav3 runtime on Ampere-or-newer GPUs (e.g. RTX 3090/4090).

Quantization Details

  • Method: EXL3 (trellis codebook + MCG encoding)
  • Bits per weight: 6.0 bpw (layer), 6.0 bpw (head)
  • Calibration: 128 rows x 2048 cols
  • Scale output: always
  • Codebook: MCG (multiplicative congruential generator, as used in trellis quantization)
  • Version: exllamav3 0.0.26
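The bit width above can be sanity-checked against the file sizes listed later in this card. A back-of-envelope sketch, assuming ~35B total parameters (taken from the model name; non-quantized tensors and scale vectors make the real files differ slightly):

```python
# Back-of-envelope size check: bits-per-weight x parameter count.
# total_params is an assumption from the "35B" in the model name.
total_params = 35e9
bpw = 6.0
size_gb = total_params * bpw / 8 / 1e9  # bits -> bytes -> GB
print(f"~{size_gb:.1f} GB")  # close to the ~26 GB total listed under File Sizes
```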

Quality

Evaluated on WikiText-2 (seqlen=2048, stride=512):

Config     Tokens    Perplexity  Time
200 rows   409,400   7.09        626 s
8 rows     16,376    8.26        48 s
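For context, the seqlen/stride figures describe the standard sliding-window perplexity protocol: each 2048-token window advances by 512 tokens and scores only the tokens not covered by the previous window. A minimal sketch of that windowing (the exact evaluation script in exllamav3 may differ):

```python
# Sliding-window evaluation: seqlen-sized windows, advanced by stride,
# each scoring only its newly covered tokens. Perplexity is then
# exp(total_nll / total_scored_tokens) over all scored tokens.
def windows(n_tokens, seqlen=2048, stride=512):
    """Yield (start, end, n_scored) evaluation windows over n_tokens."""
    prev_end = 0
    for start in range(0, n_tokens, stride):
        end = min(start + seqlen, n_tokens)
        yield start, end, end - prev_end  # tokens newly scored here
        prev_end = end
        if end == n_tokens:
            break

# Every token is scored exactly once:
total = sum(n for _, _, n in windows(16384))
print(total)  # 16384
```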

Usage

This model requires exllamav3 to run. It is not compatible with standard transformers or vLLM.

from exllamav3 import Model, Config

# Config.from_directory expects a local path, so download the repo first,
# e.g.: huggingface-cli download groxaxo/Qwen3.5-35B-A3B-Freed0m-EXL3-6.0bpw
config = Config.from_directory("groxaxo/Qwen3.5-35B-A3B-Freed0m-EXL3-6.0bpw")
model = Model(config)
model.load()

# Use the exllamav3 API for inference

Model Architecture

  • Type: Qwen3.5 MoE (Mixture of Experts)
  • Parameters: ~35B total, ~3B active per token (the "A3B" in the name)
  • Total experts: 256 fused experts, 40 router layers
  • Layers: 40
  • Attention mix: Linear attention (75%) + Full attention (25%), with full attention every 4th layer
  • Hidden size: 2048
  • Head dim: 256
  • Vision: 3D patch embedding (Conv3d) for video understanding
  • Activation: SiLU
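The 75/25 attention split follows from the every-4th-layer rule: 10 of the 40 layers use full attention. A sketch of that layout (the exact layer indices holding full attention are an assumption for illustration; the model config defines the real pattern):

```python
# Hybrid attention layout: full attention every 4th layer, linear otherwise.
# Placing full attention at layers 3, 7, 11, ... is an illustrative choice.
n_layers = 40
pattern = ["full" if (i + 1) % 4 == 0 else "linear" for i in range(n_layers)]
n_full = pattern.count("full")
print(n_full, n_full / n_layers)  # 10 full-attention layers -> the 25% share
```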

File Sizes

File                               Size
model-00001-of-00004.safetensors   ~8.0 GB
model-00002-of-00004.safetensors   ~8.3 GB
model-00003-of-00004.safetensors   ~8.3 GB
model-00004-of-00004.safetensors   ~3.2 GB
Total                              ~26 GB

Differences from Base Model

The base model (unsloth/Qwen3.5-35B-A3B-Freed0m) uses BF16 full-precision weights (~70 GB+). This EXL3 variant replaces the raw weight matrices with trellis codebook indices (int16) + scale vectors (suh/svh) + MCG multipliers, achieving roughly 2.7:1 compression (16 bpw → 6 bpw) while keeping perplexity degradation minimal.
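The nominal compression ratio follows directly from the bit widths; a quick check, ignoring the small overhead of scale vectors and other non-quantized tensors:

```python
# BF16 stores 16 bits per weight; this quant stores ~6 bits per weight.
bf16_bpw = 16.0
exl3_bpw = 6.0
ratio = bf16_bpw / exl3_bpw
print(f"{ratio:.2f}:1")  # roughly the ~70 GB -> ~26 GB reduction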

Hardware Requirements

  • GPU: Ampere or newer (RTX 3090, 4090, A6000, etc.)
  • VRAM: ~26 GB for fully GPU-resident weights; fits on a single 24 GB GPU only with partial offloading
  • Runtime: exllamav3 >= 0.0.26