Qwen3.5-122B-A10B GGUF Q6_K

GGUF quantization of Qwen/Qwen3.5-122B-A10B, optimized for local inference with llama.cpp.

Details

  • Base model: Qwen/Qwen3.5-122B-A10B (Mixture of Experts)
  • Format: GGUF Q6_K (6-bit)
  • Quantized by: TxemAI
  • Quantization tool: llama.cpp

Performance (M4 Max, 128GB)

  • Prompt processing: 99.8 t/s
  • Generation: 35.6 t/s
  • Size: ~95GB
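
To reproduce throughput figures like these on your own machine, llama.cpp ships a llama-bench tool. The following is a minimal sketch, not the exact invocation behind the numbers above (which this card doesn't state); -ngl 99 assumes full Metal offload on Apple Silicon:

# -p/-n set the prompt-processing and generation test lengths;
# -ngl 99 offloads all layers to the GPU (assumption: full Metal offload)
llama-bench \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -p 512 \
  -n 128 \
  -ngl 99

llama-bench reports prompt-processing (pp) and token-generation (tg) rates in tokens per second, so its output is directly comparable to the figures above.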

Usage

llama-cli \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -p "Your prompt here" \
  -n 200
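
To serve the model over HTTP instead of running one-off prompts, llama.cpp also ships llama-server, which exposes an OpenAI-compatible API. A minimal sketch; the port and context size here are illustrative choices, not values from this card:

# -c sets the context window; 8192 is an arbitrary example
llama-server \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -c 8192 \
  --port 8080

Then query it from any OpenAI-compatible client, for example:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Your prompt here"}], "max_tokens": 200}'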

When to use this vs Q4_K_M

Q6_K preserves more of the original model's quality at the cost of a larger file (~95GB vs ~69GB). Use it if you have enough RAM and prioritize output quality over speed; for a smaller footprint, see the Q4_K_M version.
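
To put numbers on that quality difference yourself, llama.cpp includes a llama-perplexity tool; lower perplexity on the same text indicates output closer to the unquantized weights. A sketch assuming a local evaluation file (wiki.test.raw is a placeholder, not something shipped with this model):

# -f points at any plain-text file to evaluate on
llama-perplexity \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -f wiki.test.raw

Running the same command against the Q4_K_M file gives a direct comparison.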

Notes

  • Requires llama.cpp for inference
  • Minimum 100GB RAM recommended
  • For a native Apple Silicon format, see the MLX versions in the TxemAI collection
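
To fetch the weights, one option is huggingface-cli from the huggingface_hub Python package. A sketch that assumes the repository hosts the quantized weights as *.gguf files; adjust the pattern if the upload is sharded differently:

# downloads ~95GB into the current directory
huggingface-cli download TxemAI/Qwen3.5-122B-A10B-Q6_K-GGUF \
  --include "*.gguf" \
  --local-dir .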