Qwen3.5-122B-A10B GGUF Q6_K

GGUF quantization of Qwen/Qwen3.5-122B-A10B, optimized for local inference with llama.cpp.

Details

  • Base model: Qwen/Qwen3.5-122B-A10B (Mixture of Experts)
  • Format: GGUF Q6_K (6-bit)
  • Quantized by: TxemAI
  • Quantization tool: llama.cpp

Performance (M4 Max, 128GB)

  • Prompt processing: 99.8 t/s
  • Generation: 35.6 t/s
  • Size: ~95GB
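
To reproduce throughput figures like these on your own machine, llama.cpp ships a llama-bench tool. The following is a minimal sketch, not the exact invocation behind the numbers above (which this card doesn't state); -ngl 99 assumes full Metal offload on Apple Silicon:

# -p/-n set the prompt-processing and generation test lengths;
# -ngl 99 offloads all layers to the GPU (assumption: full Metal offload)
llama-bench \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -p 512 \
  -n 128 \
  -ngl 99

llama-bench reports prompt-processing (pp) and token-generation (tg) rates in tokens per second, so its output is directly comparable to the figures above.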

Usage

llama-cli \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -p "Your prompt here" \
  -n 200
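
To serve the model over HTTP instead of running one-off prompts, llama.cpp also ships llama-server, which exposes an OpenAI-compatible API. A minimal sketch; the port and context size here are illustrative choices, not values from this card:

# -c sets the context window; 8192 is an arbitrary example
llama-server \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -c 8192 \
  --port 8080

Then query it from any OpenAI-compatible client, for example:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Your prompt here"}], "max_tokens": 200}'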

When to use this vs Q4_K_M

Q6_K preserves more of the original model's quality at the cost of a larger file (~95GB vs ~69GB). Use it if you have enough RAM and prioritize output quality over speed; for a smaller footprint, see the Q4_K_M version.
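
To put numbers on that quality difference yourself, llama.cpp includes a llama-perplexity tool; lower perplexity on the same text indicates output closer to the unquantized weights. A sketch assuming a local evaluation file (wiki.test.raw is a placeholder, not something shipped with this model):

# -f points at any plain-text file to evaluate on
llama-perplexity \
  -m Qwen3.5-122B-A10B-Q6_K.gguf \
  -f wiki.test.raw

Running the same command against the Q4_K_M file gives a direct comparison.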

Notes

  • Requires llama.cpp for inference
  • Minimum 100GB RAM recommended
  • For a native Apple Silicon format, see the MLX versions in the TxemAI collection
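
To fetch the weights, one option is huggingface-cli from the huggingface_hub Python package. A sketch that assumes the repository hosts the quantized weights as *.gguf files; adjust the pattern if the upload is sharded differently:

# downloads ~95GB into the current directory
huggingface-cli download TxemAI/Qwen3.5-122B-A10B-Q6_K-GGUF \
  --include "*.gguf" \
  --local-dir .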