# Qwen3.5-35B-A3B-heretic-v2-eq-v1-GGUF

GGUF quantizations of nivvis/Qwen3.5-35B-A3B-heretic-v2-eq-v1 for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.

See the bf16 model card for full details on the model, training, and EQ-Bench results.

## Files

| File | Quant | Size | Notes |
|---|---|---|---|
| `*-F16.gguf-00001-of-00009` ... `00009` | F16 | ~65GB (9 shards) | Full precision — lossless conversion from bf16 |
| `*-Q4_K_M.gguf-00001-of-00003` ... `00003` | Q4_K_M | ~20GB (3 shards) | Recommended — best quality/size balance |
| `*-mmproj-F16.gguf` | F16 | 858MB | Vision projector (required for image input) |

llama.cpp auto-detects split shards — just point it to the first file (`-00001-of-*`).
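For Ollama (mentioned above as a supported runtime), a minimal Modelfile sketch might look like the following. The filename is taken from the usage examples below and is illustrative — adjust it to the shard you downloaded. Note that Ollama's handling of multi-shard GGUFs has varied by version; if import fails, merge the shards first with llama.cpp's `llama-gguf-split --merge`.

```
FROM ./qwen35-35b-heretic-v2-eq-v1-Q4_K_M.gguf-00001-of-00003.gguf
PARAMETER temperature 0.7
```

Then create and run the model with `ollama create qwen35-heretic -f Modelfile` followed by `ollama run qwen35-heretic`.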

## EQ-Bench 3

Rubric Score: 83.85 (judge: claude-3.7-sonnet) — measured on the bf16 source model.

| Model | Active Params | EQ-Bench Score |
|---|---|---|
| Qwen3.5-35B-A3B-heretic-v2-eq-v1 (ours) | 3B | 83.85 |
| Qwen3.5-27B dense | 27B | 83.05 |
| Qwen3-235B-A22B | 22B | 80.90 |
| QwQ-32B | 32B | 79.90 |
| Qwen3.5-35B-A3B (baseline) | 3B | 77.85 |
| Qwen3-32B | 32B | 74.30 |
| Qwen3-30B-A3B | 3B | 66.00 |

Note: EQ-Bench scores are from the bf16 model. Q4_K_M quantization may slightly affect quality.

Note on judge model: Public EQ-Bench 3 leaderboard scores for this family of models use claude-3.7-sonnet as the judge, so we use the same for comparability. We plan to publish updated benchmarks with newer judge models (including Opus) in the future.

## Usage

### llama.cpp

```sh
llama-cli \
  -m qwen35-35b-heretic-v2-eq-v1-Q4_K_M.gguf-00001-of-00003.gguf \
  --mmproj qwen35-35b-heretic-v2-eq-v1-mmproj-F16.gguf \
  -p "My best friend got the promotion I wanted. I said congrats but feel terrible. What do I do?" \
  -n 512
```

### llama-server

```sh
llama-server \
  -m qwen35-35b-heretic-v2-eq-v1-Q4_K_M.gguf-00001-of-00003.gguf \
  --mmproj qwen35-35b-heretic-v2-eq-v1-mmproj-F16.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  --jinja
```

`--jinja` enables tool calling via the bundled chat template. `-ngl 99` offloads all layers to the GPU.
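Once running, llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal Python sketch of a client, assuming the host/port from the command above (the `model` name is informational here; calling `send` requires a running server):

```python
import json
import urllib.request


def build_chat_request(prompt: str, n_predict: int = 512) -> dict:
    # OpenAI-style chat payload accepted by llama-server's /v1/chat/completions
    return {
        "model": "qwen35-35b-heretic-v2-eq-v1",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": n_predict,
    }


def send(payload: dict, host: str = "localhost", port: int = 8080) -> dict:
    # Requires the llama-server process from above to be running
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request(
    "My best friend got the promotion I wanted. What do I do?"
)
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client library pointed at `http://localhost:8080/v1` should work the same way.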

### Performance (single RTX 5090)

- 169 t/s generation
- 211 t/s prompt processing

## Conversion Details

- Converted with `convert_hf_to_gguf.py` from llama.cpp (commit ecbcb7ea9)
- Quantized with `llama-quantize` to Q4_K_M
- Vision projector kept at F16 (should not be quantized)
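The steps above correspond roughly to the following commands. This is a sketch: the input path, output filenames, and your llama.cpp checkout location are assumptions to adjust for your setup.

```sh
# 1. Convert the bf16 HF checkpoint to GGUF at F16
python convert_hf_to_gguf.py /path/to/Qwen3.5-35B-A3B-heretic-v2-eq-v1 \
  --outtype f16 --outfile model-F16.gguf

# 2. Quantize the F16 GGUF to Q4_K_M
./llama-quantize model-F16.gguf model-Q4_K_M.gguf Q4_K_M
```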

## License

Apache 2.0
