Qwen3.5-2B-heretic-GGUF

This repository contains GGUF quantizations for jordanwoodson/Qwen3.5-2B-heretic, alongside the mmproj-F16.gguf vision projector to enable multimodal (image-to-text) inference.

The small footprint of these quantizations makes them well suited to edge devices and other memory-constrained hardware.

Stop Sequence

Please ensure you update the stop sequence in your inference client to: <|im_end|>
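If you serve the model with llama.cpp's llama-server, the stop sequence can also be passed per request through the OpenAI-compatible endpoint. This is a sketch; the port and prompt are placeholders, and the server must already be running:

```shell
# Start the server first (path and port are examples):
#   ./llama-server -m Qwen3.5-2B-heretic-Q4_K_M.gguf --port 8080
# Then include the stop sequence in each completion request:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "stop": ["<|im_end|>"]
      }'
```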


Available Files

| Filename | Format | Description |
|---|---|---|
| Qwen3.5-2B-heretic-Q4_K_M.gguf | Q4_K_M | Ideal sweet spot. Excellent balance of low memory footprint and high quality. |
| Qwen3.5-2B-heretic-Q5_K_M.gguf | Q5_K_M | Slightly larger than Q4, offering a marginal quality increase. |
| Qwen3.5-2B-heretic-Q6_K.gguf | Q6_K | Very high quality, nearly indistinguishable from unquantized. |
| Qwen3.5-2B-heretic-Q8_0.gguf | Q8_0 | Practically lossless fidelity, largest quantized size. |
| mmproj-F16.gguf | F16 | The vision projector required for any image analysis tasks. |

Usage with llama.cpp

You can run this model locally using the llama.cpp command-line interface.
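If you have the Hugging Face Hub CLI installed, a single quantization can be fetched directly rather than cloning the whole repository (the repository ID and filename below follow this model card; swap in whichever variant you want):

```shell
# Download only the Q4_K_M file into the current directory
huggingface-cli download Abiray/Qwen3.5-2B-heretic-GGUF \
  Qwen3.5-2B-heretic-Q4_K_M.gguf --local-dir .
```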

Standard Text Inference

To run a standard text generation prompt, point llama-cli to your downloaded GGUF file:

./llama-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf -p "Write a quick Python script." -n 512
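For image-to-text inference, the mmproj-F16.gguf projector must be loaded alongside the model. In recent llama.cpp builds this is done with the multimodal CLI; the exact binary name varies across versions (older releases shipped llava-style tools instead), and the image path here is a placeholder:

```shell
# Combine the quantized model with the vision projector to describe an image
./llama-mtmd-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image photo.jpg \
  -p "Describe this image."
```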