Qwen3.5-2B-heretic-GGUF

This repository contains GGUF quantizations for jordanwoodson/Qwen3.5-2B-heretic, alongside the mmproj-F16.gguf vision projector to enable multimodal (image-to-text) inference.

The small footprint of these quantizations makes them well suited to edge devices and other memory-constrained hardware.

Stop Sequence

Please ensure you update the stop sequence in your inference client to: <|im_end|>
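If you serve the model with llama.cpp's llama-server, the stop sequence can also be passed per request through the OpenAI-compatible endpoint. This is a sketch; the port and prompt are placeholders, and the server must already be running:

```shell
# Start the server first (path and port are examples):
#   ./llama-server -m Qwen3.5-2B-heretic-Q4_K_M.gguf --port 8080
# Then include the stop sequence in each completion request:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "stop": ["<|im_end|>"]
      }'
```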


Available Files

| Filename | Format | Description |
|---|---|---|
| Qwen3.5-2B-heretic-Q4_K_M.gguf | Q4_K_M | Ideal sweet spot. Excellent balance of low memory footprint and high quality. |
| Qwen3.5-2B-heretic-Q5_K_M.gguf | Q5_K_M | Slightly larger than Q4, offering a marginal quality increase. |
| Qwen3.5-2B-heretic-Q6_K.gguf | Q6_K | Very high quality, nearly indistinguishable from unquantized. |
| Qwen3.5-2B-heretic-Q8_0.gguf | Q8_0 | Practically lossless fidelity, largest quantized size. |
| mmproj-F16.gguf | F16 | The vision projector required for any image analysis tasks. |

Usage with llama.cpp

You can run this model locally using the llama.cpp command-line interface.
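If you have the Hugging Face Hub CLI installed, a single quantization can be fetched directly rather than cloning the whole repository (the repository ID and filename below follow this model card; swap in whichever variant you want):

```shell
# Download only the Q4_K_M file into the current directory
huggingface-cli download Abiray/Qwen3.5-2B-heretic-GGUF \
  Qwen3.5-2B-heretic-Q4_K_M.gguf --local-dir .
```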

Standard Text Inference

To run a standard text generation prompt, point llama-cli to your downloaded GGUF file:

./llama-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf -p "Write a quick Python script." -n 512
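For image-to-text inference, the mmproj-F16.gguf projector must be loaded alongside the model. In recent llama.cpp builds this is done with the multimodal CLI; the exact binary name varies across versions (older releases shipped llava-style tools instead), and the image path here is a placeholder:

```shell
# Combine the quantized model with the vision projector to describe an image
./llama-mtmd-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image photo.jpg \
  -p "Describe this image."
```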