Qwen 3.5
This repository contains GGUF quantizations for jordanwoodson/Qwen3.5-2B-heretic, alongside the mmproj-F16.gguf vision projector to enable multimodal (image-to-text) inference.
The small 2B parameter footprint makes this model well suited to edge devices and low-latency local inference.
Make sure to set the stop sequence in your inference client to `<|im_end|>`.
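With the llama.cpp CLI, one way to enforce this stop sequence is the `-r` / `--reverse-prompt` flag, which halts generation when the given string is emitted. A minimal sketch (the prompt is illustrative; interactive/conversation mode usually applies the chat template's stop token automatically):

```shell
# Stop generation at the ChatML end-of-turn token <|im_end|>.
./llama-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf \
  -p "Explain GGUF quantization in one paragraph." \
  -r "<|im_end|>" -n 256
```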
| Filename | Format | Description |
|---|---|---|
| Qwen3.5-2B-heretic-Q4_K_M.gguf | Q4_K_M | The sweet spot: excellent balance of low memory footprint and high quality. |
| Qwen3.5-2B-heretic-Q5_K_M.gguf | Q5_K_M | Slightly larger than Q4_K_M, with a marginal quality increase. |
| Qwen3.5-2B-heretic-Q6_K.gguf | Q6_K | Very high quality, nearly indistinguishable from the unquantized model. |
| Qwen3.5-2B-heretic-Q8_0.gguf | Q8_0 | Practically lossless fidelity; the largest quantized size. |
| mmproj-F16.gguf | F16 | The vision projector required for any image analysis tasks. |
You can run this model locally using the llama.cpp command-line interface.
To run a standard text generation prompt, point llama-cli to your downloaded GGUF file:
```shell
./llama-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf -p "Write a quick Python script." -n 512
```
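For image-to-text inference, recent llama.cpp builds ship a multimodal CLI (`llama-mtmd-cli`) that loads the vision projector via `--mmproj`. A sketch, assuming a current llama.cpp build and a local image (`photo.jpg` is a placeholder path):

```shell
# Multimodal inference: pair the quantized model with the F16 vision projector.
./llama-mtmd-cli -m Qwen3.5-2B-heretic-Q4_K_M.gguf \
  --mmproj mmproj-F16.gguf \
  --image photo.jpg \
  -p "Describe this image."
```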
Base model: Qwen/Qwen3.5-2B-Base