This repository contains GGUF-format quantizations of huihui-ai/Huihui-Qwopus3.5-4B-v3-abliterated. The files were quantized with llama.cpp to support efficient CPU and GPU inference across a range of memory configurations.
| File Name | Quantization | Description |
|---|---|---|
| huihui-qwopus3.5-4b-v3-Q8_0.gguf | Q8_0 | Extremely high quality, nearly indistinguishable from unquantized FP16. Requires the most RAM. |
| huihui-qwopus3.5-4b-v3-Q6_K.gguf | Q6_K | Excellent quality, minimal degradation. Great for high-end local setups. |
| huihui-qwopus3.5-4b-v3-Q5_K_M.gguf | Q5_K_M | High quality, excellent balance of performance and size. |
| huihui-qwopus3.5-4b-v3-Q4_K_M.gguf | Q4_K_M | **Recommended.** The optimal balance between memory usage, speed, and output quality. Ideal for 8 GB RAM systems. |
| huihui-qwopus3.5-4b-v3-Q3_K_M.gguf | Q3_K_M | Smallest file size and fastest inference, but with noticeable quality loss compared to higher quants. |
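One way to run these files locally is via the llama-cpp-python bindings. The sketch below is illustrative rather than part of this repository: the `repo_id` is an assumption (substitute the repo you are actually downloading from), and the context size and GPU-offload settings are example values to adjust for your hardware.

```python
from llama_cpp import Llama

# Download the recommended Q4_K_M file from the Hub and load it.
# Requires the huggingface_hub package. NOTE: the repo_id below is an
# assumption; replace it with the repository you are downloading from.
llm = Llama.from_pretrained(
    repo_id="huihui-ai/Huihui-Qwopus3.5-4B-v3-abliterated-GGUF",
    filename="huihui-qwopus3.5-4b-v3-Q4_K_M.gguf",
    n_ctx=4096,       # context window; raise it if you have spare RAM
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only
)
```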
This model uses the ChatML prompt format. It is also a reasoning model, meaning it will often output a [Start thinking] block to map out its logic before providing the final answer.
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Write a short poem about artificial intelligence.<|im_end|>
<|im_start|>assistant
```
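In practice you rarely need to build this string by hand: `create_chat_completion` applies the chat template for you (recent llama-cpp-python versions pick up the template embedded in the GGUF, or you can force it with `chat_format="chatml"`). A minimal sketch, assuming `llm` was loaded as above; note that any [Start thinking] block appears inline in the assistant message, so you may want to strip it before display.

```python
# Chat via the ChatML template applied by llama-cpp-python.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Write a short poem about artificial intelligence."},
    ],
    max_tokens=512,   # cap on generated tokens; example value
    temperature=0.7,  # example sampling setting
)

# The result follows the OpenAI-style response shape.
print(response["choices"][0]["message"]["content"])
```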
Base model: Qwen/Qwen3.5-4B-Base