This repository contains GGUF-format quantizations of huihui-ai/Huihui-Qwopus3.5-4B-v3-abliterated. The files were quantized with llama.cpp to support efficient CPU and GPU inference across a range of memory configurations.
| File Name | Quantization | Description |
|---|---|---|
| huihui-qwopus3.5-4b-v3-Q8_0.gguf | Q8_0 | Extremely high quality, nearly indistinguishable from unquantized FP16. Requires the most RAM. |
| huihui-qwopus3.5-4b-v3-Q6_K.gguf | Q6_K | Excellent quality, minimal degradation. Great for high-end local setups. |
| huihui-qwopus3.5-4b-v3-Q5_K_M.gguf | Q5_K_M | High quality, excellent balance of performance and size. |
| huihui-qwopus3.5-4b-v3-Q4_K_M.gguf | Q4_K_M | **Recommended.** The optimal balance between memory usage, speed, and output quality. Ideal for 8 GB RAM systems. |
| huihui-qwopus3.5-4b-v3-Q3_K_M.gguf | Q3_K_M | Smallest file size and fastest inference, but with noticeable quality loss compared to higher quants. |
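One way to run these files locally is via the llama-cpp-python bindings. The sketch below is illustrative rather than part of this repository: the `repo_id` is an assumption (substitute the repo you are actually downloading from), and the context size and GPU-offload settings are example values to adjust for your hardware.

```python
from llama_cpp import Llama

# Download the recommended Q4_K_M file from the Hub and load it.
# Requires the huggingface_hub package. NOTE: the repo_id below is an
# assumption; replace it with the repository you are downloading from.
llm = Llama.from_pretrained(
    repo_id="huihui-ai/Huihui-Qwopus3.5-4B-v3-abliterated-GGUF",
    filename="huihui-qwopus3.5-4b-v3-Q4_K_M.gguf",
    n_ctx=4096,       # context window; raise it if you have spare RAM
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only
)
```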
This model uses the ChatML prompt format. It is also a reasoning model, meaning it will often output a [Start thinking] block to map out its logic before providing the final answer.
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Write a short poem about artificial intelligence.<|im_end|>
<|im_start|>assistant
```
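In practice you rarely need to build this string by hand: `create_chat_completion` applies the chat template for you (recent llama-cpp-python versions pick up the template embedded in the GGUF, or you can force it with `chat_format="chatml"`). A minimal sketch, assuming `llm` was loaded as above; note that any [Start thinking] block appears inline in the assistant message, so you may want to strip it before display.

```python
# Chat via the ChatML template applied by llama-cpp-python.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Write a short poem about artificial intelligence."},
    ],
    max_tokens=512,   # cap on generated tokens; example value
    temperature=0.7,  # example sampling setting
)

# The result follows the OpenAI-style response shape.
print(response["choices"][0]["message"]["content"])
```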
Base model: Qwen/Qwen3.5-4B-Base