# Huihui-Qwopus3.5-4B-v3-abliterated GGUF

This repository contains GGUF-format quantizations of huihui-ai/Huihui-Qwopus3.5-4B-v3-abliterated.

These files were quantized with llama.cpp for efficient CPU and GPU inference across a range of memory configurations.
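If you want to fetch a single quantization programmatically, a minimal sketch using the `huggingface_hub` Python package might look like the following; the `repo_id` and `filename` match this repository, so adjust the filename to whichever quant you want:

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# The repo_id and filename below match this repository's Q4_K_M quant.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Abiray/Huihui-Qwopus3.5-4B-v3-abliterated-GGUF",
    filename="huihui-qwopus3.5-4b-v3-Q4_K_M.gguf",
)
print(f"Model downloaded to: {model_path}")
```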

## 📦 Available Quantizations

| File Name | Quantization | Description |
|---|---|---|
| huihui-qwopus3.5-4b-v3-Q8_0.gguf | Q8_0 | Extremely high quality, nearly indistinguishable from unquantized FP16. Requires the most RAM. |
| huihui-qwopus3.5-4b-v3-Q6_K.gguf | Q6_K | Excellent quality, minimal degradation. Great for high-end local setups. |
| huihui-qwopus3.5-4b-v3-Q5_K_M.gguf | Q5_K_M | High quality, excellent balance of performance and size. |
| huihui-qwopus3.5-4b-v3-Q4_K_M.gguf | Q4_K_M | **Recommended.** The optimal balance between memory usage, speed, and output quality. Ideal for 8GB RAM systems. |
| huihui-qwopus3.5-4b-v3-Q3_K_M.gguf | Q3_K_M | Smallest file size, fastest inference, but with noticeable quality loss compared to higher quants. |
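As a rough sketch of local inference, the `llama-cpp-python` bindings can load any of the files above. In the sketch below, `n_gpu_layers=-1` offloads all layers to the GPU when one is available (use `0` for CPU-only), and the path assumes you downloaded the Q4_K_M file as shown earlier:

```python
# Sketch: load a GGUF quant with llama-cpp-python (pip install llama-cpp-python).
# model_path assumes the Q4_K_M file from this repo is in the working directory.
from llama_cpp import Llama

llm = Llama(
    model_path="huihui-qwopus3.5-4b-v3-Q4_K_M.gguf",
    n_ctx=4096,        # assumed context window; adjust to your needs
    n_gpu_layers=-1,   # offload all layers to GPU; set to 0 for CPU-only
)

out = llm("Briefly explain what GGUF quantization is.", max_tokens=128)
print(out["choices"][0]["text"])
```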

πŸ—£οΈ Prompt Format (ChatML)

This model uses the ChatML prompt format. It is also a reasoning model, meaning it will often output a [Start thinking] block to map out its logic before providing the final answer.

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Write a short poem about artificial intelligence.<|im_end|>
<|im_start|>assistant
```
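Rather than building this string by hand, `llama-cpp-python` can apply the template for you. A minimal sketch, assuming the Q4_K_M quant is present locally; `chat_format="chatml"` wraps the messages in the `<|im_start|>`/`<|im_end|>` tokens shown above:

```python
# Sketch: chat completion with the ChatML template applied automatically.
# Assumes the Q4_K_M quant from this repo is in the working directory.
from llama_cpp import Llama

llm = Llama(
    model_path="huihui-qwopus3.5-4b-v3-Q4_K_M.gguf",
    chat_format="chatml",  # wrap messages in <|im_start|>/<|im_end|> tokens
    n_ctx=4096,
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Write a short poem about artificial intelligence."},
    ],
    max_tokens=256,
)
# As a reasoning model, the reply may begin with a [Start thinking] block.
print(resp["choices"][0]["message"]["content"])
```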