Request: official Q8_0 and BF16 GGUFs

#4
by teresch - opened

Hello, thanks for all the work!
If possible, could you please create official Q8_0 and BF16 GGUFs? The official Ollama repo only has the Q4_K_M variant. I'm using LLMs for translation, so speed and model size aren't really a problem. I've also noticed that quantization, even mild, has an impact on the final translation result: a less quantized model produces better and, more importantly, more consistent translations. And since translation speed isn't critical compared to chat, even 1 token per second is good enough.
So, I tried this model in the Q4_K_M variant and was really impressed with the results; it seems to be on par with BF16 Mistral Small 3.1 24B, and sometimes even better. So I'm really wondering how the full-weight model will behave.
I tried to convert this model myself with llama.cpp, but the final Ollama model import doesn't seem to work properly. I get `500 Internal Server Error: do load request: Post "http://127.0.0.1:6733/load": EOF` when I try to talk to it in chat.
I used the same Modelfile that I copied from the huihui-ai official Q4_K_M variant:

```
FROM ./Qwen-32b-vl-bf16.gguf
TEMPLATE {{ .Prompt }}
RENDERER qwen3-vl-instruct
PARSER qwen3-vl-instruct
PARAMETER temperature 1
PARAMETER top_k 20
PARAMETER top_p 0.95
LICENSE """<full Apache 2.0 license text>"""
```

So if it's possible, please make official GGUFs, as I seem to have done something wrong while converting.
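For reference, this is roughly the conversion I attempted; the script name is from llama.cpp, while the paths and output filename here are just illustrative:

```shell
# Convert the downloaded safetensors checkpoint to a BF16 GGUF.
# convert_hf_to_gguf.py ships with llama.cpp; exact flags may
# differ between llama.cpp versions (check --help).
python convert_hf_to_gguf.py ./Huihui-Qwen3-VL-32B-Instruct-abliterated \
    --outtype bf16 --outfile Qwen-32b-vl-bf16.gguf
```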

You may notice that the timestamps of some models in the Qwen3-VL series on https://huggingface.co/huihui-ai have been modified. In fact, we deleted the .GGUF files from certain models because they are incompatible with Ollama and could lead to misunderstandings. This move is also intended to save storage space on Hugging Face (HF).

```shell
hf download huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated --local-dir ./huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated
```

```shell
ollama pull qwen3-vl:32b-instruct
ollama show qwen3-vl:32b-instruct --modelfile > Modelfile-instruct
```

Modify the FROM field in the Modelfile to point to the path of the current model; keep all other parts unchanged:

```
FROM huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated
```

```shell
ollama create -f Modelfile-instruct huihui_ai/qwen3-vl-abliterated:32b-instruct-fp16 -q f16
ollama create -f Modelfile-instruct huihui_ai/qwen3-vl-abliterated:32b-instruct-q8_0 -q q8_0
ollama create -f Modelfile-instruct huihui_ai/qwen3-vl-abliterated:32b-instruct-q4_K_M -q q4_K_M
```
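Once the create commands finish, the new tags can be sanity-checked with the standard Ollama CLI (the prompt below is just an example):

```shell
# The created tags should appear in the local model list.
ollama list
# Quick smoke test of one of the new variants.
ollama run huihui_ai/qwen3-vl-abliterated:32b-instruct-q8_0 "Hello"
```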

I will try, thank you.

UPD: Everything worked, thanks a lot!
Weirdly enough, the model ended up with F16 rather than BF16 weights after the .safetensors conversion, even though the HF repo says it's supposed to be BF16.
I noticed this because when I ran `ollama create -f Modelfile-instruct huihui_ai/qwen3-vl-abliterated:32b-instruct-bf16 -q bf16`,
after the model was successfully converted, it once more tried to convert to BF16 from FP16.

Where can I get the file for `--mmproj`?
Thanks
