Re-quantization is required for future llama.cpp versions

#2
by hell0ks - opened

Hello, thanks for the amazing work!

It seems the latest Qwen3-VL PR for llama.cpp changed the GGUF format (likely to improve visual accuracy). I think this change will be merged soon, and once it is, this quant will no longer work.

Could you redo the GGUF conversion with this PR? Thanks again.

I will test with tr-qwen3-vl-6-b7106-495c611 to see if an update is needed.

Thanks for the great work!

The PR is now merged into llama.cpp upstream, and this quant no longer works (`llama.cpp/tools/mtmd/clip.cpp:859: GGML_ASSERT(model.patch_bias != nullptr) failed`). So it does look like we need a re-quantization.
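For anyone wanting to regenerate the files themselves, the re-conversion would look roughly like this. This is a sketch with placeholder paths and output names, not the uploader's actual pipeline; the `--mmproj` flag regenerates the vision projector GGUF that `clip.cpp` loads (the file the failing assert comes from), and exact flags may differ by llama.cpp version:

```shell
# Get an up-to-date llama.cpp checkout that includes the merged Qwen3-VL PR
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the HF checkpoint to GGUF (paths/names are placeholders).
# The language model and the vision projector are converted separately.
python convert_hf_to_gguf.py /path/to/Qwen3-VL-checkpoint \
    --outfile qwen3-vl-f16.gguf
python convert_hf_to_gguf.py /path/to/Qwen3-VL-checkpoint \
    --mmproj --outfile mmproj-qwen3-vl-f16.gguf

# Quantize the language model; the mmproj file is usually kept at f16.
./build/bin/llama-quantize qwen3-vl-f16.gguf qwen3-vl-Q4_K_M.gguf Q4_K_M
```

Both the re-converted mmproj and the quantized model need to be regenerated, since the format change is in the vision side.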

Also, is there any chance for an abliterated thinking version of qwen3-vl-235b?

It has been updated to the official llama.cpp-b6907.

Thank you.

hell0ks changed discussion status to closed
