gemma-4-31B-it-UD-IQ3_XXS.gguf is unusable - gibberish output in Unsloth Studio + vanilla llama-server
jerH singular singular singular la l l sameHaan la- la ability singular singular laS la singular same sameSSHal la sameTH la same same single la- la/ de single la laL la same1T la/ singleH la la same la laT/ singularP same single la singularSSS/ la la/ laS laCis laT a- la laSC la laenT laer same la de single single/S laTT싱/ singular//かけてSic неVARIABLECB/TSLHL la singular Schools]],/P la singular laSdevelopers la laS la laers lvan0/ a la l la la fadS la single
@danielhanchen I opened an issue on GitHub and tagged you. This seems to be related to CUDA compilation on Blackwell + FA kernels: I tried your quant on CPU-only and cuBLAS builds and it worked perfectly. The tokenizer is not the cause. I tested the latest release on both Unsloth Studio and llama.cpp, which fixes the tokenizer issue, and the gibberish persists. Probably a CUDA 13.2 + Blackwell issue.
I'm having the gibberish issue as well, using llama-server with 31B-it-UD-Q8_K_XL. I have been running LLMs for years now and can usually solve problems like this, but I'm stumped. Not sure if it's a quant issue or something on my end.
@Menthols It does not seem to be a quant issue, as CPU-only and cuBLAS-compiled llama binaries run the quant with normal output. Check https://github.com/ggml-org/llama.cpp/issues/21371#issuecomment-4183330511
@danielhanchen Confirmed: the issue stems from CUDA 13.2 compilation, even on the latest release (at least on Blackwell; my test is on a 5080). The quants are 100% fine. The issue is with CUDA 13.2: I recompiled the latest b8648 in a CUDA 12.8 container running on the same machine, tested it, and it works flawlessly.
On my machine, when I ran unsloth studio update, it errored while trying to download the binaries and fell back to offline compilation. That explains why Unsloth Studio and llama-server behave the same: both were compiled with CUDA 13.2. (I am running Linux/Fedora, but a user on GitHub also reported the same issue with CUDA 13.2 on Windows, also on Blackwell.)
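For anyone who wants to check whether their own build environment matches the one described above, here is a minimal Python sketch. It assumes `nvcc` is on your PATH and prints its usual "release X.Y" line; the helper names (`parse_nvcc_release`, `is_suspect_toolkit`, `local_nvcc_release`) are made up for illustration and are not part of llama.cpp or Unsloth. It only reports the toolkit found on PATH, which is not necessarily the one a prebuilt binary was compiled with.

```python
import re
import shutil
import subprocess

# Assumption: `nvcc --version` contains a line such as
# "Cuda compilation tools, release 12.8, V12.8.61".
_RELEASE_RE = re.compile(r"release\s+(\d+)\.(\d+)")


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_suspect_toolkit(release, suspect=(13, 2)):
    """True if the release equals the version reported as problematic in this thread."""
    return release == suspect


def local_nvcc_release():
    """Run nvcc if it is on PATH; return None if it is missing or unparsable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = local_nvcc_release()
    if release is None:
        print("nvcc not found or version not recognized")
    elif is_suspect_toolkit(release):
        print("CUDA %d.%d detected: matches the version reported here" % release)
    else:
        print("CUDA %d.%d detected" % release)
```

If it reports 13.2 and you see the same gibberish, rebuilding against CUDA 12.8 (as described above) is the workaround that worked in my test.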




