gemma-4-31B-it-UD-IQ3_XXS.gguf is unusable - gibberish output in Unsloth Studio + vanilla llama-server

#4
by ykarout - opened

How do you fine-tune an audio model with Unsloth?

jerH singular singular singular la l l sameHaan la- la ability singular singular laS la singular same sameSSHal la sameTH la same same single la- la/ de single la laL la same1T la/ singleH la la same la laT/ singularP same single la singularSSS/ la la/ laS laCis laT a- la laSC la laenT laer same la de single single/S laTT싱/ singular//かけてSic неVARIABLECB/TSLHL la singular Schools]],/P la singular laSdevelopers la laS la laers lvan0/ a la l la la fadS la single


Did you update Unsloth Studio to the latest version? Do you have a screenshot of the issue? Might be the tokenizer issue which llama.cpp is fixing (unrelated to unsloth).

I tried it many times and it works every time:

(Screenshots attached, 2026-04-03, 3:19 AM – 3:21 AM, showing normal output.)

@danielhanchen I opened an issue on GitHub and tagged you. This seems related to CUDA compilation on Blackwell + FA (flash-attention) kernels: I tried your quant with CPU-only and cuBLAS builds and it worked perfectly. The issue is not the tokenizer; I tested the latest release, which fixes the tokenizer issue, in both Unsloth Studio and llama.cpp, and the problem persists. Probably a CUDA 13.2 + Blackwell issue.

I'm having the gibberish issue as well using llama-server with 31B-it-UD-Q8_K_XL. I've been running LLMs for years and can usually solve problems like this, but I'm stumped. Not sure if it's a quant issue or something on my end.

@Menthols It doesn't seem to be a quant issue: CPU-only and cuBLAS-compiled llama.cpp binaries run the quant with normal output. See https://github.com/ggml-org/llama.cpp/issues/21371#issuecomment-4183330511
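To isolate the backend the same way, a minimal sketch is to build llama.cpp CPU-only and run the quant directly (model path and prompt are placeholders; `llama-cli` flags are standard llama.cpp options):

```shell
# Build llama.cpp without any GPU backend (pure CPU) to rule out CUDA kernels.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build            # no -DGGML_CUDA=ON, so CPU-only
cmake --build build --config Release -j

# Run the suspect quant; if output is coherent here but gibberish in a
# CUDA build, the quant file itself is fine and the bug is in the backend.
./build/bin/llama-cli -m /path/to/gemma-4-31B-it-UD-IQ3_XXS.gguf \
  -p "How do you fine-tune an audio model with Unsloth?" -n 128
```

If the CPU build produces normal text from the exact file that misbehaves under the GPU build, that points at the compiled backend rather than the quantization.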

@danielhanchen Confirmed: the issue stems from CUDA 13.2 compilation, even on the latest release (at least on Blackwell; my test is on a 5080). The quants are 100% fine. The problem is CUDA 13.2: I recompiled the latest b8648 in a CUDA 12.8 container running on the same machine, tested it, and it works flawlessly.
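For anyone wanting to reproduce the CUDA 12.8 rebuild without touching the host toolkit, a rough sketch using a CUDA devel container (the exact image tag is an assumption; pick whichever 12.8 devel tag matches your distro):

```shell
# Build llama.cpp's CUDA backend inside a CUDA 12.8 container instead of
# the host's CUDA 13.2 toolkit. Assumes the llama.cpp checkout is in $PWD
# and the NVIDIA container toolkit is installed for --gpus.
docker run --rm --gpus all -v "$PWD":/src -w /src \
  nvidia/cuda:12.8.0-devel-ubuntu22.04 bash -c '
    apt-get update && apt-get install -y cmake build-essential git &&
    cmake -B build -DGGML_CUDA=ON &&
    cmake --build build --config Release -j
  '

# Then serve the same quant with the freshly built binary:
./build/bin/llama-server -m /path/to/gemma-4-31B-it-UD-IQ3_XXS.gguf --port 8080
```

The binary is compiled against CUDA 12.8 but still runs on the host driver, so it isolates the compiler/toolkit version as the variable.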

On my machine, when I ran the Unsloth Studio update, it errored while trying to download the prebuilt binaries and fell back to offline compilation, which explains why Unsloth Studio and llama-server behave identically: both were compiled with CUDA 13.2. (I'm running Linux/Fedora, but a user on GitHub also reported the same issue with CUDA 13.2 on Windows and Blackwell.)

ykarout changed discussion status to closed
