gemma-4-31B-it-UD-IQ3_XXS.gguf is unusable - gibberish output in Unsloth Studio + vanilla llama-server

#4
by ykarout - opened

How do you fine-tune an audio model with Unsloth?

jerH singular singular singular la l l sameHaan la- la ability singular singular laS la singular same sameSSHal la sameTH la same same single la- la/ de single la laL la same1T la/ singleH la la same la laT/ singularP same single la singularSSS/ la la/ laS laCis laT a- la laSC la laenT laer same la de single single/S laTT싱/ singular//かけてSic неVARIABLECB/TSLHL la singular Schools]],/P la singular laSdevelopers la laS la laers lvan0/ a la l la la fadS la single


Did you update Unsloth Studio to the latest version? Do you have a screenshot of the issue? Might be the tokenizer issue which llama.cpp is fixing (unrelated to unsloth).

I tried it many times and it works every time:

(Screenshots attached, 2026-04-03, 3:19 AM – 3:21 AM, showing normal output.)

@danielhanchen I opened an issue on GitHub and tagged you. This seems related to CUDA compilation on Blackwell + FA (flash-attention) kernels: I tried your quant with CPU-only and cuBLAS builds and it worked perfectly. The issue is not the tokenizer; I tested the latest release, which fixes the tokenizer issue, in both Unsloth Studio and llama.cpp, and the problem persists. Probably a CUDA 13.2 + Blackwell issue.

I'm having the gibberish issue as well using llama-server with 31B-it-UD-Q8_K_XL. I've been running LLMs for years and can usually solve problems like this, but I'm stumped. Not sure if it's a quant issue or something on my end.

@Menthols It doesn't seem to be a quant issue: CPU-only and cuBLAS-compiled llama.cpp binaries run the quant with normal output. See https://github.com/ggml-org/llama.cpp/issues/21371#issuecomment-4183330511
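To isolate the backend the same way, a minimal sketch is to build llama.cpp CPU-only and run the quant directly (model path and prompt are placeholders; `llama-cli` flags are standard llama.cpp options):

```shell
# Build llama.cpp without any GPU backend (pure CPU) to rule out CUDA kernels.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build            # no -DGGML_CUDA=ON, so CPU-only
cmake --build build --config Release -j

# Run the suspect quant; if output is coherent here but gibberish in a
# CUDA build, the quant file itself is fine and the bug is in the backend.
./build/bin/llama-cli -m /path/to/gemma-4-31B-it-UD-IQ3_XXS.gguf \
  -p "How do you fine-tune an audio model with Unsloth?" -n 128
```

If the CPU build produces normal text from the exact file that misbehaves under the GPU build, that points at the compiled backend rather than the quantization.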

@danielhanchen Confirmed: the issue stems from CUDA 13.2 compilation, even on the latest release (at least on Blackwell; my test is on a 5080). The quants are 100% fine. The problem is CUDA 13.2: I recompiled the latest b8648 in a CUDA 12.8 container running on the same machine, tested it, and it works flawlessly.
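For anyone wanting to reproduce the CUDA 12.8 rebuild without touching the host toolkit, a rough sketch using a CUDA devel container (the exact image tag is an assumption; pick whichever 12.8 devel tag matches your distro):

```shell
# Build llama.cpp's CUDA backend inside a CUDA 12.8 container instead of
# the host's CUDA 13.2 toolkit. Assumes the llama.cpp checkout is in $PWD
# and the NVIDIA container toolkit is installed for --gpus.
docker run --rm --gpus all -v "$PWD":/src -w /src \
  nvidia/cuda:12.8.0-devel-ubuntu22.04 bash -c '
    apt-get update && apt-get install -y cmake build-essential git &&
    cmake -B build -DGGML_CUDA=ON &&
    cmake --build build --config Release -j
  '

# Then serve the same quant with the freshly built binary:
./build/bin/llama-server -m /path/to/gemma-4-31B-it-UD-IQ3_XXS.gguf --port 8080
```

The binary is compiled against CUDA 12.8 but still runs on the host driver, so it isolates the compiler/toolkit version as the variable.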

On my machine, when I ran the Unsloth Studio update, it errored while trying to download the prebuilt binaries and fell back to offline compilation, which explains why Unsloth Studio and llama-server behave identically: both were compiled with CUDA 13.2. (I'm running Linux/Fedora, but a user on GitHub also reported the same issue with CUDA 13.2 on Windows and Blackwell.)

ykarout changed discussion status to closed
