How do I load e.g. gpt-oss-20b-UD-Q6_K_XL in llama.cpp?
Every time I try to run the model with "llama-server" or "llama-cli" I get:
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from ....\LLM\gpt-oss-20b-UD-Q6_K_XL.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '....\LLM\gpt-oss-20b-UD-Q6_K_XL.gguf'
main: error: unable to load model
and
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from gpt-oss-20b-UD-Q6_K_XL.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'gpt-oss-20b-UD-Q6_K_XL.gguf'
srv load_model: failed to load model, 'gpt-oss-20b-UD-Q6_K_XL.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
Does anyone know how to fix this? Is this an issue in llama.cpp (https://github.com/ggml-org/llama.cpp) or with the UD dynamic quant (https://www.unsloth.ai/blog/dynamic-v2)?
Thanks in advance.
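For what it's worth, that error usually means the loader doesn't recognize the tensor's quantization type rather than that the file is corrupt, but you can at least confirm the download itself is a valid GGUF file by reading its fixed header. Here's a minimal sketch based on the published GGUF file format (magic, uint32 version, uint64 tensor count, uint64 KV count, all little-endian); the path at the bottom is just an example:

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor count, KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic = {magic!r})")
        # version: uint32; tensor_count and metadata_kv_count: uint64 (little-endian)
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensors": tensor_count, "kv_pairs": kv_count}

# Example (adjust the path to wherever your download lives):
# print(read_gguf_header("gpt-oss-20b-UD-Q6_K_XL.gguf"))
```

If this raises, the file is truncated or mis-downloaded; if it prints sane numbers, the file is fine and the problem is an outdated llama.cpp build that doesn't know the newer ggml type yet.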
llama-server -hf unsloth/gpt-oss-20b-GGUF:gpt-oss-20b-Q6_K.gguf (plus any other flags you need) is probably the easiest way.
Nvm, I didn't read your comment properly :D Have you rebuilt llama.cpp recently? That's the first thing I'd try here, given "has invalid ggml type 39 (NONE)".
Not yet. Thanks for answering though. I'm waiting for a llama.cpp update that supports the UD quant version. llama.cpp is updated automatically on my Windows host (installed via winget), so for now I'd probably have to build the repo manually. I'll just try again later, and if it works I'll close this thread.
I just ran the model you tried (gpt-oss-20b-UD-Q6_K_XL.gguf) and had no issues (apart from the result being pretty bad in terms of code analysis, lol).
If you are getting the llama.cpp server from winget, you may have to wait until that package is updated. I manually build from the master branch about once a week and, as I said, didn't have any issues.
Here are my build options if you want to try that (I'm on Linux, but you could just as easily do this in a WSL terminal; it would probably take even longer, but...):
mkdir build
cd build
cmake .. \
  -DLLAMA_CURL=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DGGML_CUDA=ON \
  -DGGML_CUDA_FA=ON \
  -DGGML_CUDA_FA_ALL_QUANTS=ON \
  -DGGML_CUDA_F16=ON \
  -DGGML_CUDA_GRAPHS=ON \
  -DLLAMA_ENABLE_MTMD=1
cmake --build . --config Release
A couple of those flags aren't strictly needed, but whatever. The longest part is always the CUDA compilation.
Thank you. Indeed, this has been solved by issuing "winget upgrade --all" in an elevated PowerShell window; an outdated llama.cpp version (along with several other apps) was the cause. Closing this since the model is running now.