Q5 and Q8 K_XL GGUFs not working?
Am I the only one having problems with the Q5/Q8 K_XL GGUFs? I have downloaded them twice already and still wind up with this error:
load_backend: loaded BLAS backend from C:\llama.cpp\build\bin\Release\ggml-blas.dll
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
load_backend: loaded CUDA backend from C:\llama.cpp\build\bin\Release\ggml-cuda.dll
load_backend: loaded CPU backend from C:\llama.cpp\build\bin\Release\ggml-cpu-cascadelake.dll
build: 8148 (244641955) with MSVC 19.44.35222.0 for x64
system info: n_threads = 16, n_threads_batch = 16, total_threads = 32
system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CUDA : ARCHS = 890,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 256 | FA_ALL_QUANTS = 1 | BLACKWELL_NATIVE_FP4 = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
Running without SSL
init: using 31 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'H:\Qwen3.5-27B-GGUF_unsloth\Qwen3.5-27B-UD-Q5_K_XL.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) (0000:01:00.0) - 30927 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 4090) (0000:03:00.0) - 22988 MiB free
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from H:\Qwen3.5-27B-GGUF_unsloth\Qwen3.5-27B-UD-Q5_K_XL.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'H:\Qwen3.5-27B-GGUF_unsloth\Qwen3.5-27B-UD-Q5_K_XL.gguf'
srv load_model: failed to load model, 'H:\Qwen3.5-27B-GGUF_unsloth\Qwen3.5-27B-UD-Q5_K_XL.gguf'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
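The "failed to read magic" error means the loader did not find the 4-byte GGUF magic (`GGUF`) at the very start of the file, which almost always points to a truncated or corrupted download rather than a problem with llama.cpp itself. A quick sanity check, as a minimal sketch (the helper name is mine; the path is the one from the log above):

```python
def has_gguf_magic(path: str) -> bool:
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Hypothetical usage with the path from the log:
# has_gguf_magic(r"H:\Qwen3.5-27B-GGUF_unsloth\Qwen3.5-27B-UD-Q5_K_XL.gguf")
```

If this returns False, the file is not a valid GGUF from byte zero; comparing the local file size against the size listed on the download page is another cheap check before re-downloading.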
Hello, did you manage to solve the issue?
https://github.com/ggml-org/llama.cpp/issues/19868
That issue was just resolved, and the models are now working for me.