[SOLVED] llama_model_load: unknown model architecture: 'qwen35'
llama.cpp version: 8219 (cf07c5b)
llama-cli --model Qwen3.5-27B-UD-Q8_K_XL.gguf --mmproj mmproj-F16.gguf --temp 1.0 --verbose
[...]
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (ADL GT2) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
build: 8219 (cf07c5b) with GNU 15.2.1 for Linux x86_64
srv load_model: loading model 'Qwen3.5-27B-UD-Q8_K_XL.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: getting device memory data for initial parameters:
llama_model_load_from_file_impl: using device Vulkan0 (Intel(R) Iris(R) Xe Graphics (ADL GT2)) (0000:00:02.0) - 43293 MiB free
llama_model_loader: loaded meta data with 49 key-value pairs and 851 tensors from Qwen3.5-27B-UD-Q8_K_XL.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 0.600000
llama_model_loader: - kv 5: general.name str = Qwen3.5-27B
llama_model_loader: - kv 6: general.basename str = Qwen3.5-27B
llama_model_loader: - kv 7: general.quantized_by str = Unsloth
llama_model_loader: - kv 8: general.size_label str = 27B
llama_model_loader: - kv 9: general.license str = apache-2.0
llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-2...
llama_model_loader: - kv 11: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Qwen3.5 27B
llama_model_loader: - kv 14: general.base_model.0.organization str = Qwen
llama_model_loader: - kv 15: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen3.5-27B
llama_model_loader: - kv 16: general.tags arr[str,3] = ["qwen3_5_moe", "unsloth", "image-tex...
llama_model_loader: - kv 17: qwen35.block_count u32 = 64
llama_model_loader: - kv 18: qwen35.context_length u32 = 262144
llama_model_loader: - kv 19: qwen35.embedding_length u32 = 5120
llama_model_loader: - kv 20: qwen35.feed_forward_length u32 = 17408
llama_model_loader: - kv 21: qwen35.attention.head_count u32 = 24
llama_model_loader: - kv 22: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 23: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 24: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 25: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 26: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 27: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 28: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 29: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 30: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 31: qwen35.ssm.time_step_rank u32 = 48
llama_model_loader: - kv 32: qwen35.ssm.inner_size u32 = 6144
llama_model_loader: - kv 33: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 34: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 35: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 36: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 37: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 38: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 39: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 40: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 41: tokenizer.ggml.padding_token_id u32 = 248055
llama_model_loader: - kv 42: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 43: general.quantization_version u32 = 2
llama_model_loader: - kv 44: general.file_type u32 = 7
llama_model_loader: - kv 45: quantize.imatrix.file str = Qwen3.5-27B-GGUF/imatrix_unsloth.gguf
llama_model_loader: - kv 46: quantize.imatrix.dataset str = unsloth_calibration_Qwen3.5-27B.txt
llama_model_loader: - kv 47: quantize.imatrix.entries_count u32 = 496
llama_model_loader: - kv 48: quantize.imatrix.chunks_count u32 = 80
llama_model_loader: - type f32: 353 tensors
llama_model_loader: - type f16: 218 tensors
llama_model_loader: - type q8_0: 280 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 33.08 GiB (10.56 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.08 seconds
[...]
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'Qwen3.5-27B-UD-Q8_K_XL.gguf'
srv load_model: failed to load model, 'Qwen3.5-27B-UD-Q8_K_XL.gguf'
Solved with build 8233.
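For reference, updating a source build is usually enough; this is a sketch of a typical rebuild (the `GGML_VULKAN` flag matches the Vulkan backend shown in the log above, other backends need different flags):

```shell
# from your llama.cpp checkout: pull the latest sources,
# which include support for the 'qwen35' architecture
git pull

# reconfigure and rebuild (Vulkan backend, as in the log above)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# confirm the new build number (should be >= 8233)
./build/bin/llama-cli --version
```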
Hi, I'm facing the same issue when using it with vLLM.
Can you briefly explain how you solved it? Really appreciate it!
Update to a newer llama.cpp version.
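This error means the GGUF file declares an architecture your build does not know yet. If you want to check what a file declares before loading it, the `general.architecture` key can be read straight from the GGUF header. A minimal sketch, assuming (as in the dump above, where it is kv 0) that it is the first key-value entry:

```python
import struct

def gguf_architecture(path):
    """Read general.architecture from a GGUF file.

    Assumes general.architecture is the first KV entry in the
    header, as is conventional and as shown in the log above.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        # first KV entry: key string, value type, then the value
        key_len, = struct.unpack("<Q", f.read(8))
        key = f.read(key_len).decode("utf-8")
        if key != "general.architecture":
            raise ValueError(f"unexpected first key: {key}")
        vtype, = struct.unpack("<I", f.read(4))
        if vtype != 8:  # GGUF value type 8 = string
            raise ValueError("general.architecture is not a string")
        val_len, = struct.unpack("<Q", f.read(8))
        return f.read(val_len).decode("utf-8")
```

If the returned string (here, `qwen35`) is missing from your build's architecture list, updating llama.cpp is the fix.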