error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_loader: additional 4 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 50 key-value pairs and 1098 tensors from /media/sherry/SeagateAI/ubergarm/Qwen3.5-397B-A17B-GGUF/Q3_K/Qwen3.5-397B-A17B-Q3_K-00001-of-00005.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35moe
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 20
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 0.600000
llama_model_loader: - kv 5: general.name str = Qwen3.5 397B A17B
llama_model_loader: - kv 6: general.basename str = Qwen3.5
llama_model_loader: - kv 7: general.size_label str = 397B-A17B
llama_model_loader: - kv 8: general.license str = apache-2.0
llama_model_loader: - kv 9: general.license.link str = https://huggingface.co/Qwen/Qwen3.5-3...
llama_model_loader: - kv 10: general.tags arr[str,1] = ["image-text-to-text"]
llama_model_loader: - kv 11: qwen35moe.block_count u32 = 60
llama_model_loader: - kv 12: qwen35moe.context_length u32 = 262144
llama_model_loader: - kv 13: qwen35moe.embedding_length u32 = 4096
llama_model_loader: - kv 14: qwen35moe.attention.head_count u32 = 32
llama_model_loader: - kv 15: qwen35moe.attention.head_count_kv u32 = 2
llama_model_loader: - kv 16: qwen35moe.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 17: qwen35moe.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 18: qwen35moe.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 19: qwen35moe.expert_count u32 = 512
llama_model_loader: - kv 20: qwen35moe.expert_used_count u32 = 10
llama_model_loader: - kv 21: qwen35moe.attention.key_length u32 = 256
llama_model_loader: - kv 22: qwen35moe.attention.value_length u32 = 256
llama_model_loader: - kv 23: qwen35moe.expert_feed_forward_length u32 = 1024
llama_model_loader: - kv 24: qwen35moe.expert_shared_feed_forward_length u32 = 1024
llama_model_loader: - kv 25: qwen35moe.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 26: qwen35moe.ssm.state_size u32 = 128
llama_model_loader: - kv 27: qwen35moe.ssm.group_count u32 = 16
llama_model_loader: - kv 28: qwen35moe.ssm.time_step_rank u32 = 64
llama_model_loader: - kv 29: qwen35moe.ssm.inner_size u32 = 8192
llama_model_loader: - kv 30: qwen35moe.full_attention_interval u32 = 4
llama_model_loader: - kv 31: qwen35moe.rope.dimension_count u32 = 64
llama_model_loader: - kv 32: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 33: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 34: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 35: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 36: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 40: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 41: general.quantization_version u32 = 2
llama_model_loader: - kv 42: general.file_type u32 = 7
llama_model_loader: - kv 43: quantize.imatrix.file str = /mnt/data/models/ubergarm/Qwen3.5-397...
llama_model_loader: - kv 44: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
llama_model_loader: - kv 45: quantize.imatrix.entries_count u32 = 765
llama_model_loader: - kv 46: quantize.imatrix.chunks_count u32 = 829
llama_model_loader: - kv 47: split.no u16 = 0
llama_model_loader: - kv 48: split.tensors.count i32 = 1098
llama_model_loader: - kv 49: split.count u16 = 5
llama_model_loader: - type f32: 451 tensors
llama_model_loader: - type q8_0: 465 tensors
llama_model_loader: - type q3_K: 120 tensors
llama_model_loader: - type q4_K: 61 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/media/sherry/SeagateAI/ubergarm/Qwen3.5-397B-A17B-GGUF/Q3_K/Qwen3.5-397B-A17B-Q3_K-00001-of-00005.gguf'
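The string the loader rejects is just the `general.architecture` value in the GGUF metadata, so you can confirm it without any llama.cpp build at all. A minimal sketch of the relevant slice of the GGUF v3 layout (magic, version u32, tensor count u64, KV count u64, then length-prefixed key/value pairs; string value type = 8). The header below is synthetic for illustration, but the same parse works on the first bytes of the real `.gguf` file:

```python
import struct

def read_architecture(buf: bytes) -> str:
    """Walk a GGUF v3 header far enough to find general.architecture."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    assert magic == b"GGUF"
    off = struct.calcsize("<4sIQQ")
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", buf, off); off += 8
        key = buf[off:off + klen].decode(); off += klen
        (vtype,) = struct.unpack_from("<I", buf, off); off += 4
        if vtype != 8:  # 8 = GGUF string; other value types not needed here
            raise NotImplementedError(f"value type {vtype}")
        (vlen,) = struct.unpack_from("<Q", buf, off); off += 8
        val = buf[off:off + vlen].decode(); off += vlen
        if key == "general.architecture":
            return val
    raise KeyError("general.architecture")

def gguf_str(x: bytes) -> bytes:
    """GGUF strings are a u64 length followed by the raw bytes."""
    return struct.pack("<Q", len(x)) + x

# Synthetic header carrying the same value the log above shows:
header = (b"GGUF" + struct.pack("<IQQ", 3, 1098, 50)
          + gguf_str(b"general.architecture")
          + struct.pack("<I", 8) + gguf_str(b"qwen35moe"))
print(read_architecture(header))  # -> qwen35moe
```

Since `general.architecture` is conventionally the first KV pair, reading the first few hundred bytes of the split's part 1 is enough; an architecture name a build doesn't know about always fails at exactly this point, before any tensors are touched.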
No support in ik_llama.cpp yet; follow the tracking issue here: https://github.com/ikawrakow/ik_llama.cpp/issues/1255
For now you can use mainline llama.cpp with the autoparser branch, like so:
cd projects
git clone --depth 1 -b autoparser git@github.com:pwilkin/llama.cpp.git
cd llama.cpp
# compile (add backend flags for your setup, e.g. -DGGML_CUDA=ON)
cmake -B build
cmake --build build --config Release -j
# run, pointing at part 1 of the split (path from the log above)
./build/bin/llama-server -m /media/sherry/SeagateAI/ubergarm/Qwen3.5-397B-A17B-GGUF/Q3_K/Qwen3.5-397B-A17B-Q3_K-00001-of-00005.gguf