Broken model
#6
by vajje - opened
The model is broken and won't run:
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-d0ecd80b0e45b0d9e49c8cd1527b7f7d52d8d3bde2c569ab36aac59bb78f53f7
ollama-1 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
ollama-1 | llama_model_loader: - kv 0: general.architecture str = qwen35
ollama-1 | llama_model_loader: - kv 1: general.type str = model
ollama-1 | llama_model_loader: - kv 2: general.name str = Unsloth_Gguf_Yy57D7Hq
ollama-1 | llama_model_loader: - kv 3: general.quantized_by str = Unsloth
ollama-1 | llama_model_loader: - kv 4: general.size_label str = 9.0B
ollama-1 | llama_model_loader: - kv 5: general.repo_url str = https://huggingface.co/unsloth
ollama-1 | llama_model_loader: - kv 6: general.tags arr[str,2] = ["unsloth", "llama.cpp"]
ollama-1 | llama_model_loader: - kv 7: qwen35.block_count u32 = 32
ollama-1 | llama_model_loader: - kv 8: qwen35.context_length u32 = 262144
ollama-1 | llama_model_loader: - kv 9: qwen35.embedding_length u32 = 4096
ollama-1 | llama_model_loader: - kv 10: qwen35.feed_forward_length u32 = 12288
ollama-1 | llama_model_loader: - kv 11: qwen35.attention.head_count u32 = 16
ollama-1 | llama_model_loader: - kv 12: qwen35.attention.head_count_kv u32 = 4
ollama-1 | llama_model_loader: - kv 13: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
ollama-1 | llama_model_loader: - kv 14: qwen35.rope.freq_base f32 = 10000000.000000
ollama-1 | llama_model_loader: - kv 15: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
ollama-1 | llama_model_loader: - kv 16: qwen35.attention.key_length u32 = 256
ollama-1 | llama_model_loader: - kv 17: qwen35.attention.value_length u32 = 256
ollama-1 | llama_model_loader: - kv 18: qwen35.ssm.conv_kernel u32 = 4
ollama-1 | llama_model_loader: - kv 19: qwen35.ssm.state_size u32 = 128
ollama-1 | llama_model_loader: - kv 20: qwen35.ssm.group_count u32 = 16
ollama-1 | llama_model_loader: - kv 21: qwen35.ssm.time_step_rank u32 = 32
ollama-1 | llama_model_loader: - kv 22: qwen35.ssm.inner_size u32 = 4096
ollama-1 | llama_model_loader: - kv 23: qwen35.full_attention_interval u32 = 4
ollama-1 | llama_model_loader: - kv 24: qwen35.rope.dimension_count u32 = 64
ollama-1 | llama_model_loader: - kv 25: tokenizer.ggml.model str = gpt2
ollama-1 | llama_model_loader: - kv 26: tokenizer.ggml.pre str = qwen35
ollama-1 | llama_model_loader: - kv 27: tokenizer.ggml.tokens arr[str,248320] = ["!", "\"", "#", "$", "%", "&", "'", ...
ollama-1 | llama_model_loader: - kv 28: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
ollama-1 | llama_model_loader: - kv 29: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
ollama-1 | llama_model_loader: - kv 30: tokenizer.ggml.eos_token_id u32 = 248046
ollama-1 | llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 248044
ollama-1 | llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
ollama-1 | llama_model_loader: - kv 33: general.quantization_version u32 = 2
ollama-1 | llama_model_loader: - kv 34: general.file_type u32 = 15
ollama-1 | llama_model_loader: - type f32: 177 tensors
ollama-1 | llama_model_loader: - type q4_K: 204 tensors
ollama-1 | llama_model_loader: - type q5_K: 24 tensors
ollama-1 | llama_model_loader: - type q6_K: 22 tensors
ollama-1 | print_info: file format = GGUF V3 (latest)
ollama-1 | print_info: file type = Q4_K - Medium
ollama-1 | print_info: file size = 5.23 GiB (5.02 BPW)
ollama-1 | llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
I also notice that some attributes do not match the official Qwen3.5 release, e.g.:
ollama-1 | llama_model_loader: - kv 12: qwen35.attention.head_count_kv u32 = 4
ollama-1 | llama_model_loader: - kv 13: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
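For anyone debugging this, the `unknown model architecture: 'qwen35'` error means the llama.cpp build bundled with this Ollama version has no loader registered for the `general.architecture` string written into the GGUF header, so either the file was exported with a not-yet-supported architecture name or the runtime is too old. You can confirm what the file actually declares without loading it in Ollama by parsing the GGUF header directly. This is a minimal sketch of the GGUF v3 key/value layout; it only checks the first metadata key, where `general.architecture` is conventionally stored:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # GGUF metadata value type for UTF-8 strings

def read_gguf_architecture(f):
    """Return the general.architecture string from a GGUF file object.

    Parses only the fixed header and the first metadata KV pair,
    which by convention holds general.architecture.
    """
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", f.read(4))          # GGUF version (3 here)
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    if kv_count == 0:
        return None
    # Each KV pair: key as (uint64 length + UTF-8 bytes), then a uint32 type tag.
    (key_len,) = struct.unpack("<Q", f.read(8))
    key = f.read(key_len).decode("utf-8")
    (vtype,) = struct.unpack("<I", f.read(4))
    if key == "general.architecture" and vtype == GGUF_TYPE_STRING:
        (slen,) = struct.unpack("<Q", f.read(8))
        return f.read(slen).decode("utf-8")
    return None  # architecture was not the first key; bail out of this sketch
```

Running this against the blob in `/usr/share/ollama/.ollama/models/blobs/` should print `qwen35`, matching the log above; if your llama.cpp/Ollama version predates support for that string, the fix is updating the runtime rather than re-downloading the model.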