'blk.3.attn_v.weight' has invalid ggml type 140

#1
by cpuQ - opened

I am unable to load the model using llama.cpp

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)

Thanks for reporting it. I don’t think this is a generic load failure — the specific error is the important part:

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)

That strongly suggests the file contains a quant type your backend does not recognize. In current testing, ggml type 140 maps to an ik-llama-specific quant type (IQ5_K), and attn_v.weight being stored that way is consistent with ik-llama quant mixes.
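To make that concrete, here is a small self-contained Python sketch that builds just a GGUF v3 header plus one tensor-info record and then applies the same `[0, 42)` range check the loader does. The layout follows the public GGUF spec; `GGML_TYPE_COUNT = 42` is an assumption taken from the error message, and the dims/offset values are made up for illustration.

```python
import struct

GGML_TYPE_COUNT = 42  # assumption: mainline's valid ggml-type range is [0, 42)

def build_minimal_gguf(tensor_name: str, ggml_type: int) -> bytes:
    """Build just enough of a GGUF v3 header + one tensor-info record
    to show where the loader's type check fires (magic, version,
    tensor count, KV count, then tensor infos, per the GGUF spec)."""
    name = tensor_name.encode("utf-8")
    buf = b"GGUF"
    buf += struct.pack("<I", 3)           # version 3
    buf += struct.pack("<Q", 1)           # tensor_count = 1
    buf += struct.pack("<Q", 0)           # metadata_kv_count = 0
    # tensor info: name, n_dims, dims, ggml type, data offset
    buf += struct.pack("<Q", len(name)) + name
    buf += struct.pack("<I", 2)           # n_dims = 2 (illustrative)
    buf += struct.pack("<QQ", 4096, 4096) # dims (illustrative)
    buf += struct.pack("<I", ggml_type)
    buf += struct.pack("<Q", 0)           # data offset
    return buf

def check_tensor_types(data: bytes) -> list[str]:
    """Parse the tensor-info records and report any out-of-range types,
    mirroring the validation behind gguf_init_from_file_ptr's error."""
    assert data[:4] == b"GGUF"
    off = 4 + 4  # skip magic + version
    (n_tensors,) = struct.unpack_from("<Q", data, off); off += 8
    off += 8     # skip metadata_kv_count (0 here)
    errors = []
    for _ in range(n_tensors):
        (name_len,) = struct.unpack_from("<Q", data, off); off += 8
        name = data[off:off + name_len].decode("utf-8"); off += name_len
        (n_dims,) = struct.unpack_from("<I", data, off); off += 4
        off += 8 * n_dims  # skip dims
        (ttype,) = struct.unpack_from("<I", data, off); off += 4
        off += 8           # skip data offset
        if not (0 <= ttype < GGML_TYPE_COUNT):
            errors.append(f"tensor '{name}' has invalid ggml type {ttype}. "
                          f"should be in [0, {GGML_TYPE_COUNT})")
    return errors

print(check_tensor_types(build_minimal_gguf("blk.3.attn_v.weight", 140)))
# -> ["tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)"]
```

A backend that extends the type enum (like ik-llama.cpp) simply accepts a larger range, which is why the same file loads there but not in mainline.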

So the most likely causes are:

  1. you are loading it with mainline llama.cpp instead of ik-llama.cpp
  2. your frontend/UI bundles an older or incompatible llama.cpp backend
  3. you are on a build that does not yet support the quant types used in this GGUF

A few things that would help narrow it down:

  • exact backend (llama.cpp, ik-llama.cpp, KoboldCPP, LM Studio, etc.)
  • exact commit/build date
  • which file you tried to load (BF16 or IQ4_NL)

My first recommendation would be to try the latest ik-llama.cpp build first. If you are already doing that, send your exact build info and I’ll dig deeper.

Thanks for reporting this — I tracked it down properly and there is now a separate standard llama.cpp-compatible release on the repo.

New files

  • RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
  • RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf

What was happening

The original custom IQ4_NL build was the author's ik-llama.cpp daily-driver quant and included a mixed tensor layout that standard llama.cpp rejected with:

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140

So this was not just a generic prompt/settings issue. It was a file compatibility difference between the ik-llama-oriented quant and standard llama.cpp.

What I changed

I rebuilt a separate IQ4_NL quant directly with standard llama.cpp's llama-quantize, from the same BF16 splice source and with the same custom imatrix/calibration data.
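For reference, the rebuild step looks roughly like the following sketch. The paths and filenames are placeholders, not the author's actual ones, and the `echo` is included so the snippet is safe to paste; drop it to actually run the quantizer.

```shell
# Placeholders, not the author's actual paths/files.
BF16=splice-bf16-source.gguf
IMATRIX=custom-imatrix.dat
OUT=model-IQ4_NL-llama.cpp-compatible.gguf

# IQ4_NL is a type mainline llama.cpp knows, so every tensor the quantizer
# writes stays inside the [0, 42) ggml-type range the loader accepts.
echo ./build/bin/llama-quantize --imatrix "$IMATRIX" "$BF16" "$OUT" IQ4_NL
```

The key point is only which `llama-quantize` binary is used: a mainline build can never emit the extended ik-llama type ids that caused the original error.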

Exact standard llama.cpp build tested

  • binary: /home/benbi/llama.cpp/build/bin/llama-server
  • quantizer: /home/benbi/llama.cpp/build/bin/llama-quantize
  • build string: build: 8401 (a69d54f99) with GNU 15.2.1 for Linux x86_64
  • CUDA archs reported: ARCHS = 860,1200

How it was tested

The new compatible build was loaded successfully in standard llama.cpp and verified with:

  • CUDA_VISIBLE_DEVICES=0,1,2
  • --tensor-split 3,2,2
  • -c 262144
  • --cache-type-k f32 --cache-type-v f32
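Put together, the verification launch would look something like this sketch. The binary path is an assumption, and the `echo` is included so the command is safe to paste on a machine without the GPUs; drop it to actually start the server.

```shell
MODEL=RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf

# f32 KV cache at 262144 context is memory-heavy; the tensor split
# matches the 3-GPU layout listed above.
echo CUDA_VISIBLE_DEVICES=0,1,2 ./build/bin/llama-server \
  -m "$MODEL" \
  --tensor-split 3,2,2 \
  -c 262144 \
  --cache-type-k f32 --cache-type-v f32
```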

Verification prompts returned correct final answers, including:

  • 0.05
  • 5050
  • exact instruction-following output

Recommendation

  • If you use standard llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
  • If you use ik-llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf

The author still personally recommends the ik-llama one because that is the actual daily-driver quant and the one that will get the most real-world testing going forward.

Thank you for the clarification and fix! I'm sorry my question was vague. I was trying to load the IQ4_NL variant. It's working fine on my end now on llama.cpp b8733.

cpuQ changed discussion status to closed
