'blk.3.attn_v.weight' has invalid ggml type 140
I am unable to load the model using llama.cpp:
gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)
Thanks for reporting it. I don’t think this is a generic load failure — the specific error is the important part:
gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)
That strongly suggests the file contains a quant type your backend does not recognize. In current testing, ggml type 140 maps to an ik-llama-specific quant type (IQ5_K), and attn_v.weight being stored that way is consistent with ik-llama quant mixes.
So the most likely causes are:
- you are loading it with mainline llama.cpp instead of ik-llama.cpp
- your frontend/UI bundles an older or incompatible llama.cpp backend
- you are on a build that does not yet support the quant types used in this GGUF
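For context on the `[0, 42)` part: each tensor-info record in a GGUF file stores the tensor's ggml type as a plain integer id, and the loader rejects any id at or above the build's type count (so the upper bound varies with how new the backend is — a backend built before a quant type existed simply doesn't have an id for it). A minimal sketch of that range check, using a hand-crafted header rather than a real model file (field layout per the GGUF v3 spec; this is an illustration, not llama.cpp's actual code):

```python
import struct

def build_minimal_gguf(tensor_type: int) -> bytes:
    # Craft just a GGUF v3 header plus one tensor-info record: no metadata
    # kv pairs, no tensor data -- enough to exercise the type-id check.
    name = b"blk.3.attn_v.weight"
    buf = b"GGUF"                               # magic
    buf += struct.pack("<I", 3)                 # format version
    buf += struct.pack("<Q", 1)                 # tensor count
    buf += struct.pack("<Q", 0)                 # metadata kv count
    buf += struct.pack("<Q", len(name)) + name  # tensor name (len-prefixed)
    buf += struct.pack("<I", 1)                 # number of dimensions
    buf += struct.pack("<Q", 4096)              # dim[0]
    buf += struct.pack("<I", tensor_type)       # ggml type id -- the field at issue
    buf += struct.pack("<Q", 0)                 # offset into the data section
    return buf

def invalid_tensor_types(data: bytes, type_count: int = 42) -> list:
    # Walk the tensor-info records and report any type id outside
    # [0, type_count), mirroring the range check behind the error message.
    assert data[:4] == b"GGUF"
    n_tensors, n_kv = struct.unpack_from("<QQ", data, 8)
    assert n_kv == 0, "this sketch does not parse metadata kv pairs"
    off = 24
    bad = []
    for _ in range(n_tensors):
        (name_len,) = struct.unpack_from("<Q", data, off); off += 8
        tname = data[off:off + name_len].decode("utf-8"); off += name_len
        (n_dims,) = struct.unpack_from("<I", data, off); off += 4
        off += 8 * n_dims                       # skip the dimension sizes
        (ttype,) = struct.unpack_from("<I", data, off); off += 4 + 8  # type + data offset
        if not 0 <= ttype < type_count:
            bad.append((tname, ttype))
    return bad
```

So a file quantized with an ik-llama-only type loads fine on a backend whose type table includes id 140, and is rejected by one whose table stops below it.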
A few things that would help narrow it down:
- exact backend (llama.cpp, ik-llama.cpp, KoboldCPP, LM Studio, etc.)
- exact commit/build date
- which file you tried to load (BF16 or IQ4_NL)
My first recommendation would be to try the latest ik-llama.cpp build first. If you are already doing that, send your exact build info and I’ll dig deeper.
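If it helps, getting a current ik-llama.cpp build is the usual clone-and-cmake routine. This is a generic sketch, not instructions from the repo — the clone URL is the commonly used fork location, and the CUDA flag may differ on older checkouts, so check the project's own README:

```shell
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON    # drop the CUDA flag for a CPU-only build
cmake --build build --config Release -j
./build/bin/llama-server -m /path/to/model.gguf
```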
Thanks for reporting this — I tracked it down properly and there is now a separate standard llama.cpp-compatible release on the repo.
New files
- RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
- RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf
What was happening
The original custom IQ4_NL build was the author's ik-llama.cpp driver quant and included a mixed tensor layout that standard llama.cpp rejected with:
gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140
So this was not just a generic prompt/settings issue. It was a file compatibility difference between the ik-llama-oriented quant and standard llama.cpp.
What I changed
I rebuilt a separate IQ4_NL directly with standard llama.cpp llama-quantize from the same BF16 splice source using the same custom imatrix/calibration.
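For anyone wanting to reproduce that step: a rebuild like this is a standard llama-quantize invocation. The filenames below are placeholders, not the exact paths used:

```shell
./build/bin/llama-quantize \
    --imatrix imatrix.dat \
    model-BF16.gguf model-IQ4_NL.gguf IQ4_NL
```

Quantizing from the BF16 source with the same imatrix keeps the calibration identical; only the tensor type selection differs, so standard llama.cpp accepts every tensor in the output.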
Exact standard llama.cpp build tested
- binary: /home/benbi/llama.cpp/build/bin/llama-server
- quantizer: /home/benbi/llama.cpp/build/bin/llama-quantize
- build string: build: 8401 (a69d54f99) with GNU 15.2.1 for Linux x86_64
- CUDA archs reported: ARCHS = 860,1200
How it was tested
The new compatible build was loaded successfully in standard llama.cpp and verified with:
CUDA_VISIBLE_DEVICES=0,1,2 --tensor-split 3,2,2 -c 262144 --cache-type-k f32 --cache-type-v f32
Verification prompts returned correct final answers, including:
- an exact numeric answer (0.055050)
- exact instruction-following output
Recommendation
- If you use standard llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
- If you use ik-llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf
The author still personally recommends the ik-llama one because that is the actual daily-driver quant and the one that will get the most real-world testing going forward.
Thank you for the clarification and fix! Sorry my question was vague — I was trying to load the IQ4_NL variant. It's working fine on my end now, on llama.cpp b8733.