'blk.3.attn_v.weight' has invalid ggml type 140

#1
by cpuQ - opened

I am unable to load the model using llama.cpp

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)

Thanks for reporting it. I don’t think this is a generic load failure — the specific error is the important part:

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)

That strongly suggests the file contains a quant type your backend does not recognize. In current testing, ggml type 140 maps to an ik-llama-specific quant type (IQ5_K), and attn_v.weight being stored that way is consistent with ik-llama quant mixes.
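To make that concrete, here is a small self-contained Python sketch that builds just a GGUF v3 header plus one tensor-info record and then applies the same `[0, 42)` range check the loader does. The layout follows the public GGUF spec; `GGML_TYPE_COUNT = 42` is an assumption taken from the error message, and the dims/offset values are made up for illustration.

```python
import struct

GGML_TYPE_COUNT = 42  # assumption: mainline's valid ggml-type range is [0, 42)

def build_minimal_gguf(tensor_name: str, ggml_type: int) -> bytes:
    """Build just enough of a GGUF v3 header + one tensor-info record
    to show where the loader's type check fires (magic, version,
    tensor count, KV count, then tensor infos, per the GGUF spec)."""
    name = tensor_name.encode("utf-8")
    buf = b"GGUF"
    buf += struct.pack("<I", 3)           # version 3
    buf += struct.pack("<Q", 1)           # tensor_count = 1
    buf += struct.pack("<Q", 0)           # metadata_kv_count = 0
    # tensor info: name, n_dims, dims, ggml type, data offset
    buf += struct.pack("<Q", len(name)) + name
    buf += struct.pack("<I", 2)           # n_dims = 2 (illustrative)
    buf += struct.pack("<QQ", 4096, 4096) # dims (illustrative)
    buf += struct.pack("<I", ggml_type)
    buf += struct.pack("<Q", 0)           # data offset
    return buf

def check_tensor_types(data: bytes) -> list[str]:
    """Parse the tensor-info records and report any out-of-range types,
    mirroring the validation behind gguf_init_from_file_ptr's error."""
    assert data[:4] == b"GGUF"
    off = 4 + 4  # skip magic + version
    (n_tensors,) = struct.unpack_from("<Q", data, off); off += 8
    off += 8     # skip metadata_kv_count (0 here)
    errors = []
    for _ in range(n_tensors):
        (name_len,) = struct.unpack_from("<Q", data, off); off += 8
        name = data[off:off + name_len].decode("utf-8"); off += name_len
        (n_dims,) = struct.unpack_from("<I", data, off); off += 4
        off += 8 * n_dims  # skip dims
        (ttype,) = struct.unpack_from("<I", data, off); off += 4
        off += 8           # skip data offset
        if not (0 <= ttype < GGML_TYPE_COUNT):
            errors.append(f"tensor '{name}' has invalid ggml type {ttype}. "
                          f"should be in [0, {GGML_TYPE_COUNT})")
    return errors

print(check_tensor_types(build_minimal_gguf("blk.3.attn_v.weight", 140)))
# -> ["tensor 'blk.3.attn_v.weight' has invalid ggml type 140. should be in [0, 42)"]
```

A backend that extends the type enum (like ik-llama.cpp) simply accepts a larger range, which is why the same file loads there but not in mainline.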

So the most likely causes are:

  1. you are loading it with mainline llama.cpp instead of ik-llama.cpp
  2. your frontend/UI bundles an older or incompatible llama.cpp backend
  3. you are on a build that does not yet support the quant types used in this GGUF

A few things that would help narrow it down:

  • exact backend (llama.cpp, ik-llama.cpp, KoboldCPP, LM Studio, etc.)
  • exact commit/build date
  • which file you tried to load (BF16 or IQ4_NL)

My first recommendation would be to try the latest ik-llama.cpp build first. If you are already doing that, send your exact build info and I’ll dig deeper.

Thanks for reporting this — I tracked it down properly and there is now a separate standard llama.cpp-compatible release on the repo.

New files

  • RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
  • RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf

What was happening

The original custom IQ4_NL build was the author's ik-llama.cpp daily-driver quant and included a mixed tensor layout that standard llama.cpp rejected with:

gguf_init_from_file_ptr: tensor 'blk.3.attn_v.weight' has invalid ggml type 140

So this was not just a generic prompt/settings issue. It was a file compatibility difference between the ik-llama-oriented quant and standard llama.cpp.

What I changed

I rebuilt a separate IQ4_NL quant directly with standard llama.cpp's llama-quantize, from the same BF16 splice source and with the same custom imatrix/calibration data.
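For reference, the rebuild step looks roughly like the following sketch. The paths and filenames are placeholders, not the author's actual ones, and the `echo` is included so the snippet is safe to paste; drop it to actually run the quantizer.

```shell
# Placeholders, not the author's actual paths/files.
BF16=splice-bf16-source.gguf
IMATRIX=custom-imatrix.dat
OUT=model-IQ4_NL-llama.cpp-compatible.gguf

# IQ4_NL is a type mainline llama.cpp knows, so every tensor the quantizer
# writes stays inside the [0, 42) ggml-type range the loader accepts.
echo ./build/bin/llama-quantize --imatrix "$IMATRIX" "$BF16" "$OUT" IQ4_NL
```

The key point is only which `llama-quantize` binary is used: a mainline build can never emit the extended ik-llama type ids that caused the original error.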

Exact standard llama.cpp build tested

  • binary: /home/benbi/llama.cpp/build/bin/llama-server
  • quantizer: /home/benbi/llama.cpp/build/bin/llama-quantize
  • build string: build: 8401 (a69d54f99) with GNU 15.2.1 for Linux x86_64
  • CUDA archs reported: ARCHS = 860,1200

How it was tested

The new compatible build was loaded successfully in standard llama.cpp and verified with:

  • CUDA_VISIBLE_DEVICES=0,1,2
  • --tensor-split 3,2,2
  • -c 262144
  • --cache-type-k f32 --cache-type-v f32
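Put together, the verification launch would look something like this sketch. The binary path is an assumption, and the `echo` is included so the command is safe to paste on a machine without the GPUs; drop it to actually start the server.

```shell
MODEL=RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf

# f32 KV cache at 262144 context is memory-heavy; the tensor split
# matches the 3-GPU layout listed above.
echo CUDA_VISIBLE_DEVICES=0,1,2 ./build/bin/llama-server \
  -m "$MODEL" \
  --tensor-split 3,2,2 \
  -c 262144 \
  --cache-type-k f32 --cache-type-v f32
```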

Verification prompts returned correct final answers, including:

  • 0.05
  • 5050
  • exact instruction-following output

Recommendation

  • If you use standard llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-llama.cpp-compatible.gguf
  • If you use ik-llama.cpp, use RYS-Qwen3.5-27B-Uncensored-Splice-IQ4_NL-ik-llama.gguf

The author still personally recommends the ik-llama one because that is the actual daily-driver quant and the one that will get the most real-world testing going forward.

Thank you for the clarification and fix! I'm sorry my question was vague. I was trying to load the IQ4_NL variant. It's working fine on my end now on llama.cpp b8733.

cpuQ changed discussion status to closed
