This model has a bug. RuntimeError: contraction dimension mismatch during Fine-Tuning (Corrupted MoE Safetensor Shapes)

#4
by christian1980nrw - opened

When attempting to natively fine-tune or run inference on this model using PyTorch, Transformers, or Axolotl, the process immediately crashes during the first forward pass with the following error inside torch._grouped_mm:

RuntimeError: contraction dimension of mat_a and mat_b must match

Root Cause: The expert weights inside the provided .safetensors files (gate_up_proj and down_proj) have been permanently transposed and uploaded with incorrect shapes.

The current gate_up_proj shape in this repo is [128, 2048, 1536].
Natively, the Qwen3VLMoeForCausalLM architecture expects [128, 1536, 2048].
This modification was likely done by a pre-processing script prior to upload, as a workaround for a legacy bug in the llama.cpp GGUF converter (which crashed on the native PyTorch shapes).

Impact: Because the matrix dimensions no longer align with the official Qwen architecture definitions, native inference, LoRA, and full fine-tuning are completely broken out of the box. PyTorch allocates memory for [128, 1536, 2048], reads [128, 2048, 1536] from disk, and crashes during the matrix multiplication.
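The failure is easy to reproduce in isolation. A minimal sketch using torch.bmm (the actual crash happens inside torch._grouped_mm, but the shape rule is the same; the tiny expert count and the left/right multiply order here are illustrative, not the model's internal layout):

```python
import torch

n_expert, n_embd, n_ff2 = 4, 2048, 1536        # tiny expert count for illustration

native = torch.randn(n_expert, n_ff2, n_embd)   # [128, 1536, 2048] in the real model
transposed = native.transpose(1, 2)             # [128, 2048, 1536] as found on disk

x = torch.randn(n_expert, 3, n_ff2)             # activations, contraction dim 1536

print(torch.bmm(x, native).shape)               # dims align: [4, 3, 2048]
try:
    torch.bmm(x, transposed)                    # contraction dim 1536 vs 2048
except RuntimeError as e:
    print("RuntimeError:", e)
```

Swapping the trailing two dims of the weight is exactly what makes the contraction dimensions stop matching.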

The Complete Solution (Two-Part Fix):

Part 1: Fixing the Model for PyTorch / Axolotl Training

Before starting your training script, you must physically revert the expert matrices to their original PyTorch shapes ([n_expert, 2*n_ff, n_embd]). Run this quick Python auto-repair script over the downloaded model directory to fix the weights on disk:

```python
import glob
from safetensors.torch import load_file, save_file

for f in glob.glob("path/to/model/*.safetensors"):
    tensors = load_file(f)
    changed = False
    for k, v in list(tensors.items()):
        # gate_up_proj: [128, 2048, 1536] (transposed) -> [128, 1536, 2048] (native)
        if "experts.gate_up_proj" in k and v.shape == (128, 2048, 1536):
            tensors[k] = v.transpose(1, 2).contiguous()
            changed = True
        # down_proj: [128, 768, 2048] (transposed) -> [128, 2048, 768] (native)
        elif "experts.down_proj" in k and v.shape == (128, 768, 2048):
            tensors[k] = v.transpose(1, 2).contiguous()
            changed = True
    if changed:
        print(f"Repairing transposed weights in {f}...")
        # Re-save with the "pt" format marker so transformers loads the shard cleanly.
        save_file(tensors, f, metadata={"format": "pt"})
```

Part 2: Fixing the llama.cpp GGUF Export Downstream

If you fix the PyTorch training shapes (as described above) but still want to export the fine-tuned model to GGUF later, the default llama.cpp converter script may misbehave, as it sometimes still expects the broken legacy transposed layout.

To solve this downstream issue, I have implemented an Auto-Transpose Auto-Repair feature directly in the llama.cpp pipeline. My patched convert_hf_to_gguf.py automatically detects whether the incoming HuggingFace tensors are transposed or in native PyTorch format, and aligns them on the fly during the GGUF conversion.

You can grab my patched converter (or clone the fork) directly here to ensure flawless GGUF exports of Qwen3VLMoe models: 👉 butterwecksolutions/llama.cpp - convert_hf_to_gguf.py (Auto-Transpose Patch).

Note / status: still testing (I am surprised to be the first one trying to fine-tune this model).

Regards!
Christian
