This model has a bug. RuntimeError: contraction dimension mismatch during Fine-Tuning (Corrupted MoE Safetensor Shapes)

#4
by christian1980nrw - opened

When attempting to natively fine-tune or run inference on this model using PyTorch, Transformers, or Axolotl, the process immediately crashes during the first forward pass with the following error inside torch._grouped_mm:

RuntimeError: contraction dimension of mat_a and mat_b must match

Root Cause: The expert weights inside the provided .safetensors files (gate_up_proj and down_proj) have been permanently transposed and uploaded with incorrect shapes.

The current gate_up_proj shape in this repo is [128, 2048, 1536].
Natively, the Qwen3VLMoeForCausalLM architecture expects [128, 1536, 2048].
This modification was likely done by a pre-processing script prior to upload, as a workaround for a legacy bug in the llama.cpp GGUF converter (which crashed on the native PyTorch shapes).

Impact: Because the matrix dimensions no longer align with the official Qwen architecture definitions, native inference, LoRA, and full fine-tuning are completely broken out of the box. PyTorch allocates memory for [128, 1536, 2048], reads [128, 2048, 1536] from disk, and crashes during the matrix multiplication.
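The failure is easy to reproduce in isolation. A minimal sketch using torch.bmm (the actual crash happens inside torch._grouped_mm, but the shape rule is the same; the tiny expert count and the left/right multiply order here are illustrative, not the model's internal layout):

```python
import torch

n_expert, n_embd, n_ff2 = 4, 2048, 1536        # tiny expert count for illustration

native = torch.randn(n_expert, n_ff2, n_embd)   # [128, 1536, 2048] in the real model
transposed = native.transpose(1, 2)             # [128, 2048, 1536] as found on disk

x = torch.randn(n_expert, 3, n_ff2)             # activations, contraction dim 1536

print(torch.bmm(x, native).shape)               # dims align: [4, 3, 2048]
try:
    torch.bmm(x, transposed)                    # contraction dim 1536 vs 2048
except RuntimeError as e:
    print("RuntimeError:", e)
```

Swapping the trailing two dims of the weight is exactly what makes the contraction dimensions stop matching.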

The Complete Solution (Two-Part Fix):

Part 1: Fixing the Model for PyTorch / Axolotl Training

Before starting your training script, you must physically revert the expert matrices to their original PyTorch shapes ([n_expert, 2*n_ff, n_embd]). Run this quick Python auto-repair script over the downloaded model directory to fix the weights on disk:

```python
import glob
from safetensors.torch import load_file, save_file

for f in glob.glob("path/to/model/*.safetensors"):
    tensors = load_file(f)
    changed = False
    for k, v in list(tensors.items()):
        # gate_up_proj: [128, 2048, 1536] (transposed) -> [128, 1536, 2048] (native)
        if "experts.gate_up_proj" in k and v.shape == (128, 2048, 1536):
            tensors[k] = v.transpose(1, 2).contiguous()
            changed = True
        # down_proj: [128, 768, 2048] (transposed) -> [128, 2048, 768] (native)
        elif "experts.down_proj" in k and v.shape == (128, 768, 2048):
            tensors[k] = v.transpose(1, 2).contiguous()
            changed = True
    if changed:
        print(f"Repairing transposed weights in {f}...")
        # Re-save with the "pt" format marker so transformers loads the shard cleanly.
        save_file(tensors, f, metadata={"format": "pt"})
```

Part 2: Fixing the llama.cpp GGUF Export Downstream

If you fix the PyTorch training shapes (as described above) but still want to export the fine-tuned model to GGUF later, the default llama.cpp converter script may misbehave, as it sometimes still expects the broken legacy transposed layout.

To solve this downstream issue, I have implemented an Auto-Transpose Auto-Repair feature directly in the llama.cpp pipeline. My patched convert_hf_to_gguf.py automatically detects whether the incoming HuggingFace tensors are transposed or in native PyTorch format, and aligns them on the fly during the GGUF conversion.

You can grab my patched converter (or clone the fork) directly here to ensure flawless GGUF exports of Qwen3VLMoe models: 👉 butterwecksolutions/llama.cpp - convert_hf_to_gguf.py (Auto-Transpose Patch).

Note / status: still testing (I am surprised to be the first one trying to fine-tune this model).

Regards!
Christian
