Using 'AutoModelForCausalLM' leads to size mismatch

#122
by saehee - opened

I have transformers==4.43.4 installed, and I am trying to load this model with AutoModelForCausalLM (which resolves to LlamaForCausalLM), but this error keeps popping up:

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.2.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.3.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.3.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
...
size mismatch for model.layers.31.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for model.layers.31.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.
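For context (a sketch, not from the thread itself): the checkpoint shapes above are consistent with grouped-query attention (GQA), where the k_proj/v_proj weights have fewer rows than q_proj. If a transformers version fails to pick up num_key_value_heads from config.json, it builds full-size [4096, 4096] K/V projections and the load fails exactly like this. Assuming the values published in Llama-3.1-8B's config.json:

```python
# Why the checkpoint's k_proj/v_proj are [1024, 4096]: Llama-3.1-8B uses
# grouped-query attention. Values below are assumed from its config.json.
hidden_size = 4096
num_attention_heads = 32
num_key_value_heads = 8  # GQA: fewer key/value heads than query heads

head_dim = hidden_size // num_attention_heads  # 128

# Row counts of the projection weight matrices:
q_proj_rows = num_attention_heads * head_dim    # 4096 -> q_proj is [4096, 4096]
kv_proj_rows = num_key_value_heads * head_dim   # 1024 -> k/v_proj are [1024, 4096]

print(q_proj_rows, kv_proj_rows)  # 4096 1024
```

So the checkpoint's 1024-row K/V weights are correct; it is the freshly built model that has the wrong shape, which is why ignore_mismatched_sizes=True would silently discard the attention weights rather than fix anything.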

It seems like the key and value projection matrices are mismatched, but I do not know what I can do to resolve this issue.
Any possible solutions would help me out a lot.

Thanks in advance.

Issue resolved: downgrading to transformers==4.43.1 fixed it.

The same issue with the Llama-3.1-8B model was also resolved this way.
I think the required transformers version needs an update in both config.json and the README for both models.
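A minimal sketch of checking whether the installed transformers matches the version that worked in this thread, without importing transformers itself (the crude version parser below is an assumption, not part of any library):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None

def parse(v):
    # Crude numeric parse for comparison: "4.43.1" -> (4, 43, 1).
    # Not a full PEP 440 parser; pre-release suffixes are ignored.
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

print(parse("4.43.4") > parse("4.43.1"))  # True
```

Running this before loading the model makes it easy to confirm whether you are on the version the model card expects.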

saehee changed discussion status to closed
