Why is the pruned model bigger than the original after being sliced to 24 layers?
#1
by iheardyoulooking - opened
Usually after structured pruning the model should be smaller, but:
original model: 15 GB
sliced model: 20 GB+
@iheardyoulooking it's because the model was uploaded in 32-bit float (float32) format, whereas the original Mistral is bfloat16. That makes each parameter in the sliced version twice as big on disk.
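The reported sizes line up with a quick back-of-the-envelope check (the parameter counts below are rough assumptions, not exact figures for these checkpoints):

```python
# Rough parameter counts (assumptions): Mistral-7B has ~7.2B params;
# slicing 8 of 32 layers removes roughly 1.7B, leaving ~5.5B.
full_params = 7.2e9
sliced_params = 5.5e9

GB = 1e9  # decimal gigabytes, as typically shown for model shards

print(f"original, bfloat16 (2 bytes/param): {full_params * 2 / GB:.1f} GB")  # ~14.4 GB
print(f"sliced,   float32  (4 bytes/param): {sliced_params * 4 / GB:.1f} GB")  # ~22.0 GB
```

So a smaller network in float32 can still take more disk space than the full network in bfloat16.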
You can still load the model in 16 bit by passing a torch_dtype argument:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer')

# Override the checkpoint's float32 weights at load time so each
# parameter takes 2 bytes in memory instead of 4
model = AutoModelForCausalLM.from_pretrained(
    'arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer',
    torch_dtype=torch.bfloat16,
)
Shamane changed discussion status to closed