"safetensors_rust.SafetensorError: Error while deserializing header: incomplete metadata, file not fully covered"

by ChengqianMa - opened Aug 28, 2025


Your model is good, but I ran into a problem.
I downloaded the Llama-3.3-70B-Instruct repository to a local path and, in the config.json of my local copy of this repository (ultravox-v0_6-llama-3_3-70b), replaced text_model_id with that local path.
When I run the following Python script:

# pip install transformers peft librosa

import transformers
import numpy as np
import librosa

pipe = transformers.pipeline(model='path/to/local/ultravox-v0_6-llama-3_3-70b', trust_remote_code=True)

path = "<path-to-input-audio>"  # TODO: pass the audio here
audio, sr = librosa.load(path, sr=16000)


turns = [
  {
    "role": "system",
    "content": "You are a friendly and helpful character. You love to answer questions for people."
  },
]
pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=30)

I get the following error:

Loading checkpoint shards:   0%|                                                 | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/chengqianma/test_ultravox.py", line 8, in <module>
    pipe = transformers.pipeline(model='/data/chengqianma/PTM/ultravox-v0_6-llama-3_3-70b', trust_remote_code=True)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 1008, in pipeline
    framework, model = infer_framework_load_model(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 593, in from_pretrained
    return model_class.from_pretrained(
  File "/home/chengqianma/.cache/huggingface/modules/transformers_modules/ultravox-v0_6-llama-3_3-70b/ultravox_model.py", line 103, in from_pretrained
    model = super().from_pretrained(*args, **kwargs)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 317, in _wrapper
    return func(*args, **kwargs)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5069, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5386, in _load_pretrained_model
    model._initialize_missing_keys(checkpoint_keys, ignore_mismatched_sizes, is_quantized)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5958, in _initialize_missing_keys
    self.initialize_weights()
  File "/home/chengqianma/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2997, in initialize_weights
    self.smart_apply(self._initialize_weights)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2991, in smart_apply
    fn(self)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2965, in _initialize_weights
    self._init_weights(module)
  File "/home/chengqianma/.cache/huggingface/modules/transformers_modules/ultravox-v0_6-llama-3_3-70b/ultravox_model.py", line 87, in _init_weights
    self.language_model = self._create_language_model(self.config)
  File "/home/chengqianma/.cache/huggingface/modules/transformers_modules/ultravox-v0_6-llama-3_3-70b/ultravox_model.py", line 499, in _create_language_model
    language_model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
    return model_class.from_pretrained(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 317, in _wrapper
    return func(*args, **kwargs)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5069, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5532, in _load_pretrained_model
    _error_msgs, disk_offload_index, cpu_offload_index = load_shard_file(args)
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 962, in load_shard_file
    state_dict = load_state_dict(
  File "/home/chengqianma/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 538, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: incomplete metadata, file not fully covered
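
The traceback fails inside safe_open while reading one of the 30 language-model shards, so one thing worth ruling out is a shard whose size on disk does not match what its header declares, for example after an interrupted download. That cause is an assumption, not something confirmed here. Below is a minimal sketch of such a check, based on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the tensor data). The helper names check_safetensors_file and check_directory are hypothetical, and the directory path in the usage note is a placeholder.

```python
import json
import struct
from pathlib import Path


def check_safetensors_file(path):
    """Return (ok, message) for a single .safetensors file.

    Compares the size on disk with the size implied by the header:
    8 bytes (header length) + JSON header + end of the last tensor.
    """
    path = Path(path)
    size = path.stat().st_size
    if size < 8:
        return False, f"file is only {size} bytes, too small for a header"

    with path.open("rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        if 8 + header_len > size:
            return False, f"header claims {header_len} bytes, file has {size}"
        try:
            header = json.loads(f.read(header_len))
        except ValueError as exc:  # invalid UTF-8 or invalid JSON
            return False, f"header is not valid JSON: {exc}"

    if not isinstance(header, dict):
        return False, "header is not a JSON object"

    # Each tensor entry stores [begin, end) offsets into the data section;
    # "__metadata__" is an optional free-form entry without offsets.
    data_end = max(
        (
            entry["data_offsets"][1]
            for name, entry in header.items()
            if name != "__metadata__"
        ),
        default=0,
    )
    expected = 8 + header_len + data_end
    if expected != size:
        return False, f"expected {expected} bytes, found {size}"
    return True, "ok"


def check_directory(model_dir):
    """Check every *.safetensors file directly inside model_dir."""
    return {
        shard.name: check_safetensors_file(shard)
        for shard in sorted(Path(model_dir).glob("*.safetensors"))
    }
```

Calling check_directory("path/to/local/Llama-3.3-70B-Instruct") returns a dict mapping each shard name to an (ok, message) pair. Any shard reported as not ok would be a candidate for re-downloading, though a passing check does not prove the tensor contents are intact.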

My machine information is:

NVIDIA H20 * 8
NVIDIA-SMI 550.54.14 
Driver Version: 550.54.14
CUDA Version: 12.4

Has anyone else run into this problem?
