
No proper tokenizer for 'switch-xxl-128'

#7
by VIArchitect - opened

Hello.

When I ran the example code from the model card at full precision, it seemed there was no proper tokenizer for 'switch-xxl-128'.

It raised the error below:

Traceback (most recent call last):
  File "example.py", line 4, in <module>
    tokenizer = AutoTokenizer.from_pretrained(
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 658, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1761, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'google/switch-xxl-128'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'google/switch-xxl-128' is the correct path to a directory containing all relevant files for a T5TokenizerFast tokenizer.
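The OSError text points at two distinct causes: the hub repo genuinely missing its tokenizer files (the case here), or a same-named local directory shadowing the repo id. A minimal sketch of checking the second cause (`shadows_hub_repo` is a hypothetical helper, not a transformers API):

```python
import os
import tempfile

def shadows_hub_repo(repo_id: str, cwd: str = ".") -> bool:
    """True if a local directory with the repo id's name exists and
    would be loaded instead of the hub repo, the situation the
    OSError message warns about."""
    return os.path.isdir(os.path.join(cwd, repo_id))

# Demo in a throwaway directory so the result is deterministic:
with tempfile.TemporaryDirectory() as d:
    print(shadows_hub_repo("google/switch-xxl-128", d))  # -> False
    os.makedirs(os.path.join(d, "google", "switch-xxl-128"))
    print(shadows_hub_repo("google/switch-xxl-128", d))  # -> True
```

If this returns True in your working directory, renaming or removing the local folder lets `from_pretrained` reach the hub again.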

Thanks for pointing out the issue! The tokenizer was indeed missing; I have just uploaded it.

Thanks, @ybelkada !

But it still doesn't seem to be working...

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
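Workaround 2 from the error message can be applied in the script itself, as long as the environment variable is set before protobuf is first imported (importing transformers imports protobuf, so it has to come first). A sketch:

```python
import os

# Must be set BEFORE protobuf is imported (i.e. before transformers),
# otherwise it has no effect on the already-loaded C++ implementation.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Only now import transformers; protobuf will use the pure-Python
# parser, which is slower but avoids the descriptor error, e.g.:
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("google/switch-xxl-128")
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])  # -> python
```

The same variable can also be exported in the shell before launching Python.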

I just tried to download the tokenizer with the following:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/switch-xxl-128")

And it seems to work fine. Could you share the full error traceback?

OK, here is the full error traceback:

Traceback (most recent call last):
  File "example.py", line 4, in <module>
    tokenizer = AutoTokenizer.from_pretrained("google/switch-xxl-128", resume_download=True)
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 640, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1777, in from_pretrained
    return cls._from_pretrained(
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1932, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 134, in __init__
    super().__init__(
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1162, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 438, in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
  File "/root/hugging_face/hf/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 92, in <module>
    _descriptor.EnumValueDescriptor(
  File "/root/hugging_face/hf/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 755, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

Thanks!
Do you get the same error without resume_download?

Yes, I got the same error without resume_download :(

pip install --upgrade protobuf==3.20.0

This solved the problem for me (workaround 1 from the error message: downgrade protobuf to 3.20.x or lower).
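For anyone hitting this later, a quick sketch of deciding whether an installed protobuf is on the 4.x line that rejects generated code from protoc < 3.19 (assuming version strings start with a dot-separated major number; the helper name is made up for illustration):

```python
def protobuf_needs_downgrade(version: str) -> bool:
    """True if this protobuf runtime is the 4.x line that raises
    'Descriptors cannot not be created directly.' for old _pb2 files."""
    major = int(version.split(".")[0])
    return major >= 4

# The installed version can be read with
# importlib.metadata.version("protobuf") when protobuf is installed.
print(protobuf_needs_downgrade("3.20.0"))  # -> False
print(protobuf_needs_downgrade("4.21.1"))  # -> True
```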

VIArchitect changed discussion status to closed
