Tokenizer cannot be loaded with transformers>5

#3
by ygorg - opened

It might be because tokenizer_config.json["tokenizer_class"] is set to CamembertTokenizer. CamembertTokenizer is a Unigram SentencePiece tokenizer, whereas DrBERT's tokenizer is BPE. After changing tokenizer_class to "PreTrainedTokenizer", loading succeeds. More tests should be run to check whether it tokenizes differently.
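The workaround above can be sketched as a small script that patches tokenizer_config.json in a local snapshot of the repo. The file path and demo config below are stand-ins, not the actual repo contents:

```python
import json
import tempfile
from pathlib import Path

def patch_tokenizer_class(path: Path, new_class: str = "PreTrainedTokenizer") -> dict:
    """Rewrite tokenizer_class in a tokenizer_config.json in place."""
    config = json.loads(path.read_text())
    config["tokenizer_class"] = new_class
    path.write_text(json.dumps(config, indent=2))
    return config

# Demo on a stand-in config; in practice, point this at the
# tokenizer_config.json inside your local snapshot of the model repo.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "tokenizer_config.json"
    cfg.write_text(json.dumps({"tokenizer_class": "CamembertTokenizer"}))
    patched = patch_tokenizer_class(cfg)
    print(patched["tokenizer_class"])  # PreTrainedTokenizer
```

After patching, reload with AutoTokenizer.from_pretrained pointed at the patched directory and compare a few tokenized sentences against the original to confirm nothing changed.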
