Tokenizer cannot be loaded with transformers>5

#3
by ygorg - opened

It might be because tokenizer_config.json["tokenizer_class"] is set to CamembertTokenizer. CamembertTokenizer is a Unigram SentencePiece tokenizer, whereas DrBERT's tokenizer is BPE. After changing tokenizer_class to "PreTrainedTokenizer", loading succeeds. More tests should be run to check whether it tokenizes differently.
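The workaround above can be sketched as a small script that patches tokenizer_config.json in a local snapshot of the repo. The file path and demo config below are stand-ins, not the actual repo contents:

```python
import json
import tempfile
from pathlib import Path

def patch_tokenizer_class(path: Path, new_class: str = "PreTrainedTokenizer") -> dict:
    """Rewrite tokenizer_class in a tokenizer_config.json in place."""
    config = json.loads(path.read_text())
    config["tokenizer_class"] = new_class
    path.write_text(json.dumps(config, indent=2))
    return config

# Demo on a stand-in config; in practice, point this at the
# tokenizer_config.json inside your local snapshot of the model repo.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "tokenizer_config.json"
    cfg.write_text(json.dumps({"tokenizer_class": "CamembertTokenizer"}))
    patched = patch_tokenizer_class(cfg)
    print(patched["tokenizer_class"])  # PreTrainedTokenizer
```

After patching, reload with AutoTokenizer.from_pretrained pointed at the patched directory and compare a few tokenized sentences against the original to confirm nothing changed.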
