KeyError when trying to use the model

#3
by mischamole - opened

I am trying to follow the instructions on huggingface to load the model:

from transformers import AutoTokenizer, AutoModelForVision2Seq

tokenizer = AutoTokenizer.from_pretrained("dh-unibe/trocr-kurrent")
model = AutoModelForVision2Seq.from_pretrained("dh-unibe/trocr-kurrent")

But I always get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 264, in from_pretrained
    return processor_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 184, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 228, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 674, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 599, in __getitem__
    raise KeyError(key)

KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'>

Am I doing something wrong?

Digital Humanities @ University of Bern org
edited Aug 29, 2025

Sorry it doesn't work - it's not your fault!
It's a versioning problem: the tokenizer/processor part of this model is too old, and files are missing. I added tokenizer_config.json, but vocab.json is still missing, and I'm not sure it makes sense to copy it over from another model (like dh-unibe/trocr-kurrent-XVI-XVII).

Downgrading transformers (the model was trained with version 4.26.0), torch, or Python did not help either.

As a workaround, you can use the processor from the base model (or from kurrent-XVI-XVII); the VisionEncoderDecoderModel itself loads fine:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")
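For reference, a minimal inference sketch using this workaround, following the standard TrOCR pipeline. The image path is a placeholder, and I haven't verified how the base-model processor affects the recognition quality on Kurrent material:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Processor from the base model (the repo's own processor files are broken),
# model weights from the fine-tuned Kurrent checkpoint.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")

# Placeholder path: TrOCR expects a single line of text per image.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```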

To be honest, I'm not sure how this affects the inference results.

For sure we'll have to train a newer version with the training material we have.

Thanks for your reply!

Yeah, I figured it was an issue with the library versions; I tried transformers 4.19, 4.26, and 4.55, all with the same result.

I will take a look at your proposed workaround this weekend, maybe the results are good enough for me!
