KeyError when trying to use the model

#3
by mischamole - opened

I am trying to follow the instructions on huggingface to load the model:

from transformers import AutoTokenizer, AutoModelForVision2Seq

tokenizer = AutoTokenizer.from_pretrained("dh-unibe/trocr-kurrent")
model = AutoModelForVision2Seq.from_pretrained("dh-unibe/trocr-kurrent")

But I always get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 264, in from_pretrained
    return processor_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 184, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 228, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 674, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 599, in __getitem__
    raise KeyError(key)

KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'>

Am I doing something wrong?

Digital Humanities @ University of Bern org
edited Aug 29, 2025

Sorry it doesn't work - it's not your fault!
It's a versioning problem: the tokenizer/processor part of this model is too old, and files are missing. I added tokenizer_config.json, but vocab.json is still missing, and I'm not sure it makes sense to copy it over from another model (like dh-unibe/trocr-kurrent-XVI-XVII).

Downgrading transformers (the model was trained with version 4.26.0), torch, or Python did not help either.

As a workaround, you can use the processor from the base model (or from kurrent-XVI-XVII); the VisionEncoderDecoderModel itself loads fine:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")
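For reference, a minimal inference sketch using this workaround, following the standard TrOCR pipeline. The image path is a placeholder, and I haven't verified how the base-model processor affects the recognition quality on Kurrent material:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Processor from the base model (the repo's own processor files are broken),
# model weights from the fine-tuned Kurrent checkpoint.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("dh-unibe/trocr-kurrent")

# Placeholder path: TrOCR expects a single line of text per image.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```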

To be honest, I'm not sure how this affects the inference results.

For sure we'll have to train a newer version with the training material we have.

Thanks for your reply!

Yeah, I figured it was an issue with the library versions; I tried transformers 4.19, 4.26, and 4.55, all with the same result.

I will take a look at your proposed workaround this weekend, maybe the results are good enough for me!
