trocr-base-printed-onnx

This repository contains an optimized ONNX version of the microsoft/trocr-base-printed model. It was exported using the optimum library for faster inference on CPU and GPU.

Model Details

  • Architecture: Vision Encoder-Decoder (ViT encoder + RoBERTa decoder)
  • Format: ONNX (Encoder, Decoder, and Decoder with Past Key Values)
  • Task: Optical Character Recognition (OCR) for printed text.
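The three ONNX graphs listed above are the kind of artifact the optimum exporter produces. A minimal sketch of such an export, assuming optimum is installed with its ONNX exporter extras (the output directory name is illustrative):

```shell
# Export microsoft/trocr-base-printed to ONNX; for vision encoder-decoder
# models this yields encoder, decoder, and decoder-with-past graphs.
optimum-cli export onnx --model microsoft/trocr-base-printed trocr-base-printed-onnx/
```

Depending on the optimum version, the decoder and decoder-with-past graphs may instead be merged into a single file.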

Usage

from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor
from PIL import Image
import torch

model_id = "KvaytG/trocr-base-printed-onnx"
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"] if torch.cuda.is_available() else ["CPUExecutionProvider"]

processor = TrOCRProcessor.from_pretrained(model_id)
model = ORTModelForVision2Seq.from_pretrained(
    model_id,
    providers=providers,
    decoder_file_name="decoder_model.onnx",
    decoder_with_past_file_name="decoder_with_past_model.onnx",
)

# Load an image containing a single line of printed text
image = Image.open("image.png").convert("RGB")

inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(inputs.pixel_values, use_cache=True)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)

Credits

Original model by Microsoft.

License

Licensed under the MIT license.
