trocr-base-printed-onnx
This repository contains an optimized ONNX version of the microsoft/trocr-base-printed model.
It was exported with the Hugging Face Optimum library for faster inference on CPU and GPU through ONNX Runtime.
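For readers who want to reproduce a similar export, the following is a minimal sketch rather than the exact procedure used for this repository. It assumes `optimum[onnxruntime]` and `transformers` are installed; the function name `export_trocr_to_onnx` and the output directory are hypothetical, and the ONNX file names that get written can differ between Optimum versions.

```python
def export_trocr_to_onnx(
    source_id: str = "microsoft/trocr-base-printed",
    output_dir: str = "trocr-base-printed-onnx",  # hypothetical local directory
) -> str:
    """Export the PyTorch checkpoint to ONNX and save it with its processor."""
    # Local imports keep this module importable when Optimum is not installed.
    from optimum.onnxruntime import ORTModelForVision2Seq
    from transformers import TrOCRProcessor

    # export=True asks Optimum to convert the PyTorch weights to ONNX on load.
    model = ORTModelForVision2Seq.from_pretrained(source_id, export=True)
    processor = TrOCRProcessor.from_pretrained(source_id)

    # Write the ONNX graphs, config, and preprocessing files side by side.
    model.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)
    return output_dir
```

Calling `export_trocr_to_onnx()` downloads the original checkpoint, so it needs network access and a few hundred megabytes of disk space.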
Model Details
- Architecture: Vision Encoder-Decoder (ViT encoder + RoBERTa decoder)
- Format: ONNX (Encoder, Decoder, and Decoder with Past Key Values)
- Task: Optical Character Recognition (OCR) for printed text.
Usage
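from optimum.onnxruntime import ORTModelForVision2Seq
from transformers import TrOCRProcessor
from PIL import Image
import torch

model_id = "KvaytG/trocr-base-printed-onnx"

# Use the GPU when PyTorch can see one; otherwise run on CPU.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"] if use_cuda else ["CPUExecutionProvider"]

processor = TrOCRProcessor.from_pretrained(model_id)
model = ORTModelForVision2Seq.from_pretrained(
    model_id,
    providers=providers,
    decoder_file_name="decoder_model.onnx",
    decoder_with_past_file_name="decoder_with_past_model.onnx",
)

# "image.png" is a placeholder path; replace it with your own image of printed text.
image = Image.open("image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)

# use_cache=True reuses past key values so each decoding step is cheaper.
generated_ids = model.generate(inputs.pixel_values, use_cache=True)

# batch_decode returns one string per input image; take the first.
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)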
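The snippet above chooses execution providers based on `torch.cuda.is_available()`, which says nothing about whether the installed ONNX Runtime build actually includes CUDA support. As a hedged alternative, the sketch below asks ONNX Runtime directly through `onnxruntime.get_available_providers()`. The helper names `pick_providers` and `detect_providers` are illustrative and not part of any library.

```python
from typing import Iterable, List

# Preference order mirrors the usage snippet: CUDA first, CPU as fallback.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(
    available: Iterable[str], preferred: Iterable[str] = PREFERRED
) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    available_set = set(available)
    chosen = [name for name in preferred if name in available_set]
    # Fall back to the CPU provider if none of the preferred ones are present.
    return chosen or ["CPUExecutionProvider"]


def detect_providers() -> List[str]:
    """Query the installed ONNX Runtime build for its providers."""
    try:
        import onnxruntime as ort
    except ImportError:
        # Without ONNX Runtime there is nothing to query; assume CPU only.
        return ["CPUExecutionProvider"]
    return pick_providers(ort.get_available_providers())
```

The result of `detect_providers()` can be passed as the `providers` argument in place of the `torch`-based check.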
Credits
Original model by Microsoft.
License
Licensed under the MIT license.