# Odia OCR – Qwen2.5-VL Fine-tuned
A vision-language model fine-tuned for OCR of Odia (the official language of the Indian state of Odisha, written in the Odia script), based on Qwen/Qwen2.5-VL-3B-Instruct.
## Training details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-3B-Instruct |
| Dataset | shantipriya/odia-ocr-merged (~73K images) |
| Method | LoRA (r=128, α=256, 7 target modules) |
| Epochs | 3 |
| Effective batch size | 16 (bf16) |
| GPU | NVIDIA H100 80 GB |
| Framework | Transformers + PEFT + TRL |
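
For reference, a minimal sketch of what the LoRA setup above might look like with PEFT. The r and alpha values come from the table; the exact seven target modules are an assumption here (the attention and MLP projections commonly targeted in Qwen2.5-VL), as is the dropout value.

```python
from peft import LoraConfig, get_peft_model
from transformers import Qwen2_5_VLForConditionalGeneration
import torch

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

# r=128, alpha=256 as listed above; the 7 target modules and dropout are
# assumed (a typical choice: the attention + MLP projections), not confirmed
# by this card.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```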
## Quick inference

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

base = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter = "shantipriya/odia-ocr-qwen-finetuned"

# Load the base model in bf16 and attach the fine-tuned LoRA adapter.
processor = AutoProcessor.from_pretrained(base)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

image = Image.open("odia_text.png").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": "Extract all Odia text from this image. Return only the text."},
]}]

# Build the chat prompt and run greedy decoding (temperature is ignored when
# do_sample=False, so it is omitted here).
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
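
If you plan to serve the model, you can fold the adapter into the base weights with PEFT's standard merge call, which removes the adapter indirection at inference time. This is an optional optimization, not a step this card prescribes; the output path below is hypothetical.

```python
# Merge the LoRA weights into the base model for adapter-free inference,
# then save the merged model and processor together (path is illustrative).
merged = model.merge_and_unload()
merged.save_pretrained("odia-ocr-qwen-merged")
processor.save_pretrained("odia-ocr-qwen-merged")
```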
## License

Apache 2.0