OdiaGenAI OCR — Qwen2.5-VL-3B (Merged Full Model)

A fully stand-alone Odia OCR model, obtained by merging LoRA fine-tuning weights into the base Qwen/Qwen2.5-VL-3B-Instruct checkpoint.
No adapter or PEFT library is required at inference time — just load and run.

Organisation: OdiaGenAI
Task: Optical Character Recognition (OCR) for Odia (ଓଡ଼ିଆ) script
Model size: ~3B parameters (7.5 GB in fp16)

Model Details

Property	Value
Base model	Qwen/Qwen2.5-VL-3B-Instruct
Fine-tuning method	LoRA (r=16, alpha=32) via PEFT
Merge method	`PeftModel.merge_and_unload()`
Training data	shantipriya/odia-ocr-merged
Training samples	73,000 Odia OCR image-text pairs
LoRA adapter (v2)	shantipriya/odia-ocr-qwen-finetuned_v2
Language	Odia (ଓଡ଼ିଆ) — `or`
Precision	float16
GPU required	Recommended (works on ≥16 GB VRAM)

Quick Start

Installation

pip install transformers torch pillow accelerate

Inference

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
import torch
from PIL import Image

model_id = "OdiaGenAIOCR/odia-ocr-qwen-finetuned-merged"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

# Run OCR on an image
image = Image.open("odia_text_image.png").convert("RGB")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text",  "text": "Extract all Odia text from this image. Return only the text."},
    ],
}]

text_input = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(
    text=[text_input], images=[image], return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the generated portion
input_len = inputs["input_ids"].shape[1]
result = processor.decode(output_ids[0][input_len:], skip_special_tokens=True)
print(result)

Batch inference with multiple images

images = [Image.open(p).convert("RGB") for p in image_paths]

all_inputs = []
for img in images:
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text",  "text": "Extract all Odia text from this image. Return only the text."},
        ],
    }]
    all_inputs.append(
        processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    )

inputs = processor(text=all_inputs, images=images, padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

results = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

Training Details

The model was trained on 73,000 Odia OCR samples from the merged dataset shantipriya/odia-ocr-merged, combining:

Word-level printed Odia text images
Line-level Odia text samples
Sources: historical manuscripts, newspapers, books, and digital documents

LoRA Configuration

LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

Training Run

Parameter	Value
Max steps	6,400
Learning rate	1e-4 (cosine decay)
Warmup steps	100
Batch size	4 (per device)
Gradient accumulation	4 steps
Optimizer	AdamW
Precision	bf16
Hardware	NVIDIA H100 80GB

Sample Outputs

Input: Odia handwritten/printed text image
Prompt: "Extract all Odia text from this image. Return only the text."

Ground Truth	Model Output
`ସୂଚିପତ୍ର`	`ସୃଗପତ୍ରା`
`ଅବସର ବାସରେ`	`ଅବସର ବାସରେ`
`ଶ୍ରୀ ଫକୀରମୋହନ ସେନାପତି`	`ଶ୍ରୀ ଫକୀରମୋହନ ସେନାପତି`

Note: The model was fine-tuned primarily on word- and line-level images. For full-page OCR, consider splitting the image into horizontal strips (~400px height) before inference.

Related Resources

Resource	Link
LoRA Adapter (v2)	shantipriya/odia-ocr-qwen-finetuned_v2
Merged Model (this)	OdiaGenAIOCR/odia-ocr-qwen-finetuned-merged
v1 Merged Model	OdiaGenAIOCR/odia-ocr-qwen-finetuned
Training Dataset	shantipriya/odia-ocr-merged
Paragraph Dataset	OdiaGenAIOCR/Odia-lipi-ocr-data
Base Model	Qwen/Qwen2.5-VL-3B-Instruct

Citation

If you use this model, please cite:

@misc{odiagen-ocr-2025,
  title        = {OdiaGenAI OCR: Fine-tuned Qwen2.5-VL for Odia Script Recognition},
  author       = {OdiaGenAI},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/OdiaGenAIOCR/odia-ocr-qwen-finetuned-merged}},
  note         = {Merged full model (base + LoRA), trained on 145K Odia OCR samples}
}

License

This model is released under the Apache 2.0 License.
See also the license of the base model: Qwen/Qwen2.5-VL-3B-Instruct.

Downloads last month: 5

Safetensors

Model size

4B params

Tensor type

F16

Model tree for OdiaGenAIOCR/odia-ocr-qwen-finetuned-merged

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Finetuned

(728)

this model

OdiaGenAIOCR
/

odia-ocr-qwen-finetuned-merged