# Odia OCR — Qwen2.5-VL-3B Fine-tuned (v2)

Fine-tuned version of Qwen/Qwen2.5-VL-3B-Instruct for Optical Character Recognition (OCR) of Odia script using LoRA adapters.

## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-VL-3B-Instruct |
| Fine-tuning Method | LoRA (PEFT) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Target Modules | q_proj, v_proj |
| Training Dataset | shantipriya/odia-ocr-merged |
| Training Samples | 145,000 word-level Odia OCR crops |
| Final Checkpoint | checkpoint-6400 (early stopped) |
| Final Epoch | 1.50 |
| Final Train Loss | ~4.83 |
| Best Eval Loss | 5.454 |
| Training Hardware | NVIDIA H100 80GB |
| Training Duration | ~12.7 hours |
| Learning Rate | 3e-4 (cosine decay to 2.7e-5) |
| Batch Size | 8 effective (per-device 2 × grad accum 4) |
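The reported learning-rate trajectory (3e-4 decaying to 2.7e-5) can be sketched as a standard cosine schedule with a floor. The card does not state warmup or the exact scheduler, so this is an illustrative sketch under those assumptions, not the training code:

```python
import math

PEAK_LR, FINAL_LR, TOTAL_STEPS = 3e-4, 2.7e-5, 12387  # values from the table above

def cosine_lr(step: int) -> float:
    """Cosine decay from PEAK_LR to FINAL_LR over TOTAL_STEPS (no warmup assumed)."""
    progress = min(step, TOTAL_STEPS) / TOTAL_STEPS
    return FINAL_LR + 0.5 * (PEAK_LR - FINAL_LR) * (1 + math.cos(math.pi * progress))

print(cosine_lr(0))      # 0.0003 — peak LR at step 0
print(cosine_lr(12387))  # 2.7e-05 — floor at the final planned step
```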

## Training Notes

Training was stopped early at step 6,400 (of 12,387 planned) after a confirmed loss plateau:

- Train loss converged to ~4.83–5.0 by step ~800 and showed no further improvement
- Gradient norms stayed tiny (~0.014–0.024), indicating saturated word-level learning
- Eval loss plateaued: 5.512 → 5.454 (only a ~1% delta across 6,000 steps)

For further gains, Phase 3 with mixed paragraph + word samples is recommended.
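The LoRA hyperparameters from the table above can be expressed as a PEFT config. Rank, alpha, and target modules match the card; `lora_dropout` and `task_type` are not stated here, so the values below are assumptions:

```python
from peft import LoraConfig

# r, lora_alpha, and target_modules match the Model Details table;
# lora_dropout and task_type are assumptions (not stated on this card).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,      # assumed
    task_type="CAUSAL_LM",  # assumed
)
```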

## Sample Predictions

Each row shows the original crop image from shantipriya/odia-ocr-merged, the ground truth label, the model-extracted text, and a quality remark.

### ✅ Good — clean, high-contrast printed crops

| Image | Ground Truth | Extracted Text | Remark |
|---|---|---|---|
| (crop) | ଫୁଲି | ଫୁଲି | ✅ Exact match |
| (crop) | ସିମିତ | ସିମିତ | ✅ Exact match |
| (crop) | କାବୁ | କାବୁ | ✅ Exact match |
| (crop) | ସେରେସ | ସେରେସ | ✅ Exact match |
| (crop) | କଳାଭାଲୁ | କଳାଭାଲୁ | ✅ Exact match |

The majority case (~65–70%): well-segmented printed word crops.


### ⚠️ Mixed — partial errors, diacritic / conjunct substitutions

| Image | Ground Truth | Extracted Text | Remark |
|---|---|---|---|
| (crop) | ପ୍ରେରଣର | ପ୍ରେରଣର | ⚠️ Diacritic or conjunct substitution |
| (crop) | ମୈସ୍ଚୁସେଟ୍ସ | ମୈସ୍ଚୁସେଟ୍ସ | ⚠️ Diacritic or conjunct substitution |
| (crop) | ଜ୍ୱରଜାତ | ଜ୍ୱରଜାତ | ⚠️ Diacritic or conjunct substitution |
| (crop) | ସ୍ୱର୍ଣ | ସ୍ୱର୍ଣ | ⚠️ Diacritic or conjunct substitution |
| (crop) | ରାଜବଂଶର | ରାଜବଂଶର | ⚠️ Diacritic or conjunct substitution |

Mixed cases (~20–25%) mostly involve complex conjuncts and long-vowel matras.


### ❌ Bad — degraded, truncated, or low-resolution outputs

| Image | Ground Truth | Extracted Text | Remark |
|---|---|---|---|
| (crop) | ଜାତି-ଧର୍ମ-ବର୍ଣ୍ଣ-ସଂପ୍ରଦାୟାଦିର | ଜାତି-ଧର୍ମ-ବର୍ଣ | ❌ Truncated — long compound word or low-res image |
| (crop) | ୬-୦-ମିଥାଇଲଏରିଥ୍ରୋମାଇସିନ | ୬-୦-ମିଥାଇଲଏ | ❌ Truncated — long compound word or low-res image |
| (crop) | ପ୍ରକାଶକ-ଜ୍ୟୋତିଷ-ବାସ୍ତୁ | ପ୍ରକାଶକ-ଜ୍ୟ | ❌ Truncated — long compound word or low-res image |
| (crop) | ଚନ୍ଦ୍ରଗିରି-ପଟ୍ଟାଙ୍ଗୀ | ଚନ୍ଦ୍ରଗିରି | ❌ Truncated — long compound word or low-res image |
| (crop) | ଶ୍ଵେତଚମ୍ପକବର୍ଣ୍ଣାଭା | ଶ୍ଵେତଚମ୍ପ | ❌ Truncated — long compound word or low-res image |

Bad cases (~10–15%): very low resolution (<20 px height), heavy degradation, or long compound words.


## Summary

| Category | Approx. Share | Typical Cause |
|---|---|---|
| ✅ Good (exact match) | ~65–70% | Clean, well-segmented printed crops |
| ⚠️ Mixed (1–2 char errors) | ~20–25% | Complex conjuncts, long-vowel matras |
| ❌ Bad (heavily wrong) | ~10–15% | Degraded scans, compound words, low resolution |

Note: CER/WER metrics on a curated test split are pending. Percentages are estimated from qualitative review of ~200 samples.
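Until CER/WER numbers on a curated split are published, character error rate can be computed directly from (ground truth, prediction) pairs. A minimal pure-Python sketch (libraries such as `jiwer` provide the same via `jiwer.cer`):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(pairs) -> float:
    """Character error rate over (ground_truth, prediction) pairs."""
    errors = sum(edit_distance(gt, pred) for gt, pred in pairs)
    total = sum(len(gt) for gt, _ in pairs)
    return errors / total

# A truncated prediction like those in the "Bad" table above
# counts the missing characters as deletions.
print(cer([("ଫୁଲି", "ଫୁଲି")]))  # exact match → 0.0
```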

## Usage

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

base_model    = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_model = "shantipriya/odia-ocr-qwen-finetuned_v2"

processor = AutoProcessor.from_pretrained(base_model, trust_remote_code=True)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter_model)
model.eval()

def ocr_image(image_path: str) -> str:
    image = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text",  "text": "Extract the Odia text from this image. Return only the text."}
        ]
    }]
    text_prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[text_prompt], images=[image], return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Greedy decoding; temperature has no effect when do_sample=False
        output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Drop the prompt tokens, keeping only the newly generated ones
    generated = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(generated, skip_special_tokens=True)[0].strip()

print(ocr_image("odia_word.png"))
```

## Training Data

The model was trained on shantipriya/odia-ocr-merged:

- 145,000 word-level Odia script image crops
- Diverse fonts, sizes, and print quality
- Sourced from multiple Odia OCR corpora, merged and deduplicated

## Available Checkpoints

| Checkpoint | Step | Epoch | Train Loss |
|---|---|---|---|
| checkpoint-3200 | 3,200 | 0.77 | ~5.2 |
| checkpoint-6000 | 6,000 | 1.45 | ~4.85 |
| checkpoint-6200 | 6,200 | 1.50 | ~4.92 |
| checkpoint-6400 (final) | 6,400 | 1.51 | ~4.83 |

## Limitations

- Optimized for printed Odia word-level crops; handwritten or degraded images may need further fine-tuning
- Complex conjunct characters and long compound words are the main error sources
- Not tested on mixed-language (Odia + English) documents

## Citation

```bibtex
@misc{parida2026odiaocr,
  author       = {Shantipriya Parida and OdiaGenAI Team},
  title        = {Odia OCR: Fine-tuned Qwen2.5-VL for Odia Script Recognition},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shantipriya/odia-ocr-qwen-finetuned_v2}},
  note         = {LoRA fine-tune of Qwen2.5-VL-3B-Instruct on 145K Odia OCR word crops}
}
```

If using the training dataset, also cite:

```bibtex
@misc{parida2026odiadataset,
  author       = {Shantipriya Parida},
  title        = {Odia OCR Merged Dataset},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/shantipriya/odia-ocr-merged}}
}
```

## License

Apache 2.0
