Urdu Poetry TrOCR (Iqbal Edition)

This model is a fine-tuned version of TrOCR (Transformer-based Optical Character Recognition) optimized for Urdu Nastaliq script. It was trained on a dataset of Allama Iqbal's poetry to handle the complex ligatures, overlapping characters, and right-to-left (RTL) flow of classical Urdu calligraphy.

🚀 Model Evolution & Performance

This final version (V2) is a significant improvement in handling Urdu cursive script. By optimizing vision-to-text alignment, it addresses common OCR issues such as:

  • Reading Direction: Correctly processes RTL text flow.
  • Word Continuity: Reduces "split words" and spurious character insertions.
  • Poetic Coherence: Transcribes complete couplets rather than fragments.

📊 Visual Performance Gallery (Sample Results)

Original image | Model transcription (Urdu)
sher_1 | اے نالہ! اے فصل کشور نہ اندوستاں چو مقا ہے تیری پیغامی کو جھک کر آسماں
sher_2 | تجھ میں کچھ پیدا نہیں دیرینہ روز ٹکے نظاں تو جواں ہے گردش شام و سحر کے درمیاں
sher_3 | ایک جلوہ تھا کلیم طور سینا کے لیے تو حج یہ ہے سراپا چشم پیما کے لیے
sher_4 | امتحان دیدئہ ظاہر میں کو ہستاں بے تو پا سہاں اپنا ہے تو دیوار ہند ستاں بے تو
sher_5 | مطلع ہوال فلک جس کا ہو وہ یواں ہے تو سوئے خلوت گاہ دل دامن کش انساں ہے تو
sher_8 | چو نہال تیری ثریا سے ہیں سر گرم سخن تو نرمیں پرور پہنائے فلک تیرا وطن
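To put a number on how closely a transcription matches the source couplet, a common OCR metric is character error rate (CER): the Levenshtein edit distance between the prediction and a ground-truth reference, divided by the length of the reference. The sketch below is a minimal pure-Python version under that definition; `levenshtein` and `cer` are hypothetical helper names (not part of this model or of `transformers`), and you need to supply your own reference transcriptions.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Dynamic programming, keeping only one previous row (O(len(b)) memory).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance / number of reference characters."""
    # Normalize both strings the same way so canonically equivalent
    # Unicode sequences compare equal.
    prediction = unicodedata.normalize("NFC", prediction)
    reference = unicodedata.normalize("NFC", reference)
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings: one substituted character in a 5-character reference.
    print(cer("kitab", "kitap"))  # 0.2
```

Lower CER is better, and 0.0 means an exact match. Python strings are indexed by code point, so combining marks such as zer (U+0650) count as separate characters in this metric.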

🛠️ Usage & Implementation

To reproduce the results shown above, we recommend the following inference configuration.

Python Example

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load the fine-tuned processor and model from the Hugging Face Hub
processor = TrOCRProcessor.from_pretrained("Khurram123/urdu-poetry-trocr-iqbal")
model = VisionEncoderDecoderModel.from_pretrained("Khurram123/urdu-poetry-trocr-iqbal")

# Load the image and convert it to normalized pixel tensors
image = Image.open("sample_poetry.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Recommended generation parameters for Urdu Nastaliq
generated_ids = model.generate(
    pixel_values,
    max_length=128,
    num_beams=7,             # Wider beam search for complex ligatures
    repetition_penalty=3.0,  # Penalizes already-generated tokens to prevent character looping
    length_penalty=1.5,      # Favors longer hypotheses so full poetic lines are completed
    early_stopping=False,    # Use the beam-search stopping heuristic, not the first num_beams finished beams
)

# Decode token IDs back to Urdu text
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"OCR Result: {transcription}")
```
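The effect of `repetition_penalty` and `length_penalty` is easier to see with numbers. The sketch below mirrors, in simplified pure Python, how these two settings behave in Hugging Face `transformers` beam search as we understand it: the repetition penalty divides positive logits (and multiplies negative ones) of token IDs that were already generated, and a finished beam is ranked by its summed log-probability divided by `length ** length_penalty`. The function names and all scores are hypothetical; the library's internal implementation differs in detail.

```python
from __future__ import annotations


def apply_repetition_penalty(logits: dict[int, float], generated: list[int],
                             penalty: float) -> dict[int, float]:
    """Return a copy of `logits` with already-generated token IDs penalized:
    positive logits are divided by `penalty`, negative ones multiplied by it."""
    out = dict(logits)
    for tok in set(generated):
        if tok in out:
            score = out[tok]
            out[tok] = score / penalty if score > 0 else score * penalty
    return out


def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: summed log-probability / length**length_penalty."""
    return sum_logprobs / (length ** length_penalty)


if __name__ == "__main__":
    # Token 5 was already generated, so its logit 6.0 drops to 2.0 with penalty 3.0.
    print(apply_repetition_penalty({5: 6.0, 9: 4.0}, generated=[5], penalty=3.0))

    # Two hypothetical finished beams: (summed log-prob, length in tokens).
    short_line, full_line = (-4.0, 10), (-9.0, 20)
    for lp in (1.0, 1.5):
        s = beam_score(*short_line, lp)
        f = beam_score(*full_line, lp)
        print(lp, "full line wins" if f > s else "short line wins")
```

With `length_penalty=1.0` the shorter beam scores higher (-0.4 vs -0.45), while at 1.5 the longer beam wins, which is why a value above 1 pushes decoding toward complete lines. Note that the repetition penalty applies to every token ID already in the output, so legitimately repeated tokens are penalized as well.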
Model size: 0.3B params · Tensor type: F32 · Format: Safetensors