Urdu Poetry TrOCR (Iqbal Edition)

This model is a fine-tuned version of TrOCR (Transformer-based Optical Character Recognition) specifically optimized for Urdu Nastaliq script. It was trained on a specialized dataset of poetry by Allama Iqbal to master the complex ligatures, overlapping characters, and right-to-left (RTL) flow of classical Urdu calligraphy.

🚀 Model Evolution & Performance

This final version (V2) improves substantially on the earlier release in handling Urdu cursive script. By optimizing the vision-to-text alignment, it addresses common OCR issues such as:

Reading Direction: Correctly processes RTL text flow.
Word Continuity: Eliminates "split words" and random character insertions.
Poetic Coherence: Transcribes full couplets with high linguistic accuracy.
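
The improvements listed above are described qualitatively. One common way to quantify them is character error rate (CER): the edit distance between the model output and a ground-truth transcription, divided by the length of the ground truth. The sketch below is a minimal, dependency-free illustration and not the evaluation procedure used for this model; the function names `edit_distance` and `cer` and the sample strings are assumptions made for the example.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # NFC normalization keeps visually identical Urdu strings that were
    # encoded with different code point sequences from counting as errors.
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the training or test data.
    print(cer("کتاب", "کتب"))  # one of four reference characters is missing
```

A lower value is better, and 0.0 means the transcription matches the reference exactly. Averaging `cer` over a held-out set of line images would give a single figure to compare model versions.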

📊 Visual Performance Gallery (Sample Results)

Original Image	Model Transcription (Urdu)
	اے نالہ! اے فصل کشور نہ اندوستاں چو مقا ہے تیری پیغامی کو جھک کر آسماں
	تجھ میں کچھ پیدا نہیں دیرینہ روز ٹکے نظاں تو جواں ہے گردش شام و سحر کے درمیاں
	ایک جلوہ تھا کلیم طور سینا کے لیے تو حج یہ ہے سراپا چشم پیما کے لیے
	امتحان دیدئہ ظاہر میں کو ہستاں بے تو پا سہاں اپنا ہے تو دیوار ہند ستاں بے تو
	مطلع ہوال فلک جس کا ہو وہ یواں ہے تو سوئے خلوت گاہ دل دامن کش انساں ہے تو
	چو نہال تیری ثریا سے ہیں سر گرم سخن تو نرمیں پرور پہنائے فلک تیرا وطن

🛠️ Usage & Implementation

To reproduce results like those shown above, we recommend the following inference configuration.

Python Example

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load the fine-tuned model and move it to the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = TrOCRProcessor.from_pretrained("Khurram123/urdu-poetry-trocr-iqbal")
model = VisionEncoderDecoderModel.from_pretrained("Khurram123/urdu-poetry-trocr-iqbal").to(device)

# Load the image and convert it to model-ready pixel values on the same device
image = Image.open("sample_poetry.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Optimized generation parameters for Urdu Nastaliq
generated_ids = model.generate(
    pixel_values,
    max_length=128,
    num_beams=7,             # Higher beams for complex ligature search
    repetition_penalty=3.0,  # Prevents character looping
    length_penalty=1.5,      # Encourages completion of full poetic lines
    early_stopping=False     # Use the default heuristic stopping rule for beam search
)

# Decode output
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"OCR Result: {transcription}")
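
To make the effect of `length_penalty` and `repetition_penalty` more concrete, the sketch below mimics the arithmetic in plain Python. It assumes the behavior documented for the transformers library: finished beams are ranked by their summed log-probability divided by the sequence length raised to `length_penalty`, and `repetition_penalty` rescales the logits of tokens that have already been generated. The helper names `beam_score` and `penalize_repeats` and all numbers are hypothetical; this is a simplified illustration, not the library's code.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score; higher is better.

    Log-probabilities are negative, so dividing by a larger power of the
    length pulls long hypotheses closer to zero and favors them.
    """
    return sum_logprobs / (length ** length_penalty)


def penalize_repeats(logits, seen_token_ids, penalty):
    """Lower the logits of tokens that already appear in the output."""
    out = list(logits)
    for token_id in set(seen_token_ids):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty  # shrink positive logits
        else:
            out[token_id] = out[token_id] * penalty  # push negative logits lower
    return out


if __name__ == "__main__":
    # Hypothetical beams: a truncated line (4 tokens) and a full line (12 tokens).
    short_beam = (-4.0, 4)
    full_beam = (-9.0, 12)
    for lp in (0.0, 1.5):
        short_score = beam_score(*short_beam, lp)
        full_score = beam_score(*full_beam, lp)
        winner = "full line" if full_score > short_score else "truncated line"
        print(f"length_penalty={lp}: {winner} ranks first")

    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(penalize_repeats([2.0, -1.0, 0.5], [0, 1], 3.0))
```

With no length normalization the truncated hypothesis ranks first because it has accumulated less negative log-probability; with `length_penalty=1.5` the full line ranks first, which is the stated motivation for that setting.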

Model size: 0.3B params (Safetensors)
Tensor type: F32