DeepSeek-OCR-2 Urdu OCR 1M LoRA

LoRA adapter fine-tuned from deepseek-ai/DeepSeek-OCR-2 for Urdu OCR on a small subset of PuristanLabs1/urdu-ocr-1M.

Summary

  • Base model: deepseek-ai/DeepSeek-OCR-2
  • Task: Urdu OCR
  • Dataset config: nastaliq
  • Train samples: 800
  • Validation samples: 80
  • Metric: CER (character error rate)

This is a small adapter-focused experiment meant to improve Urdu transcription quality without distributing a full model checkpoint.

Usage

This repository contains only the LoRA adapter. You can load it directly with PEFT, which resolves and attaches the base model automatically from the adapter config.

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "kingabzpro/deepseek-ocr-2-urdu-ocr-1m-lora"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    _attn_implementation="flash_attention_2",  # requires flash-attn; omit on unsupported GPUs
)
model = model.eval().cuda()
model.config.use_cache = True

prompt = "<image>\nFree OCR. "
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="sample.png",
    output_path="ocr-output",
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=False,
    eval_mode=True,
)

print(result)

Training

  • Precision: bf16
  • Epochs: 1
  • Train batch size: 1
  • Eval batch size: 1
  • Gradient accumulation: 8
  • Learning rate: 1e-4
  • Warmup steps: 10
  • Weight decay: 0.01
  • Scheduler: cosine
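As a quick sanity check on these settings: a per-device batch of 1 with gradient accumulation of 8 gives an effective batch size of 8, so one epoch over 800 training samples corresponds to 100 optimizer steps (assuming no dropped final batch).

```python
# Effective batch size and optimizer steps per epoch for the settings above.
train_samples = 800
per_device_batch = 1
grad_accum = 8

effective_batch = per_device_batch * grad_accum
steps_per_epoch = train_samples // effective_batch

print(effective_batch)   # 8
print(steps_per_epoch)   # 100
```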

LoRA target modules:

  • q_proj
  • kv_a_proj_with_mqa
  • kv_b_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj

Results

Two example comparisons from the validation subset:

Sample   Base CER   Finetuned CER
21       0.6290     0.0806
53       1.5385     0.3846
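CER here is the character error rate: the Levenshtein (edit) distance between hypothesis and reference, divided by the reference length, which is why it can exceed 1.0 (as in sample 53's base score). A minimal sketch of the computation:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Standard dynamic-programming Levenshtein distance.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, start=1):
        curr = [i]
        for j, hc in enumerate(h, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(r)

print(cer("abcd", "abxd"))  # 0.25 (one substitution over four characters)
```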

Detailed examples:

sample_index: 21
reference : آنے والے شخص نے اپنا تعارف کرواتے ہوئے کہا:”میرا نام شہزاد ہے۔
before    : ۱- وله شخص از پشت رفت و آمد و به "میرادام" بشارد.
after     : آئے والے شخص نے اپنا تعارف کرواتے ہوئے کہا: ”میں ہام شہزاد ہے۔

sample_index: 53
reference : آپﷺ نے فرمایا کہ اے انجشہ!
before    : 1 - في كتابة النص، هل تسبيماً ماكسراً؟ اكتب ثابتاً!
after     : آپ ﷺ نے فسر ملیا کرا اے اچھٹ !

Both examples improve markedly, but this is still a small-sample run and should not be treated as a full benchmark.

Limitations

  • Trained on only 800 training / 80 validation samples
  • Evaluated on a very small subset
  • May not generalize well to real scanned Urdu documents without further validation

Citation

Please cite the base model and dataset.

@article{wei2026deepseek,
  title={DeepSeek-OCR 2: Visual Causal Flow},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2601.20552},
  year={2026}
}