DeepSeek-OCR-2 Urdu OCR 1M LoRA

LoRA adapter fine-tuned from deepseek-ai/DeepSeek-OCR-2 for Urdu OCR on a small subset of PuristanLabs1/urdu-ocr-1M.

Summary

  • Base model: deepseek-ai/DeepSeek-OCR-2
  • Task: Urdu OCR
  • Dataset config: nastaliq
  • Train samples: 800
  • Validation samples: 80
  • Metric: CER (character error rate)

This is a small adapter-focused experiment meant to improve Urdu transcription quality without distributing a full model checkpoint.

Usage

This repository contains only the LoRA adapter. You can load it directly with PEFT, which resolves and attaches the base model automatically from the adapter config.

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "kingabzpro/deepseek-ocr-2-urdu-ocr-1m-lora"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoPeftModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    _attn_implementation="flash_attention_2",  # requires flash-attn; omit on unsupported GPUs
)
model = model.eval().cuda()
model.config.use_cache = True

prompt = "<image>\nFree OCR. "
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="sample.png",
    output_path="ocr-output",
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=False,
    eval_mode=True,
)

print(result)

Training

  • Precision: bf16
  • Epochs: 1
  • Train batch size: 1
  • Eval batch size: 1
  • Gradient accumulation: 8
  • Learning rate: 1e-4
  • Warmup steps: 10
  • Weight decay: 0.01
  • Scheduler: cosine
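As a quick sanity check on these settings: a per-device batch of 1 with gradient accumulation of 8 gives an effective batch size of 8, so one epoch over 800 training samples corresponds to 100 optimizer steps (assuming no dropped final batch).

```python
# Effective batch size and optimizer steps per epoch for the settings above.
train_samples = 800
per_device_batch = 1
grad_accum = 8

effective_batch = per_device_batch * grad_accum
steps_per_epoch = train_samples // effective_batch

print(effective_batch)   # 8
print(steps_per_epoch)   # 100
```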

LoRA target modules:

  • q_proj
  • kv_a_proj_with_mqa
  • kv_b_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj

Results

Two example comparisons from the validation subset:

Sample   Base CER   Finetuned CER
21       0.6290     0.0806
53       1.5385     0.3846
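CER here is the character error rate: the Levenshtein (edit) distance between hypothesis and reference, divided by the reference length, which is why it can exceed 1.0 (as in sample 53's base score). A minimal sketch of the computation:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Standard dynamic-programming Levenshtein distance.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, start=1):
        curr = [i]
        for j, hc in enumerate(h, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(r)

print(cer("abcd", "abxd"))  # 0.25 (one substitution over four characters)
```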

Detailed examples:

sample_index: 21
reference : آنے والے شخص نے اپنا تعارف کرواتے ہوئے کہا:”میرا نام شہزاد ہے۔
before    : ۱- وله شخص از پشت رفت و آمد و به "میرادام" بشارد.
after     : آئے والے شخص نے اپنا تعارف کرواتے ہوئے کہا: ”میں ہام شہزاد ہے۔

sample_index: 53
reference : آپﷺ نے فرمایا کہ اے انجشہ!
before    : 1 - في كتابة النص، هل تسبيماً ماكسراً؟ اكتب ثابتاً!
after     : آپ ﷺ نے فسر ملیا کرا اے اچھٹ !

Both examples improve markedly, but this is still a small-sample run and should not be treated as a full benchmark.

Limitations

  • Trained on only 800 training / 80 validation samples
  • Evaluated on a very small subset
  • May not generalize well to real scanned Urdu documents without further validation

Citation

Please cite the base model and dataset.

@article{wei2026deepseek,
  title={DeepSeek-OCR 2: Visual Causal Flow},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2601.20552},
  year={2026}
}