🇺🇿 Uzbek Handwriting OCR — TrOCR Fine-tuned Model

This model recognizes handwritten Uzbek and Russian text from images. It is fine-tuned from microsoft/trocr-base-handwritten on a custom dataset of 10,368 handwritten line images.

✨ Key Features

Languages: Uzbek (Latin & Cyrillic) and Russian
Input: Image of a single line of handwritten text
Output: Recognized text string
CER (character error rate): 0.3395 on the evaluation set
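
For readers unfamiliar with the metric, CER is usually defined as the character-level edit distance between the prediction and the reference, divided by the number of reference characters. The model card does not say which implementation or text normalization produced the 0.3395 figure, so the following is only a minimal sketch of the standard definition; the function names `levenshtein` and `cer` are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit distance / total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


# One wrong character out of eleven
print(round(cer(["salom dunyo"], ["salom dunya"]), 4))  # 0.0909
```

Under this definition, a CER of 0.3395 corresponds to roughly one character edit for every three reference characters.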

🚀 Quick Start

Single Line OCR

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("abduazizovanozima7/uzbek-trocr-line-v1")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Read image
image = Image.open("line_image.png").convert("RGB")

# OCR
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values, max_new_tokens=128, num_beams=4)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text)

Full Page OCR Pipeline

For full-page images, use an inference pipeline that:

1. Splits the page into individual lines
2. Runs the lines through the model in batches
3. Joins the recognized lines into the final text

import cv2
import numpy as np
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("abduazizovanozima7/uzbek-trocr-line-v1")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def segment_lines(image_path):
    """Split a full page image into individual text lines."""
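    img = cv2.imread(image_path)
    # cv2.imread returns None instead of raising when the file cannot be read
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)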

    # Binarize
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 15, 10
    )

    # Horizontal projection profile: number of ink pixels in each row
    h_proj = np.sum(binary, axis=1) / 255
    h, w = img.shape[:2]
    # A row counts as text when more than 2% of its pixels are ink
    threshold = w * 0.02

    # Find text line regions: runs of consecutive text rows at least 15 px tall
    is_text = h_proj > threshold
    lines = []
    in_text = False
    start = 0
    for i in range(len(is_text)):
        if is_text[i] and not in_text:
            start = i
            in_text = True
        elif not is_text[i] and in_text:
            if i - start >= 15:
                lines.append((start, i))
            in_text = False
    if in_text and len(is_text) - start >= 15:
        lines.append((start, len(is_text)))

    # Crop each line across the full page width, with 10 px of vertical padding
    cropped = []
    for s, e in lines:
        y1 = max(0, s - 10)
        y2 = min(h, e + 10)
        cropped.append(img[y1:y2, :])

    return cropped

def ocr_batch(line_images, batch_size=8):
    """Run OCR on multiple line images in batches."""
    results = []
    for i in range(0, len(line_images), batch_size):
        batch = line_images[i:i+batch_size]
        pil_imgs = [Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) for img in batch]
        pixel_values = processor(pil_imgs, return_tensors="pt").pixel_values.to(device)
        with torch.no_grad():
            ids = model.generate(pixel_values, max_new_tokens=128, num_beams=4)
        texts = processor.batch_decode(ids, skip_special_tokens=True)
        results.extend([t.strip() for t in texts])
    return results

def ocr_full_page(image_path):
    """Full pipeline: image → lines → OCR → text."""
    lines = segment_lines(image_path)
    if not lines:
        return ""
    texts = ocr_batch(lines)
    return "\n".join(texts)

# Usage
result = ocr_full_page("handwritten_page.jpg")
print(result)
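
The line-splitting step can be checked without a real scan. The sketch below reimplements the run-finding loop from segment_lines as a standalone, pure-Python function and applies it to a synthetic row profile. The name `find_text_runs` and the synthetic numbers are illustrative assumptions, not part of the published pipeline.

```python
def find_text_runs(row_ink, threshold, min_height=15):
    """Return (start, end) row ranges where row_ink stays above threshold.

    Runs shorter than min_height rows are discarded as noise.
    """
    runs = []
    start = None
    for i, ink in enumerate(row_ink):
        if ink > threshold and start is None:
            start = i                      # a text run begins
        elif ink <= threshold and start is not None:
            if i - start >= min_height:
                runs.append((start, i))    # keep runs that are tall enough
            start = None
    # A run that reaches the bottom edge is closed at the last row
    if start is not None and len(row_ink) - start >= min_height:
        runs.append((start, len(row_ink)))
    return runs


# Synthetic profile for a page 100 rows tall and 200 pixels wide:
# each entry is the number of ink pixels in that row.
page_width = 200
row_ink = [0] * 100
for r in range(10, 30):
    row_ink[r] = 160    # a 20-row text line
for r in range(50, 55):
    row_ink[r] = 160    # a 5-row speck, too short to count
for r in range(70, 100):
    row_ink[r] = 160    # a 30-row text line touching the bottom edge

runs = find_text_runs(row_ink, threshold=page_width * 0.02)
print(runs)  # [(10, 30), (70, 100)]
```

Because the split uses row sums only, it assumes text lines run roughly horizontally and are separated by mostly empty rows; skewed scans or lines whose strokes overlap vertically may merge into one region.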

📊 Training Details

Parameter	Value
Base model	`microsoft/trocr-base-handwritten`
Dataset	10,368 handwritten line images
Languages	Uzbek (Latin/Cyrillic), Russian
Epochs	5
Batch size	16
Learning rate	5e-5
GPU	NVIDIA P100 (Kaggle)
Final CER	0.3395
Final Loss	1.6348
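
To get a feel for the scale of this run, the number of optimizer steps implied by the table can be estimated. This is back-of-the-envelope arithmetic under stated assumptions: all 10,368 images are treated as training data (the card mentions an evaluation set but not the split, so the real count is probably lower), training runs on a single device, and there is no gradient accumulation. The helper name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, batch_size, epochs):
    """Return (steps per epoch, total steps), counting a partial last batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Values from the table; the train-split size is an assumption (see above)
per_epoch, total = optimizer_steps(num_examples=10_368, batch_size=16, epochs=5)
print(per_epoch, total)  # 648 3240
```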

📁 Dataset

Training data: abduazizovanozima7/uzbek-line-handwriting-dataset

The dataset contains handwritten text images with OCR labels generated using GPT-4o, covering Uzbek Latin, Uzbek Cyrillic, and Russian scripts.

⚠️ Limitations

Works best on single-line images (not full paragraphs in one image)
For full pages, use the segmentation pipeline above
May struggle with very messy or overlapping handwriting
Optimized for notebook-style handwriting on lined paper

📝 Citation

@misc{abduazizova2026uzbektrocr,
  title={Uzbek Handwriting OCR with TrOCR},
  author={Nozima Abduazizova},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/abduazizovanozima7/uzbek-trocr-line-v1}
}

Model size: 0.3B params (Safetensors, tensor type F32)
