🇺🇿 Uzbek Handwriting OCR: TrOCR Fine-tuned Model

This model recognizes handwritten Uzbek and Russian text from images. It is fine-tuned from microsoft/trocr-base-handwritten on a custom dataset of 10,368 handwritten line images.

✨ Key Features

  • Languages: Uzbek (Latin & Cyrillic) and Russian
  • Input: Image of a single line of handwritten text
  • Output: Recognized text string
  • CER (character error rate): 0.3395 on the evaluation set, i.e. roughly 34 character edits per 100 reference characters
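
To make the CER figure easier to interpret, here is a minimal sketch of the standard definition: total character-level edit distance divided by total reference length. The card does not say which implementation or text normalization produced 0.3395, so the function names and the corpus-level aggregation below are assumptions for illustration only.

```python
def levenshtein(ref, hyp):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER (assumed aggregation): total edits / total reference chars."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


# Hypothetical example strings, not taken from the dataset.
refs = ["salom dunyo", "привет"]
hyps = ["salom dunya", "привет"]
print(round(cer(refs, hyps), 4))  # 1 edit over 17 reference characters -> 0.0588
```

Averaging per-line CER instead of pooling edits over the whole set gives a different number when line lengths vary, so compare against the reported value with care.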

🚀 Quick Start

Single Line OCR

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("abduazizovanozima7/uzbek-trocr-line-v1")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Read image
image = Image.open("line_image.png").convert("RGB")

# OCR
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values, max_new_tokens=128, num_beams=4)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(text)

Full Page OCR Pipeline

For full-page images, use an inference pipeline that:

  1. Splits the page into individual lines
  2. Runs the line images through the model in batches
  3. Joins the recognized lines with newlines into the final text

import cv2
import numpy as np
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load model
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("abduazizovanozima7/uzbek-trocr-line-v1")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def segment_lines(image_path):
    """Split a full page image into individual text lines."""
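    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)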

    # Binarize
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 15, 10
    )

    # Horizontal projection profile
    h_proj = np.sum(binary, axis=1) / 255
    h, w = img.shape[:2]
    threshold = w * 0.02

    # Find text line regions
    is_text = h_proj > threshold
    lines = []
    in_text = False
    start = 0
    for i in range(len(is_text)):
        if is_text[i] and not in_text:
            start = i
            in_text = True
        elif not is_text[i] and in_text:
            if i - start >= 15:
                lines.append((start, i))
            in_text = False
    if in_text and len(is_text) - start >= 15:
        lines.append((start, len(is_text)))

    # Crop each line with padding
    cropped = []
    for s, e in lines:
        y1 = max(0, s - 10)
        y2 = min(h, e + 10)
        cropped.append(img[y1:y2, :])

    return cropped

def ocr_batch(line_images, batch_size=8):
    """Run OCR on multiple line images in batches."""
    results = []
    for i in range(0, len(line_images), batch_size):
        batch = line_images[i:i+batch_size]
        pil_imgs = [Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) for img in batch]
        pixel_values = processor(pil_imgs, return_tensors="pt").pixel_values.to(device)
        with torch.no_grad():
            ids = model.generate(pixel_values, max_new_tokens=128, num_beams=4)
        texts = processor.batch_decode(ids, skip_special_tokens=True)
        results.extend([t.strip() for t in texts])
    return results

def ocr_full_page(image_path):
    """Full pipeline: image → lines → OCR → text."""
    lines = segment_lines(image_path)
    if not lines:
        return ""
    texts = ocr_batch(lines)
    return "\n".join(texts)

# Usage
result = ocr_full_page("handwritten_page.jpg")
print(result)
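
The line-finding loop inside segment_lines can be tried out without OpenCV or a real scan. The sketch below reimplements the same rule (keep runs of rows whose ink count exceeds a threshold, and drop runs shorter than 15 rows) on a synthetic projection profile. The helper name find_text_bands and the sample profile are hypothetical and only meant for experimenting with the threshold and minimum height.

```python
from itertools import groupby


def find_text_bands(row_ink, threshold, min_height=15):
    """Return (start, end) row ranges where row_ink stays above threshold.

    row_ink is the horizontal projection profile: one ink-pixel count per row.
    Bands shorter than min_height rows are discarded as noise.
    """
    bands = []
    row = 0
    # groupby yields consecutive runs of rows that share the same text/blank state.
    for is_text, group in groupby(row_ink, key=lambda v: v > threshold):
        length = sum(1 for _ in group)
        if is_text and length >= min_height:
            bands.append((row, row + length))
        row += length
    return bands


# Synthetic profile: 5 blank rows, 20 text rows, 10 blank rows,
# 8 noisy rows (too short to keep), 3 blank rows, 30 text rows.
profile = [0] * 5 + [40] * 20 + [0] * 10 + [40] * 8 + [0] * 3 + [40] * 30
print(find_text_bands(profile, threshold=10))  # [(5, 25), (46, 76)]
```

If lines on a real page are merged or split, changing threshold (the pipeline uses 2% of the page width) and min_height here shows the effect quickly before touching the full pipeline.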

📊 Training Details

Parameter       Value
Base model      microsoft/trocr-base-handwritten
Dataset         10,368 handwritten line images
Languages       Uzbek (Latin/Cyrillic), Russian
Epochs          5
Batch size      16
Learning rate   5e-5
GPU             NVIDIA P100 (Kaggle)
Final CER       0.3395
Final loss      1.6348
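
For a rough sense of the training budget, the listed values imply the following number of optimizer steps. This is a back-of-the-envelope sketch that assumes all 10,368 images were used for training and that no gradient accumulation was applied. The card states neither, so treat the result as an upper-bound estimate.

```python
import math

# Values listed under Training Details.
num_images = 10_368
batch_size = 16
epochs = 5

# Assumption: every image is a training sample and each batch is one optimizer step.
steps_per_epoch = math.ceil(num_images / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 648 3240
```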

📁 Dataset

Training data: abduazizovanozima7/uzbek-line-handwriting-dataset

The dataset contains handwritten text images with OCR labels generated using GPT-4o, covering Uzbek Latin, Uzbek Cyrillic, and Russian scripts.

⚠️ Limitations

  • Best results on single line images (not full paragraphs in one image)
  • For full pages, use the segmentation pipeline above
  • May struggle with very messy or overlapping handwriting
  • Optimized for notebook-style handwriting on lined paper

📝 Citation

@misc{abduazizova2026uzbektrocr,
  title={Uzbek Handwriting OCR with TrOCR},
  author={Nozima Abduazizova},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/abduazizovanozima7/uzbek-trocr-line-v1}
}