LightOnOCR-2-1B for German (Line-Level)

This model is a fine-tuned version of lightonai/LightOnOCR-2-1B-base, trained for line-level OCR of German shorthand manuscripts.

German shorthand manuscript line-level OCR

Model Description

Base Model: lightonai/LightOnOCR-2-1B-base
Training Data: medieval-data/german-shorthand-line
Task: Line-level text transcription from document images
Language: German (de)
Architecture: Vision-Language Model (1B parameters)

This is a line-level model: it expects cropped line images as input, not full pages. Each image should contain a single line of text.

Evaluation Results

Evaluated on 50 samples from the test set:

| Metric | Base Model | Finetuned | Improvement |
|---|---|---|---|
| CER (%) | 381.26 | 21.89 | +359.37 |
| WER (%) | 494.99 | 37.41 | +457.58 |
| Perfect Matches | 0 | 0 | +0 |

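Lower CER and WER are better; more perfect matches are better. The Improvement column is the change from the base model to the fine-tuned model, signed so that a positive value is an improvement (in percentage points for CER and WER). CER and WER can exceed 100% because inserted characters and words count as errors: a model that emits long, repetitive output for a short line, as the base model does in example 1 below, can accumulate more edits than the reference has characters or words.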
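The metrics above can be sketched in a few lines of Python. This is a minimal illustration, assuming corpus-level aggregation (total edits divided by total reference length); the card does not say whether its scores were aggregated this way or averaged per line, and the function names `levenshtein` and `error_rate` are hypothetical. The toy inputs reuse fragments from the example table.

```python
def levenshtein(ref, hyp):
    """Edit distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference item
                curr[j - 1] + 1,         # insert a hypothesis item
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent: total edits / total reference length.

    unit="char" gives CER, unit="word" gives WER (whitespace tokenization).
    """
    split = list if unit == "char" else str.split
    edits = sum(levenshtein(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * edits / total


if __name__ == "__main__":
    refs = ["Stimmung sehr verschieden"]
    hyps = ["Stimmung sehr verschiedenes"]
    print(f"CER: {error_rate(refs, hyps):.2f}")               # CER: 8.00
    print(f"WER: {error_rate(refs, hyps, unit='word'):.2f}")  # WER: 33.33
    # Repetitive output on a short line pushes CER past 100%
    print(error_rate(["Haupt der"], ["12/12/1998 10:00 AM 10:00 AM"]) > 100)  # True
```

The last line shows why the base model's scores exceed 100%: every extra character it emits is an insertion, while the denominator stays the length of the reference.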

Example Outputs

| # | Ground Truth | Base Model | Finetuned |
|---|---|---|---|
| 1 | (Haupt der seligen Irmeng. gefunden. Im ... | 12/12/1998 10:00 AM 10:00 AM 10:00 AM 10... | (Haupt der seitdem Jänner 12 20 bei Daue... |
| 2 | Schw. Reinh.: Ist vom Lagerdienst freige... | Schw. Reinh. : 2d 9.20 16 09 J. 6 | Schw. Reinh.: Ist vom Lagerdienst frei g... |
| 3 | Klage daß im Naz.heim den Kranken die Ko... | `$$ \begin{aligned} & \text { 22 e 2 haz....` | Klage daß im Naz.heim den Kranken die Ko... |
| 4 | Irene: Stimmung sehr verschieden. Kommen... | | Irene: Stimmung sehr verschiedenes. Münd... |
| 5 | Zwei Schwestern Calabrien: M. Cristina u... | 226 Kolabrie: M. Cisneros, Urode | Zwei Schwestern Katalrien: M. Cristina u... |

✓ = exact match (none of the examples shown is an exact match)

Usage

Installation

```bash
# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
pip install datasets  # only needed for the batch inference example
```

Python Usage

```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode output
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)

print(transcription)
```

Batch Inference

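```python
from datasets import load_dataset

# Reuses torch, model, processor, device and dtype from the single-image example above.

# Load the first 10 lines of the training split
dataset = load_dataset("medieval-data/german-shorthand-line", split="train[:10]")

# Line images differ in shape, so their prompts can contain different numbers of
# image tokens. Pad on the left so every sequence ends where generation starts.
processor.tokenizer.padding_side = "left"

# Process batch
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the (left-padded) prompt tokens and decode only the generated text
predictions = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
```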
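For more than a handful of lines, the batch code above can be run over fixed-size chunks so GPU memory stays bounded. The sketch below is hypothetical: `transcribe_batch` stands for a function you write that wraps the processor, `generate` and `batch_decode` steps above for one list of images and returns one string per image; `chunked` and `transcribe_all` are illustrative names, not part of transformers.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def transcribe_all(
    images: Sequence[T],
    transcribe_batch: Callable[[List[T]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run a user-supplied batch transcription function over all images."""
    results: List[str] = []
    for batch in chunked(images, batch_size):
        preds = transcribe_batch(batch)
        # Guard against silently misaligned predictions and ground truth
        if len(preds) != len(batch):
            raise RuntimeError("transcribe_batch must return one string per image")
        results.extend(preds)
    return results
```

Keeping the model call behind a callable makes the batching logic testable without a GPU; a suitable `batch_size` depends on your hardware and line-image sizes.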

Training Details

Base Model: lightonai/LightOnOCR-2-1B-base
Training Method: Fine-tuning with frozen language model backbone
Optimizer: AdamW (fused)
Learning Rate: 6e-5 with linear decay
Precision: bfloat16
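The card gives only the peak learning rate and says it decays linearly. A minimal sketch of such a schedule, assuming the rate falls to zero at the final step and, by default, that there is no warmup (neither detail is stated here); `linear_decay_lr` is a hypothetical helper.

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 6e-5,
                    warmup_steps: int = 0) -> float:
    """Learning rate at a given step: optional linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    decay_span = max(1, total_steps - warmup_steps)  # avoid division by zero
    return base_lr * remaining / decay_span
```

With 1,000 steps and no warmup, the rate is 6e-5 at step 0, 3e-5 at step 500 and 0 at step 1,000. In a PyTorch training loop, `transformers.get_linear_schedule_with_warmup` produces the same shape for an optimizer; whether it was used for this model is not stated.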

Limitations

This model is trained on line-level images. For full-page transcription, you need to first segment the page into individual lines.
Performance may vary on document styles not represented in the training data.
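Line segmentation is outside the scope of this model. As a minimal sketch of one classical approach (a horizontal projection profile, not the method used to build the training data), the function below splits a scanned page into horizontal bands of dark pixels and returns RGB crops that can be passed to the inference code above. It assumes a roughly deskewed page with dark ink on a light background; `segment_lines` and its threshold values are illustrative. Skewed pages, marginalia, or touching lines will likely need a dedicated layout-analysis tool.

```python
from typing import List

import numpy as np
from PIL import Image


def segment_lines(page: Image.Image, ink_threshold: int = 128,
                  min_ink_pixels: int = 1, min_height: int = 5,
                  pad: int = 2) -> List[Image.Image]:
    """Split a page into full-width line crops using a horizontal projection profile."""
    gray = np.asarray(page.convert("L"))
    ink = gray < ink_threshold            # True where a pixel counts as ink
    row_ink = ink.sum(axis=1)             # number of ink pixels in each row
    is_text = row_ink >= min_ink_pixels   # rows that belong to a text line

    bands = []
    start = None
    for y, text_row in enumerate(is_text):
        if text_row and start is None:
            start = y                     # a text band begins
        elif not text_row and start is not None:
            if y - start >= min_height:   # ignore specks thinner than min_height
                bands.append((start, y))
            start = None
    if start is not None and len(is_text) - start >= min_height:
        bands.append((start, len(is_text)))  # band touching the bottom edge

    width, height = page.size
    return [
        page.crop((0, max(0, top - pad), width, min(height, bottom + pad))).convert("RGB")
        for top, bottom in bands
    ]
```

Each returned crop can replace `image` in the single-image example, or be collected into the nested `images` list of the batch example.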

Citation

If you use this model, please cite:

```bibtex
@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for German},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line}}
}
```

And the original LightOnOCR paper:

```bibtex
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
```

Acknowledgments

LightOn AI for the excellent LightOnOCR base model
The creators of the medieval-data/german-shorthand-line dataset
