LightOnOCR-2-1B for Latin (Line-Level)

This model is a fine-tuned version of lightonai/LightOnOCR-2-1B-base specifically trained for line-level OCR.

It targets Caroline minuscule and was trained on line-level CATMuS images from early medieval Latin manuscripts.

Model Description

Base Model: lightonai/LightOnOCR-2-1B-base
Training Data: wjbmattingly/catmus-edited2
Task: Line-level text transcription from document images
Language: Latin (la)
Architecture: Vision-Language Model (1B parameters)

This is a line-level model: it expects cropped line images as input, not full pages. Each image should contain a single line of text.

Evaluation Results

Evaluated on 50 samples from the test set:

Metric	Base Model	Finetuned	Improvement
CER (%)	74.07	13.71	+60.36
WER (%)	107.20	52.54	+54.66
Perfect Matches	0	9	+9

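Lower CER/WER is better; more perfect matches is better. The Improvement column is the absolute difference between the base and fine-tuned scores (percentage points for CER and WER).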
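The card does not say how CER and WER were computed, so the following is only a minimal sketch of the usual definitions, assuming corpus-level scoring (total edit distance divided by total reference length). The names `edit_distance`, `cer`, and `wer` are illustrative and are not part of the model's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate in percent (assumption: corpus-level)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references contain no characters")
    return 100.0 * errors / total


def wer(references, hypotheses):
    """Word error rate in percent, splitting on whitespace (assumption)."""
    errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    total = sum(len(r.split()) for r in references)
    if total == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / total


# Example with one ground-truth / prediction pair
print(cer(["beo hand ofer"], ["too hand open"]))
print(wer(["beo hand ofer"], ["too hand open"]))
```

Because inserted words count as errors, WER computed this way can exceed 100%, which is consistent with the base model's 107.20 in the table.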

Example Outputs

#	Ground Truth	Base Model	Finetuned
1	geƿlitegod ƿega	seplie308 pesa	geplitegoda ƿega
2	iohannes ƿuldre		iohannes pulore
3	seld sƿa saet ⁊ heold ða ƿaes he		reld spa sæt ⁊ heold ða pæs he
4	beo hand ofer	too hand open	✓ beo hand ofer
5	lif to cƿeðanne sƿa sƿa hesylfa		lifto cƿeðanne sƿa sƿa hesylfa

✓ = exact match

Usage

Installation

# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
# Only needed for the batch inference example
pip install datasets

Python Usage

import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-catmus-caroline"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode output
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)

print(transcription)

Batch Inference

# Reuses torch, processor, model, device, and dtype from the Python Usage example
from datasets import load_dataset

# Load dataset
dataset = load_dataset("wjbmattingly/catmus-edited2", split="train[:10]")

# Process batch
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
predictions = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
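The batch example sends every image through the model in a single call, which may not fit in memory for larger sets. A minimal sketch of splitting the work into fixed-size chunks follows. `chunked` and `transcribe_in_chunks` are hypothetical helpers, `transcribe_batch` stands for any callable that wraps the processor, `generate`, and `batch_decode` steps, and the default chunk size of 8 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def transcribe_in_chunks(
    images: Sequence[T],
    transcribe_batch: Callable[[Sequence[T]], List[str]],
    batch_size: int = 8,  # arbitrary default, tune to available memory
) -> List[str]:
    """Run `transcribe_batch` on each chunk and concatenate the results in order."""
    predictions: List[str] = []
    for batch in chunked(images, batch_size):
        predictions.extend(transcribe_batch(batch))
    return predictions


# Stand-in callable so the sketch runs without a model
fake_model = lambda batch: [f"line {item}" for item in batch]
print(transcribe_in_chunks([1, 2, 3, 4, 5], fake_model, batch_size=2))
```

In practice `transcribe_batch` would contain the processor and generation code from the batch example, applied to one chunk at a time.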

Training Details

Base Model: lightonai/LightOnOCR-2-1B-base
Training Method: Fine-tuning with frozen language model backbone
Optimizer: AdamW (fused)
Learning Rate: 6e-5 with linear decay
Precision: bfloat16
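To make "6e-5 with linear decay" concrete, here is a small sketch of a linear schedule in plain Python. The card gives neither the number of training steps nor whether warmup was used, so `total_steps` and `warmup_steps` are assumptions, and `linear_decay_lr` is an illustrative name rather than the training script's code. The shape matches the common linear-with-warmup schedule.

```python
def linear_decay_lr(step, total_steps, peak_lr=6e-5, warmup_steps=0):
    """Learning rate at `step` for a linear schedule.

    Rises linearly from 0 to `peak_lr` during warmup (if any), then
    falls linearly to 0 at `total_steps`.
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical run of 1000 steps with no warmup
for step in (0, 250, 500, 750, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```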

Limitations

This model is trained on line-level images. For full-page transcription, you need to first segment the page into individual lines.
Performance may vary on document styles not represented in the training data.
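For full pages, line segmentation has to come from a separate tool, and this card does not prescribe one. As a rough sketch, assuming the segmenter returns one polygon of `(x, y)` points per line, the helpers below turn each polygon into a padded crop box clamped to the page and sort the boxes top to bottom. `padded_box`, `reading_order`, and the 4-pixel padding are illustrative assumptions.

```python
def padded_box(polygon, page_width, page_height, pad=4):
    """Return (left, top, right, bottom) around `polygon`, padded and clamped to the page."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left = max(0, int(min(xs)) - pad)
    top = max(0, int(min(ys)) - pad)
    right = min(page_width, int(max(xs)) + pad)
    bottom = min(page_height, int(max(ys)) + pad)
    return (left, top, right, bottom)


def reading_order(boxes):
    """Sort boxes by top edge, then left edge (assumes a single-column layout)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Two hypothetical line polygons on a 200 x 100 pixel page
polygons = [
    [(10, 60), (180, 58), (181, 80), (9, 82)],
    [(10, 20), (110, 18), (112, 40), (9, 42)],
]
boxes = reading_order([padded_box(p, 200, 100) for p in polygons])
print(boxes)

# With Pillow, each box can then be cropped and passed to the model:
#   page = Image.open("page.jpg").convert("RGB")
#   line_images = [page.crop(box) for box in boxes]
```

Sorting by the top edge is only adequate for simple layouts; multi-column pages or marginal notes need the reading order supplied by the segmentation tool.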

Citation

If you use this model, please cite:

@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for Latin},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-catmus-caroline}}
}

And the original LightOnOCR paper:

@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}

Acknowledgments

LightOn AI for the excellent LightOnOCR base model
The creators of the wjbmattingly/catmus-edited2 dataset
