Vsesvit Document Layout Analysis — YOLOv8m

A fine-tuned YOLOv8m model for document layout analysis of Vsesvit ("Universe"), a Ukrainian Soviet-era literary periodical. It detects and classifies eight types of layout region on rasterized page images.

Model Description

The model takes a rasterized page image as input and outputs axis-aligned bounding boxes, each labeled with one of eight layout classes. It was developed on 974 manually annotated pages from 50 issues of Vsesvit, using page scans sourced from chtyvo.org.ua. These pages were divided into training, validation, and test sets (see Training Details).

Classes

ID	Class	Description
0	`journal_name`	Masthead typography
1	`article_title`	Display-weight article heading
2	`author_name`	By-line attribution
3	`page_number`	Numerals in corner or header
4	`text_block`	Body prose columns
5	`image`	Photographs and illustrations
6	`mixed_text`	Inseparable typographic-graphic composites
7	`decorative_element`	Ornamental borders, rules, vignettes

Usage

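from ultralytics import YOLO

# Load the fine-tuned weights
model = YOLO("best_vsesvit.pt")

# Run inference at the training resolution
results = model.predict(
    source="page_image.jpg",
    imgsz=512,
    conf=0.25,  # minimum confidence for a detection to be kept
)

# Print each detection: class name, confidence, and pixel box (x1, y1, x2, y2)
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls)
        conf = float(box.conf)
        xyxy = box.xyxy[0].tolist()
        print(f"{result.names[cls_id]}: conf={conf:.2f}, box={xyxy}")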

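The model was trained at 512×512. Ultralytics letterboxes each source image to imgsz before inference, so pages of any size can be passed directly; running inference at a much larger imgsz than the training resolution may degrade accuracy, so keep imgsz=512.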

Performance

All metrics are reported on the held-out test set (139 pages, 1,270 instances), stratified by decade of publication and never seen during training or hyperparameter selection.

Aggregate

Precision	Recall	mAP@0.5	mAP@0.5:0.95
0.847	0.761	0.799	0.600

Per-class

Class	AP@0.5	Notes
`journal_name`	0.995	Near-perfect; consistent masthead placement
`image`	0.971	High visual distinctiveness
`text_block`	0.888
`article_title`	0.852
`mixed_text`	0.694
`decorative_element`	0.633	Frequently subsumed into adjacent text_block
`author_name`	0.560	Primary failure mode — see Limitations
`page_number`	—	Omitted; absent from most Vsesvit pages
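The aggregate mAP@0.5 appears to be the unweighted mean of the seven evaluated per-class APs, with `page_number` excluded. The arithmetic can be checked with a short sketch; the values are copied from the table above.

```python
# Per-class AP@0.5 on the held-out test set, copied from the per-class table.
# page_number is omitted, so the mean is taken over seven classes.
per_class_ap50 = {
    "journal_name": 0.995,
    "image": 0.971,
    "text_block": 0.888,
    "article_title": 0.852,
    "mixed_text": 0.694,
    "decorative_element": 0.633,
    "author_name": 0.560,
}

# mAP@0.5 is a mean over classes, not over instances, so rare classes
# such as author_name weigh as much as frequent ones such as text_block.
map50 = sum(per_class_ap50.values()) / len(per_class_ap50)
print(f"mAP@0.5 = {map50:.3f}")  # prints "mAP@0.5 = 0.799"
```

Because the mean is class-weighted, improving `author_name` alone would move the aggregate figure as much as an equal gain on any other class.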

Limitations

author_name is the primary failure mode (recall 0.355). By-lines and article titles are set in the same display typeface at similar sizes, so vertical position is essentially the only cue that separates them. This is a spatial disambiguation problem that a detector relying on local appearance features cannot reliably resolve. Confidence scores on author_name predictions are characteristically low (0.3–0.4), only slightly above the conf=0.25 threshold used in the Usage example.

Planned mitigation: a spatial post-processing pass using predicted bounding-box vertical position and reading-order heuristics to disambiguate article_title from author_name.
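The mitigation is described as planned, not implemented, so the following is only a hypothetical sketch of one way such a pass could work. The card does not say whether Vsesvit by-lines sit above or below titles, so the ordering is a parameter (`byline_above_title`), and the thresholds `max_gap` and `min_overlap` are illustrative placeholders. `Detection`, `horizontal_overlap`, and `relabel_title_author` are hypothetical names, not part of Ultralytics or of this repository.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

TITLE, AUTHOR = "article_title", "author_name"


@dataclass(frozen=True)
class Detection:
    label: str
    conf: float
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def horizontal_overlap(a, b) -> float:
    """Shared width of two boxes as a fraction of the narrower box's width."""
    inter = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])
    return max(0.0, inter) / narrower if narrower > 0 else 0.0


def relabel_title_author(dets, byline_above_title=False, max_gap=50.0, min_overlap=0.3):
    """Hypothetical heuristic: relabel stacked title/by-line pairs by vertical order.

    Two title/author boxes that are vertically adjacent (gap <= max_gap pixels)
    and horizontally aligned (overlap >= min_overlap) are treated as one
    title + by-line pair. Their labels are then assigned from vertical order
    alone, overriding the detector's choice between the two classes.
    """
    out = list(dets)
    # Indices of display-type detections, sorted top to bottom by y1
    idx = sorted(
        (i for i, d in enumerate(out) if d.label in (TITLE, AUTHOR)),
        key=lambda i: out[i].box[1],
    )
    top_label, bottom_label = (AUTHOR, TITLE) if byline_above_title else (TITLE, AUTHOR)

    k = 0
    while k < len(idx) - 1:
        upper, lower = idx[k], idx[k + 1]
        a, b = out[upper].box, out[lower].box
        if b[1] - a[3] <= max_gap and horizontal_overlap(a, b) >= min_overlap:
            out[upper] = replace(out[upper], label=top_label)
            out[lower] = replace(out[lower], label=bottom_label)
            k += 2  # each box joins at most one pair
        else:
            k += 1
    return out
```

To apply it to Ultralytics output, each box could be wrapped as `Detection(result.names[int(box.cls)], float(box.conf), tuple(box.xyxy[0].tolist()))`. The thresholds would need tuning on the validation split, and a reading-order component (for multi-column pages) is not modeled here.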

decorative_element recall (0.632) is also well below the aggregate recall of 0.761; ornamental borders are frequently merged into adjacent text_block detections.

The model was trained exclusively on Vsesvit (1925–1934). Transfer to other periodicals — especially those with different typographic conventions or languages — has not been evaluated.

Training Details

Setting	Value
Base model	COCO-pretrained YOLOv8m (25.8M params)
Input resolution	512 × 512
Epochs	100 (best checkpoint: epoch 96)
Batch size	8
Learning rate	1e-3, cosine decay
Hardware	Apple M1 CPU
Augmentation	±5° rotation, HSV value jitter (0.6), mosaic (p=0.5)
Train/val/test split	70% / 15% / 15% by issue, decade-stratified
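The settings in the table could be expressed as Ultralytics training arguments roughly as follows. This is a reconstruction, not the author's training script: the dataset file `vsesvit.yaml` is hypothetical, and the optimizer is an assumption because the card does not state it (in recent Ultralytics releases the default `optimizer="auto"` selects its own learning rate and ignores `lr0`). Augmentations the card does not mention, such as Ultralytics' default hue/saturation jitter and horizontal flips, are left at library defaults here since the card does not say whether they were disabled.

```python
# Hypothetical mapping of the Training Details table onto Ultralytics train() arguments.
TRAIN_ARGS = {
    "data": "vsesvit.yaml",  # HYPOTHETICAL dataset YAML (image paths + 8 class names)
    "imgsz": 512,            # input resolution 512 x 512
    "epochs": 100,
    "batch": 8,
    "optimizer": "AdamW",    # ASSUMPTION: not stated on the card; set so lr0 is honored
    "lr0": 1e-3,             # initial learning rate
    "cos_lr": True,          # cosine learning-rate decay
    "degrees": 5.0,          # random rotation within +/-5 degrees
    "hsv_v": 0.6,            # HSV value (brightness) jitter fraction
    "mosaic": 0.5,           # mosaic augmentation applied with probability 0.5
    "device": "cpu",         # training ran on an Apple M1 CPU
}


def train(weights: str = "yolov8m.pt"):
    """Fine-tune COCO-pretrained YOLOv8m weights with TRAIN_ARGS."""
    # Imported lazily so the configuration can be inspected without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Keeping the arguments in a plain dict makes it easy to log them alongside the resulting checkpoint.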

Training uses the standard YOLOv8 loss: CIoU loss for box regression, binary cross-entropy for classification, and Distribution Focal Loss for box-coordinate refinement. Ground-truth boxes are assigned to predictions by the Task-Aligned Assigner.
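For readers unfamiliar with the box-regression term, here is a minimal scalar sketch of CIoU following its standard definition: IoU, minus the squared centre distance normalized by the squared diagonal of the smallest enclosing box, minus a weighted aspect-ratio consistency term. It is an illustration for single boxes, not the vectorized Ultralytics implementation; the small `eps` guards against division by zero.

```python
import math


def ciou(pred, target, eps=1e-7):
    """Complete IoU between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1 + eps
    tw, th = tx2 - tx1, ty2 - ty1 + eps

    # Plain intersection over union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared centre distance over squared diagonal of the smallest enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4

    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)


def ciou_loss(pred, target):
    """Box-regression loss: 0 for a perfect match, above 1 for disjoint boxes."""
    return 1.0 - ciou(pred, target)
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, the centre-distance penalty still yields a useful gradient when a predicted box does not yet overlap its target.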

Dataset

Annotations cover 974 pages from 50 issues of Vsesvit and were produced by the author with a custom browser-based annotation tool (HTML/JavaScript, PDF.js, JSONL and YOLO TXT export). The train/val/test split was stratified by decade to account for substantial shifts in scan quality, typographic conventions, and layout style across the 1925–1934 run.
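The card specifies a 70/15/15 split by issue, stratified by decade, but not the exact procedure. A hypothetical sketch of such a split is shown below; the input format (a mapping from issue ID to publication year), the function name, and the seed are assumptions. Assigning whole issues keeps all pages of an issue in one split, which prevents near-duplicate layouts from leaking between train and test; it also means page-level fractions only approximate 15% (139 of 974 pages is about 14.3%).

```python
import random
from collections import defaultdict


def split_issues_by_decade(issue_years, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole issues to train/val/test, stratified by decade (hypothetical sketch).

    issue_years: mapping issue_id -> publication year (assumed input format).
    fractions: (train, val, test); train receives whatever val and test leave over.
    """
    # Group issues by decade of publication, e.g. 1927 -> 1920
    by_decade = defaultdict(list)
    for issue_id, year in issue_years.items():
        by_decade[year // 10 * 10].append(issue_id)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for decade in sorted(by_decade):
        issues = sorted(by_decade[decade])  # sort first so the shuffle is reproducible
        rng.shuffle(issues)
        n_val = round(len(issues) * fractions[1])
        n_test = round(len(issues) * fractions[2])
        splits["val"] += issues[:n_val]
        splits["test"] += issues[n_val:n_val + n_test]
        splits["train"] += issues[n_val + n_test:]
    return splits
```

Splitting within each decade guarantees that every era of scan quality and typography appears in the test set, rather than leaving that to chance.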

The annotated dataset is released separately: kgeorgii/vsesvit-dla.

The annotation tool is available at kgeorgii.github.io/dla_annotator/.

Citation

If you use this model or dataset, please cite:

@misc{vsesvit-dla-2026,
  title     = {Document Layout Analysis for Vsesvit: A Ukrainian Soviet-Era Literary Periodical},
  author    = {Korotkov, Georgii},
  year      = {2026},
  note      = {Model released at https://huggingface.co/kgeorgii/vsesvit-layout-yolov8m; annotated corpus released as kgeorgii/vsesvit-dla}
}
