Vsesvit Document Layout Analysis — YOLOv8m

A fine-tuned YOLOv8m model for document layout analysis of Vsesvit ("Universe"), a Ukrainian Soviet-era literary periodical. It detects and classifies eight types of layout region on rasterized page images.


Model Description

The model takes a rasterized page image as input and outputs axis-aligned bounding boxes labeled with one of eight layout classes. It was trained on 974 manually annotated pages across 50 issues of Vsesvit, digitized from chtyvo.org.ua.


Classes

ID  Class               Description
0   journal_name        Masthead typography
1   article_title       Display-weight article heading
2   author_name         By-line attribution
3   page_number         Numerals in corner or header
4   text_block          Body prose columns
5   image               Photographs and illustrations
6   mixed_text          Inseparable typographic-graphic composites
7   decorative_element  Ornamental borders, rules, vignettes

Usage

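from ultralytics import YOLO

# Load the fine-tuned weights.
model = YOLO("best_vsesvit.pt")

# Run inference at the training resolution.
results = model.predict(
    source="page_image.jpg",
    imgsz=512,
    conf=0.25,
)

# One Results object is returned per input image.
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls)        # class index (0-7)
        conf = float(box.conf)       # detection confidence
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in original-image pixels
        print(f"{result.names[cls_id]}: conf={conf:.2f}, box={xyxy}")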

The model was trained at 512×512. Ultralytics resizes every input to imgsz before inference and defaults to imgsz=640, so pass imgsz=512 explicitly; running at a different resolution may degrade performance.
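To give a feel for how much detail survives at this resolution, the sketch below reproduces the aspect-preserving resize arithmetic (longer side scaled to imgsz, padding ignored). The helper name `letterbox_size` and the A4 scan dimensions are assumptions chosen for illustration, not part of the model or of Ultralytics.

```python
def letterbox_size(width: int, height: int, imgsz: int = 512) -> tuple[int, int]:
    """Return (width, height) after an aspect-preserving resize to `imgsz`.

    The longer side is scaled to `imgsz`; any padding is not included.
    """
    scale = min(imgsz / width, imgsz / height)
    return round(width * scale), round(height * scale)


# Hypothetical A4 page scanned at 300 dpi (2480 x 3508 px).
print(letterbox_size(2480, 3508))  # (362, 512)
```

At roughly one seventh of the original scale, small regions such as page numbers and by-lines shrink to a few pixels in height, which is worth keeping in mind when reading the per-class results.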


Performance

All metrics are reported on the held-out test set (139 pages, 1,270 instances), stratified by decade of publication and never seen during training or hyperparameter selection.

Aggregate

Precision  Recall  mAP@0.5  mAP@0.5:0.95
0.847      0.761   0.799    0.600

Per-class

Class               AP@0.5  Notes
journal_name        0.995   Near-perfect; consistent masthead placement
image               0.971   High visual distinctiveness
text_block          0.888
article_title       0.852
mixed_text          0.694
decorative_element  0.633   Frequently subsumed into adjacent text_block
author_name         0.560   Primary failure mode; see Limitations
page_number         n/a     Omitted; absent from most Vsesvit pages
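The aggregate mAP@0.5 appears to be the unweighted mean of the seven per-class values reported above, with page_number excluded. A quick check, using only the numbers from the table:

```python
# Per-class AP@0.5 values copied from the per-class table (page_number omitted).
ap50 = {
    "journal_name": 0.995,
    "image": 0.971,
    "text_block": 0.888,
    "article_title": 0.852,
    "mixed_text": 0.694,
    "decorative_element": 0.633,
    "author_name": 0.560,
}

# Unweighted mean over the classes that have a reported AP.
mean_ap50 = sum(ap50.values()) / len(ap50)
print(f"mAP@0.5 over {len(ap50)} classes: {mean_ap50:.3f}")  # 0.799
```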

Limitations

author_name is the primary failure mode (recall 0.355). By-lines and article titles appear in the same display typeface at similar sizes; only vertical position distinguishes them. This is a fundamental spatial disambiguation problem that purely local-feature detectors cannot reliably resolve. Confidence scores on author_name predictions are characteristically low (0.3–0.4).

Planned mitigation: a spatial post-processing pass using predicted bounding-box vertical position and reading-order heuristics to disambiguate article_title from author_name.
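The planned pass is not specified further here, so the following is only one possible shape for it, not the author's implementation. It assumes that a by-line sits directly below its title in the same column (an assumption about Vsesvit layout that would need checking against the corpus; set `author_below=False` for the opposite convention). The names `Detection`, `relabel_pairs`, and the `max_gap` threshold are hypothetical.

```python
from dataclasses import dataclass, replace

AMBIGUOUS = {"article_title", "author_name"}


@dataclass(frozen=True)
class Detection:
    label: str
    conf: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def h_overlap(a: Detection, b: Detection) -> float:
    """Horizontal overlap in pixels (negative if the boxes are disjoint)."""
    return min(a.box[2], b.box[2]) - max(a.box[0], b.box[0])


def relabel_pairs(dets, max_gap: float = 40.0, author_below: bool = True):
    """Relabel vertically adjacent title/by-line candidates by position.

    Assumption (not from the model card): within a stacked pair, the upper
    box is the title and the lower one the by-line, or the reverse when
    `author_below` is False.
    """
    out = list(dets)
    # Candidate indices, ordered top to bottom by their upper edge.
    idx = sorted((i for i, d in enumerate(out) if d.label in AMBIGUOUS),
                 key=lambda i: out[i].box[1])
    upper_label, lower_label = (
        ("article_title", "author_name") if author_below
        else ("author_name", "article_title")
    )
    k = 0
    while k < len(idx) - 1:
        i, j = idx[k], idx[k + 1]
        upper, lower = out[i], out[j]
        gap = lower.box[1] - upper.box[3]
        if h_overlap(upper, lower) > 0 and gap <= max_gap:
            out[i] = replace(upper, label=upper_label)
            out[j] = replace(lower, label=lower_label)
            k += 2  # each box takes part in at most one pair
        else:
            k += 1
    return out


# Hypothetical detections: a by-line misclassified as a second title.
dets = [
    Detection("article_title", 0.81, (100, 50, 400, 90)),
    Detection("article_title", 0.34, (120, 100, 300, 120)),
    Detection("text_block", 0.90, (100, 140, 400, 600)),
]
print([d.label for d in relabel_pairs(dets)])
```

A fixed pixel gap is the simplest choice; scaling `max_gap` by box height would likely be more robust across scans of different resolution.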

decorative_element recall (0.632) is also below the aggregate recall of 0.761; ornamental borders are frequently merged into adjacent text_block detections.

The model was trained exclusively on Vsesvit (1925–1934). Transfer to other periodicals, especially those with different typographic conventions or languages, has not been evaluated.


Training Details

Setting               Value
Base model            COCO-pretrained YOLOv8m (25.8M params)
Input resolution      512 × 512
Epochs                100 (best checkpoint: epoch 96)
Batch size            8
Learning rate         1e-3, cosine decay
Hardware              Apple M1 CPU
Augmentation          ±5° rotation, HSV value jitter (0.6), mosaic (p=0.5)
Train/val/test split  70% / 15% / 15% by issue, decade-stratified

Training loss combines CIoU regression loss, binary cross-entropy for classification, and Distribution Focal Loss for box coordinate refinement, with Task-Aligned Assigner target assignment.
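The settings in the table map onto Ultralytics training arguments roughly as sketched below. This is a reconstruction under assumptions, not the original training script: the dataset config name `vsesvit.yaml` is hypothetical, and anything the table does not list (optimizer, warmup, weight decay, and so on) is left at library defaults.

```python
# Arguments mirror the Training Details table; everything else is left
# at Ultralytics defaults, which may differ from the original run.
TRAIN_ARGS = {
    "imgsz": 512,     # input resolution
    "epochs": 100,
    "batch": 8,
    "lr0": 1e-3,      # initial learning rate
    "cos_lr": True,   # cosine learning-rate schedule
    "degrees": 5.0,   # rotation augmentation, +/- degrees
    "hsv_v": 0.6,     # HSV value jitter
    "mosaic": 0.5,    # mosaic probability
    "device": "cpu",
}


def train(data_yaml: str = "vsesvit.yaml"):
    """Fine-tune COCO-pretrained YOLOv8m on a dataset config (hypothetical path)."""
    # Imported lazily so the configuration can be inspected without the library.
    from ultralytics import YOLO

    model = YOLO("yolov8m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

One caveat: with the default `optimizer="auto"`, Ultralytics can select its own optimizer and learning rate and ignore `lr0`. The optimizer used for this model is not stated, so reproducing the 1e-3 setting may require naming an optimizer explicitly.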


Dataset

Annotations cover 974 pages across 50 issues of Vsesvit, produced by the author using a custom browser-based annotation tool (HTML/JavaScript, PDF.js, JSONL and YOLO TXT export). The train/val/test split was made by issue and stratified by decade to account for substantial shifts in scan quality, typographic conventions, and layout style across the 1925–1934 run.
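The exact splitting procedure is not described, so the sketch below shows only one way an issue-level, decade-stratified 70/15/15 split could be done. The function name `split_issues`, the issue identifiers, and the seed are all hypothetical. Keeping every page of an issue in the same subset prevents near-identical layouts from leaking between training and test data.

```python
import random
from collections import defaultdict


def split_issues(issue_years: dict[str, int], seed: int = 0,
                 fractions=(0.70, 0.15, 0.15)) -> dict[str, list[str]]:
    """Assign whole issues to train/val/test, stratified by decade."""
    # Group issue IDs by decade of publication (1920, 1930, ...).
    by_decade = defaultdict(list)
    for issue, year in issue_years.items():
        by_decade[year // 10 * 10].append(issue)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for decade in sorted(by_decade):
        issues = sorted(by_decade[decade])
        rng.shuffle(issues)
        n_train = round(fractions[0] * len(issues))
        n_val = round(fractions[1] * len(issues))
        splits["train"] += issues[:n_train]
        splits["val"] += issues[n_train:n_train + n_val]
        splits["test"] += issues[n_train + n_val:]  # remainder goes to test
    return splits


# Hypothetical issue IDs: 20 issues in each of two decades.
issues = {f"{year}-{n:02d}": year for year in (1927, 1932) for n in range(1, 21)}
splits = split_issues(issues)
print({name: len(ids) for name, ids in splits.items()})
```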

The annotated dataset is released separately: kgeorgii/vsesvit-dla.
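For readers working with the YOLO TXT export, the sketch below converts one label line to pixel coordinates. It assumes the standard YOLO convention of `class x_center y_center width height`, all normalized to the image size; the file layout of the released dataset is not described here and should be checked. The helper name `yolo_to_xyxy` and the sample line are hypothetical, and the class list is copied from the Classes section.

```python
# Class names in ID order, copied from the Classes section.
CLASS_NAMES = [
    "journal_name", "article_title", "author_name", "page_number",
    "text_block", "image", "mixed_text", "decorative_element",
]


def yolo_to_xyxy(line: str, img_w: int, img_h: int):
    """Convert one YOLO TXT label line to (class_id, x1, y1, x2, y2) in pixels."""
    cls, xc, yc, w, h = line.split()
    # Scale normalized centre and size to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical label line on a 1000 x 2000 px page.
cls_id, *box = yolo_to_xyxy("4 0.5 0.5 0.5 0.25", 1000, 2000)
print(CLASS_NAMES[cls_id], box)  # text_block [250.0, 750.0, 750.0, 1250.0]
```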

The annotation tool is available at kgeorgii.github.io/dla_annotator/ (source).


Citation

If you use this model or dataset, please cite:

@misc{vsesvit-dla-2026,
  title     = {Document Layout Analysis for Vsesvit: A Ukrainian Soviet-Era Literary Periodical},
  author    = {Georgii Korotkov},
  year      = {2026},
  note      = {Model and annotated corpus released at https://huggingface.co/kgeorgii/vsesvit-layout-yolov8m}
}
