Vsesvit Document Layout Analysis — YOLOv8m
A fine-tuned YOLOv8m model for document layout analysis of Vsesvit ("Universe"), a Ukrainian Soviet-era literary periodical. It detects layout regions on rasterized page images and assigns each region to one of eight classes.
Model Description
The model takes a rasterized page image as input and outputs axis-aligned bounding boxes, each labeled with one of eight layout classes. It was developed on 974 manually annotated pages from 50 issues of Vsesvit, digitized from chtyvo.org.ua; these pages are divided into training, validation, and test sets (see Training Details).
Classes
| ID | Class | Description |
|---|---|---|
| 0 | journal_name | Masthead typography |
| 1 | article_title | Display-weight article heading |
| 2 | author_name | By-line attribution |
| 3 | page_number | Numerals in corner or header |
| 4 | text_block | Body prose columns |
| 5 | image | Photographs and illustrations |
| 6 | mixed_text | Inseparable typographic-graphic composites |
| 7 | decorative_element | Ornamental borders, rules, vignettes |
Usage
from ultralytics import YOLO

# Load the fine-tuned checkpoint
model = YOLO("best_vsesvit.pt")

# Run inference at the training resolution
results = model.predict(
    source="page_image.jpg",
    imgsz=512,
    conf=0.25,
)

# Print class name, confidence, and pixel-space box for each detection
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls)
        conf = float(box.conf)
        xyxy = box.xyxy[0].tolist()
        print(f"{result.names[cls_id]}: conf={conf:.2f}, box={xyxy}")
The model was trained at 512×512. Passing higher-resolution images may degrade performance; resize inputs before inference or use imgsz=512 explicitly.
Performance
All metrics are reported on the held-out test set (139 pages, 1,270 instances), stratified by decade of publication and never seen during training or hyperparameter selection.
Aggregate
| Metric | Value |
|---|---|
| mAP@0.5 | 0.799 |
| mAP@0.5:0.95 | 0.600 |
Per-class
| Class | AP@0.5 | Notes |
|---|---|---|
| journal_name | 0.995 | Near-perfect; consistent masthead placement |
| image | 0.971 | High visual distinctiveness |
| text_block | 0.888 | |
| article_title | 0.852 | |
| mixed_text | 0.694 | |
| decorative_element | 0.633 | Frequently subsumed into adjacent text_block |
| author_name | 0.560 | Primary failure mode; see Limitations |
| page_number | n/a | Omitted; absent from most Vsesvit pages |
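As a consistency check, the reported mAP@0.5 of 0.799 can be recovered as the unweighted mean of the seven per-class AP@0.5 values listed above (page_number is excluded because it was omitted from evaluation). A minimal sketch; the variable names are illustrative only:

```python
# Per-class AP@0.5 values copied from the table above (page_number omitted).
ap50 = {
    "journal_name": 0.995,
    "image": 0.971,
    "text_block": 0.888,
    "article_title": 0.852,
    "mixed_text": 0.694,
    "decorative_element": 0.633,
    "author_name": 0.560,
}

# mAP@0.5 is the unweighted mean of the per-class AP values.
map50 = sum(ap50.values()) / len(ap50)
print(f"mAP@0.5 = {map50:.3f}")  # mAP@0.5 = 0.799
```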
Limitations
author_name is the primary failure mode (recall 0.355). By-lines and article titles appear in the same display typeface at similar sizes; only vertical position distinguishes them. This is a fundamental spatial disambiguation problem that purely local-feature detectors cannot reliably resolve. Confidence scores on author_name predictions are characteristically low (0.3–0.4).
Planned mitigation: a spatial post-processing pass using predicted bounding-box vertical position and reading-order heuristics to disambiguate article_title from author_name.
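One possible shape for such a pass is sketched below. This is not the author's implementation: the `Detection` type, the function names, the thresholds, and especially the assumption that the by-line sits directly below its title (`author_below_title=True`) are all hypothetical and would need to be checked against actual Vsesvit layouts.

```python
from dataclasses import dataclass, replace

HEADING_LABELS = {"article_title", "author_name"}


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one predicted box."""
    label: str
    conf: float
    xyxy: tuple  # (x1, y1, x2, y2) in pixels


def horizontal_overlap(a, b):
    """Overlap of the two x-extents as a fraction of the narrower box."""
    left = max(a.xyxy[0], b.xyxy[0])
    right = min(a.xyxy[2], b.xyxy[2])
    narrower = min(a.xyxy[2] - a.xyxy[0], b.xyxy[2] - b.xyxy[0])
    return max(0.0, right - left) / narrower if narrower > 0 else 0.0


def relabel_heading_pairs(dets, max_gap=40.0, min_overlap=0.5,
                          author_below_title=True):
    """Relabel vertically stacked heading-like boxes by position alone.

    Assumption (not from the model card): a title and its by-line form a
    stacked pair in the same column, and their order is fixed.
    Boxes are paired greedily in top-to-bottom order, so headings that
    interleave across columns may be missed by this simple pass.
    """
    headings = sorted((d for d in dets if d.label in HEADING_LABELS),
                      key=lambda d: d.xyxy[1])
    others = [d for d in dets if d.label not in HEADING_LABELS]
    top, bottom = (("article_title", "author_name") if author_below_title
                   else ("author_name", "article_title"))
    i = 0
    while i < len(headings) - 1:
        upper, lower = headings[i], headings[i + 1]
        gap = lower.xyxy[1] - upper.xyxy[3]  # vertical distance between boxes
        if gap <= max_gap and horizontal_overlap(upper, lower) >= min_overlap:
            headings[i] = replace(upper, label=top)
            headings[i + 1] = replace(lower, label=bottom)
            i += 2  # both boxes of the pair are consumed
        else:
            i += 1
    return others + headings


# Two stacked boxes that the detector both called article_title.
page = [
    Detection("article_title", 0.81, (50, 100, 400, 140)),
    Detection("article_title", 0.34, (60, 150, 390, 175)),
    Detection("text_block", 0.93, (50, 200, 400, 900)),
]
fixed = relabel_heading_pairs(page)
print([d.label for d in fixed])
```

The pass deliberately ignores the predicted class of heading-like boxes once a stacked pair is found, which matches the observation that typeface and size do not separate the two classes; whether that trade-off helps in practice has not been evaluated here.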
decorative_element recall is also below average (0.632); ornamental borders are frequently merged into adjacent text_block detections.
The model was trained exclusively on Vsesvit (1925–1934). Transfer to other periodicals — especially those with different typographic conventions or languages — has not been evaluated.
Training Details
| Setting | Value |
|---|---|
| Base model | COCO-pretrained YOLOv8m (25.8M params) |
| Input resolution | 512 × 512 |
| Epochs | 100 (best checkpoint: epoch 96) |
| Batch size | 8 |
| Learning rate | 1e-3, cosine decay |
| Hardware | Apple M1 CPU |
| Augmentation | ±5° rotation, HSV value jitter (0.6), mosaic (p=0.5) |
| Train/val/test split | 70% / 15% / 15% by issue, decade-stratified |
Training loss combines CIoU regression loss, binary cross-entropy for classification, and Distribution Focal Loss for box coordinate refinement, with Task-Aligned Assigner target assignment.
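For readers who want to approximate this setup, the settings in the table map roughly onto Ultralytics training arguments as sketched below. This is a reconstruction, not the author's training script: the dataset file name `vsesvit.yaml` is hypothetical, the mapping of "HSV value jitter (0.6)" to `hsv_v` and "±5° rotation" to `degrees` is an assumption, and the optimizer is not stated in the table.

```python
# Hedged reconstruction of the training configuration from the table above.
TRAIN_KWARGS = dict(
    epochs=100,
    imgsz=512,       # input resolution 512 x 512
    batch=8,
    lr0=1e-3,        # initial learning rate
    cos_lr=True,     # cosine learning-rate decay
    degrees=5.0,     # assumed mapping of the +/-5 degree rotation
    hsv_v=0.6,       # assumed mapping of the HSV value jitter
    mosaic=0.5,      # mosaic augmentation probability
    device="cpu",    # the card reports training on an Apple M1 CPU
)


def train(data_yaml="vsesvit.yaml"):
    """Fine-tune COCO-pretrained YOLOv8m; data_yaml is a hypothetical path."""
    from ultralytics import YOLO  # imported lazily so the config is inspectable

    model = YOLO("yolov8m.pt")
    # Note: the optimizer is unspecified in the card; with automatic optimizer
    # selection, Ultralytics may choose its own learning rate instead of lr0.
    return model.train(data=data_yaml, **TRAIN_KWARGS)
```

Calling `train()` requires the `ultralytics` package and a dataset YAML that lists the eight classes in the ID order given under Classes.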
Dataset
Annotations cover 974 pages across 50 issues of Vsesvit, produced by the author using a custom browser-based annotation tool (HTML/JavaScript, PDF.js, JSONL and YOLO TXT export). The train/val/test split was made by issue and stratified by decade to account for substantial shifts in scan quality, typographic conventions, and layout style across the 1925–1934 run.
The annotated dataset is released separately: kgeorgii/vsesvit-dla.
The annotation tool is available at kgeorgii.github.io/dla_annotator/ (source).
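To compare model predictions with YOLO TXT labels, pixel-space `xyxy` boxes can be converted to the normalized `class x_center y_center width height` form used by the standard YOLO label convention. The sketch below assumes the tool's export follows that convention, which the card does not spell out; the function name is illustrative.

```python
def xyxy_to_yolo_line(cls_id, xyxy, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to one YOLO TXT label line."""
    x1, y1, x2, y2 = xyxy
    x_center = (x1 + x2) / 2 / img_w   # normalized box center, x
    y_center = (y1 + y2) / 2 / img_h   # normalized box center, y
    width = (x2 - x1) / img_w          # normalized box width
    height = (y2 - y1) / img_h         # normalized box height
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A text_block (class 4) on a 1000 x 2000 pixel page image.
line = xyxy_to_yolo_line(4, (100, 200, 300, 600), img_w=1000, img_h=2000)
print(line)  # 4 0.200000 0.200000 0.200000 0.200000
```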
Citation
If you use this model or dataset, please cite:
@misc{vsesvit-dla-2026,
  title  = {Document Layout Analysis for Vsesvit: A Ukrainian Soviet-Era Literary Periodical},
  author = {Georgii Korotkov},
  year   = {2026},
  note   = {Model and annotated corpus released at https://huggingface.co/kgeorgii/vsesvit-layout-yolov8m}
}
Evaluation results
- mAP@0.5 on Vsesvit DLA (held-out test set, self-reported): 0.799
- mAP@0.5:0.95 on Vsesvit DLA (held-out test set, self-reported): 0.600