Vietnamese ID Document Detection Model
Abstract
This repository contains a Detectron2-based object detection model trained to localize Vietnamese identity documents in images. The detector predicts a single class, document, and is intended as a document-localization stage before downstream processing such as perspective correction, background removal, or OCR.
Model
- Architecture: Faster R-CNN R50 FPN 3x
- Framework: Detectron2
- Detection class: document
- Input: natural images containing Vietnamese identity documents
- Output: bounding box coordinates with confidence scores
Training Data
The model was trained on a normalized COCO-format dataset merged from 6 local sources:
- archive
- behind cccd.v6i.coco
- CCCD Project.v2i.coco
- cccd.v2i.coco
- detect cccd.v2i.coco
- Project2.v2i.coco
Merged dataset summary:
- Train split: 3354 images / 3347 annotations
- Empty train images filtered by the trainer: 21
- Effective annotated train images used for optimization: 3333
- Validation split: 458 images / 452 annotations
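The empty-image filtering mentioned above can be illustrated with a minimal sketch. It assumes a standard COCO-style dictionary with `images` and `annotations` lists; the function name `split_empty_images` and the toy data are hypothetical and are not taken from the training code. The exact filtering rule used by the trainer may differ.

```python
def split_empty_images(coco):
    """Split image ids into (annotated, empty) for a COCO-style dict.

    An image counts as annotated if at least one non-crowd annotation
    points at it. This mirrors the idea of dropping empty images before
    training; it is an illustration, not the trainer's actual code.
    """
    annotated_ids = {
        ann["image_id"]
        for ann in coco.get("annotations", [])
        if not ann.get("iscrowd", 0)
    }
    kept, empty = [], []
    for img in coco.get("images", []):
        (kept if img["id"] in annotated_ids else empty).append(img["id"])
    return kept, empty


# Toy data (hypothetical): three images, one of them without annotations.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 0, "bbox": [5, 5, 50, 30]},
        {"id": 11, "image_id": 3, "category_id": 0, "bbox": [8, 2, 40, 25]},
    ],
}
kept, empty = split_empty_images(toy)
print(len(kept), "annotated,", len(empty), "empty")
```

The same bookkeeping explains the summary figures: 3354 train images minus 21 empty images leaves 3333 images used for optimization.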
Training Setup
- Base learning rate: 0.00025
- Maximum iterations: 18000
- Learning-rate decay milestones: 14400, 16200
- Batch size: 2
- Multi-scale train resize: 640, 672, 704, 736, 768, 800
- Max image size: 1333
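As a rough sketch, the hyperparameters above can be written as Detectron2 config overrides, in the flat key/value form accepted by `cfg.merge_from_list`. The mapping of "batch size" to `SOLVER.IMS_PER_BATCH` and the decay factor `GAMMA = 0.1` (the Detectron2 default) are assumptions; the actual training script may set them differently. The helper names `to_opts` and `lr_at` are hypothetical, and `lr_at` ignores warmup.

```python
BASE_LR = 0.00025
MAX_ITER = 18000
STEPS = (14400, 16200)
GAMMA = 0.1  # assumption: Detectron2's default decay factor, not stated in this card

# Values from the Training Setup list, keyed by Detectron2 config names.
overrides = {
    "MODEL.ROI_HEADS.NUM_CLASSES": 1,      # single class: document
    "SOLVER.BASE_LR": BASE_LR,
    "SOLVER.MAX_ITER": MAX_ITER,
    "SOLVER.STEPS": STEPS,
    "SOLVER.IMS_PER_BATCH": 2,             # assumption: "batch size" means images per batch
    "INPUT.MIN_SIZE_TRAIN": (640, 672, 704, 736, 768, 800),
    "INPUT.MAX_SIZE_TRAIN": 1333,
}


def to_opts(mapping):
    """Flatten {key: value} into [key, value, key, value, ...]."""
    opts = []
    for key, value in mapping.items():
        opts.extend([key, value])
    return opts


def lr_at(iteration, base_lr=BASE_LR, steps=STEPS, gamma=GAMMA):
    """Step-decay learning rate at a given iteration, ignoring warmup."""
    passed = sum(1 for step in steps if iteration >= step)
    return base_lr * gamma ** passed


for it in (0, 14400, 16200):
    print(it, lr_at(it))
```

With Detectron2 installed, the list returned by `to_opts(overrides)` could be applied with `cfg.merge_from_list(...)` on top of the model zoo config `COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml`.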
Evaluation
The final validation metrics below are from the last Detectron2 evaluation at iteration 18000.
| Metric | Value |
|---|---|
| AP (bbox) | 95.151 |
| AP50 (bbox) | 98.101 |
| AP75 (bbox) | 98.086 |
| APl (bbox) | 95.151 |
Raw training and evaluation logs are included in metrics.json.
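A minimal sketch for pulling the last evaluation record out of metrics.json follows. It assumes the file uses the usual Detectron2 layout of one JSON object per line, with box metrics stored under keys prefixed `bbox/`; that layout is an assumption about this repository's file. The function name `last_bbox_metrics` is hypothetical.

```python
import json


def last_bbox_metrics(lines):
    """Return the bbox metrics from the last line that contains any.

    `lines` is any iterable of JSON-lines strings. Returns None when no
    line carries a key starting with "bbox/".
    """
    last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if any(key.startswith("bbox/") for key in record):
            last = {
                key: value
                for key, value in record.items()
                if key.startswith("bbox/") or key == "iteration"
            }
    return last


# Usage, assuming metrics.json sits in the working directory:
# with open("metrics.json") as fh:
#     print(last_bbox_metrics(fh))
```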
Intended Use
This model is designed for document localization, especially as a preprocessing step before cropping or OCR. It is optimized for finding the document region, not for classifying document side, document type, or extracting text content.
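The cropping step described above can be sketched as follows. The sketch assumes detections are available as absolute `(x1, y1, x2, y2)` boxes with matching confidence scores, which is the box format Detectron2 predictions use. The function names, the `min_score` threshold of 0.5, and the `margin` parameter are hypothetical choices for illustration, not part of this repository.

```python
import math


def select_document_box(boxes, scores, min_score=0.5):
    """Return (box, score) for the highest-scoring detection, or None.

    Detections scoring below `min_score` are ignored.
    """
    best = None
    for box, score in zip(boxes, scores):
        if score >= min_score and (best is None or score > best[1]):
            best = (box, score)
    return best


def crop_region(box, width, height, margin=0):
    """Convert a float box to an integer crop clamped to the image.

    Returns (left, top, right, bottom) or None if the region is empty.
    """
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - margin)
    top = max(0, math.floor(y1) - margin)
    right = min(width, math.ceil(x2) + margin)
    bottom = min(height, math.ceil(y2) + margin)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


# Toy detections (hypothetical values) on a 640x480 image.
boxes = [(42.3, 60.8, 598.1, 410.5), (0.0, 0.0, 20.0, 15.0)]
scores = [0.98, 0.12]
picked = select_document_box(boxes, scores)
if picked is not None:
    print(crop_region(picked[0], width=640, height=480, margin=4))
```

The resulting tuple can be passed to an image library's crop call before perspective correction or OCR; a small margin helps avoid clipping document edges when the predicted box is slightly tight.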
Notes
This repository stores the model artifacts and model card. Training and inference scripts are maintained in the source project repository used to train the model. A TorchScript export artifact may also be included alongside the PyTorch checkpoint.