metadata
library_name: onnxruntime
license: apache-2.0
language:
- vi
tags:
- scanindex
- ocr
- kie
- vietnamese
- model-bundle
ScanIndex — runtime model bundle
Small companion repo for ScanIndex. Contains:
- orientation/PP-LCNet_x1_0_doc_ori.onnx: PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
- manifest.json: list of standalone model repos that complete the runtime
The actual model weights live in the standalone repos below. Download all of them at once with scripts/download_offline_models.py in the GitHub repo.
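For the bundled orientation classifier, the following is a minimal sketch of opening the ONNX file with onnxruntime and turning its four output scores into a rotation angle. The class order (0, 90, 180, 270 degrees) is an assumption and should be checked against the model's own label list. The helper names `load_orientation_session` and `pick_orientation` are hypothetical and not part of ScanIndex.

```python
from typing import Sequence

# ASSUMPTION: class indices map to these angles in this order.
# Verify against the label list shipped with the upstream model.
ANGLES = (0, 90, 180, 270)


def load_orientation_session(model_path: str):
    """Open the ONNX model on CPU. onnxruntime is imported lazily so the
    rest of this module works without it installed."""
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def pick_orientation(scores: Sequence[float]) -> int:
    """Return the angle whose class has the highest score."""
    if len(scores) != len(ANGLES):
        raise ValueError(f"expected {len(ANGLES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ANGLES[best]
```

In use, the scores would come from something like `session.run(None, {session.get_inputs()[0].name: batch})[0][0]`, where `batch` is an already preprocessed image tensor. Preprocessing (resize, normalization) is deliberately left out because the source does not specify it.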
Standalone model repos
| HF repo | Link |
|---|---|
| welcomyou/layoutlmv3-vn-admin-kie | layoutlmv3-vn-admin-kie |
| welcomyou/e5-small-vn-archive-mix50 | e5-small-vn-archive-mix50 |
| welcomyou/distilled-protonx-vn-correction-ct2 | distilled-protonx-vn-correction-ct2 |
| welcomyou/lightgbm-vn-page-splitter | lightgbm-vn-page-splitter |
| welcomyou/doclayout-yolo-onnx-dynamic | doclayout-yolo-onnx-dynamic |
| welcomyou/gmft-tatr-onnx | gmft-tatr-onnx |
| welcomyou/docling-tableformer-v1-onnx-stepcache | docling-tableformer-v1-onnx-stepcache |
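As an alternative to the bundled script, the repos listed above can be fetched one by one with `huggingface_hub.snapshot_download`. This is a minimal sketch and not the contents of scripts/download_offline_models.py. The `download_repos` helper and the one-folder-per-repo layout are assumptions made for illustration, so the directory structure ScanIndex actually expects may differ.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Repo IDs copied from the table in this README.
STANDALONE_REPOS = [
    "welcomyou/layoutlmv3-vn-admin-kie",
    "welcomyou/e5-small-vn-archive-mix50",
    "welcomyou/distilled-protonx-vn-correction-ct2",
    "welcomyou/lightgbm-vn-page-splitter",
    "welcomyou/doclayout-yolo-onnx-dynamic",
    "welcomyou/gmft-tatr-onnx",
    "welcomyou/docling-tableformer-v1-onnx-stepcache",
]


def _hf_download(repo_id: str, local_dir: str) -> str:
    """Download a full repo snapshot from the Hugging Face Hub."""
    # Imported lazily so the module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def download_repos(
    repo_ids: Iterable[str],
    target_root: str,
    fetch: Callable[[str, str], str] = _hf_download,
) -> Dict[str, str]:
    """Fetch each repo into <target_root>/<repo name> (hypothetical layout).

    `fetch` is injectable so the function can be exercised offline.
    """
    root = Path(target_root)
    results: Dict[str, str] = {}
    for repo_id in repo_ids:
        local_dir = root / repo_id.split("/")[-1]
        local_dir.mkdir(parents=True, exist_ok=True)
        results[repo_id] = fetch(repo_id, str(local_dir))
    return results
```

Calling `download_repos(STANDALONE_REPOS, "models")` would then populate a local `models/` folder. The folder name is arbitrary here.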
Not included (fetched at runtime from upstream)
- Chrome ScreenAI OCR: scanindex.core.ocr.screen_ai_downloader pulls directly from Google CDN to honor the Chrome license.
- BAAI/bge-reranker-v2-m3: sentence_transformers pulls upstream on first use of the Accurate search mode.
See also
The welcomyou/scanindex collection groups these models with their upstream lineage.