ScanIndex
Models loaded by https://github.com/welcomyou/scanindex: OCR, KIE, layout, tables, and an embedder for Vietnamese administrative documents.
Small companion repo for ScanIndex. Contains:
- `orientation/PP-LCNet_x1_0_doc_ori.onnx`: PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
- `manifest.json`: list of standalone model repos that complete the runtime

The actual model weights live in the standalone repos below. Download all of them at once with `scripts/download_offline_models.py` in the GitHub repo.
| HF repo | Link |
|---|---|
| welcomyou/layoutlmv3-vn-admin-kie | layoutlmv3-vn-admin-kie |
| welcomyou/e5-small-vn-archive-mix50 | e5-small-vn-archive-mix50 |
| welcomyou/distilled-protonx-vn-correction-ct2 | distilled-protonx-vn-correction-ct2 |
| welcomyou/lightgbm-vn-page-splitter | lightgbm-vn-page-splitter |
| welcomyou/doclayout-yolo-onnx-dynamic | doclayout-yolo-onnx-dynamic |
| welcomyou/gmft-tatr-onnx | gmft-tatr-onnx |
| welcomyou/docling-tableformer-v1-onnx-stepcache | docling-tableformer-v1-onnx-stepcache |
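For readers who want to see what fetching the repos in the table above involves, here is a minimal sketch using `huggingface_hub.snapshot_download`. It is not the contents of `scripts/download_offline_models.py`, which is not shown here; the `models/` target directory, the `local_dir_for` helper, and the `download_all` function are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Repo ids copied from the table above.
MODEL_REPOS = [
    "welcomyou/layoutlmv3-vn-admin-kie",
    "welcomyou/e5-small-vn-archive-mix50",
    "welcomyou/distilled-protonx-vn-correction-ct2",
    "welcomyou/lightgbm-vn-page-splitter",
    "welcomyou/doclayout-yolo-onnx-dynamic",
    "welcomyou/gmft-tatr-onnx",
    "welcomyou/docling-tableformer-v1-onnx-stepcache",
]


def local_dir_for(repo_id: str, root: Path) -> Path:
    """Map 'owner/name' to root/name (hypothetical layout, not ScanIndex's)."""
    return root / repo_id.split("/", 1)[1]


def download_all(root: Path = Path("models")) -> list[Path]:
    """Download a snapshot of every repo in MODEL_REPOS under `root`."""
    # Imported lazily so the helpers above work without the dependency installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=target)
        paths.append(target)
    return paths
```

Calling `download_all()` needs network access and the `huggingface_hub` package; for a real offline install, prefer the script shipped in the GitHub repo.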
- `scanindex.core.ocr.screen_ai_downloader` pulls directly from the Google CDN to honor the Chrome license.
- `BAAI/bge-reranker-v2-m3`: `sentence_transformers` pulls it from upstream on first use of the Accurate search mode.

The welcomyou/scanindex collection groups these models with their upstream lineage.
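To illustrate what pulling the reranker on first use looks like with `sentence_transformers`, the sketch below loads `BAAI/bge-reranker-v2-m3` as a `CrossEncoder` and sorts passages by score. How ScanIndex wires this into its Accurate search mode is not documented here, so the `rerank` function and its `scorer` parameter are assumptions made for the example.

```python
def rerank(query, passages, model_name="BAAI/bge-reranker-v2-m3", scorer=None):
    """Return (passage, score) pairs sorted from most to least relevant.

    `scorer` is a hypothetical hook: any callable that takes a list of
    (query, passage) pairs and returns one score per pair. When omitted,
    the cross-encoder is loaded, which downloads the weights on first use.
    """
    if scorer is None:
        # Lazy import: the model is only fetched when reranking is requested.
        from sentence_transformers import CrossEncoder

        scorer = CrossEncoder(model_name).predict

    scores = scorer([(query, p) for p in passages])
    order = sorted(range(len(passages)), key=lambda i: float(scores[i]), reverse=True)
    return [(passages[i], float(scores[i])) for i in order]
```

Passing a custom `scorer` makes it possible to try the ordering logic without downloading the model.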