metadata
library_name: onnxruntime
license: apache-2.0
language:
  - vi
tags:
  - scanindex
  - ocr
  - kie
  - vietnamese
  - model-bundle

ScanIndex — runtime model bundle

Small companion repo for ScanIndex. Contains:

  • orientation/PP-LCNet_x1_0_doc_ori.onnx — PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
  • manifest.json — list of standalone model repos that complete the runtime
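As a rough illustration of how a 4-way orientation classifier is usually consumed, the sketch below maps four class scores to a page angle and shows one way to run the ONNX file with onnxruntime. The label order (0, 90, 180, 270 degrees), the (batch, 4) output shape, and the helper names `predicted_angle` and `run_classifier` are assumptions rather than anything documented in this repo, and image preprocessing is left out entirely.

```python
from typing import Sequence

# Assumed label order for the four classes; verify against the model's own config.
ANGLES = (0, 90, 180, 270)


def predicted_angle(scores: Sequence[float]) -> int:
    """Return the angle whose class score is highest."""
    if len(scores) != len(ANGLES):
        raise ValueError(f"expected {len(ANGLES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ANGLES[best]


def run_classifier(model_path: str, batch) -> int:
    """Run the classifier on an already-preprocessed batch (hypothetical helper).

    `batch` is assumed to be a float32 array in whatever layout the model expects.
    """
    # Imported lazily so the pure-Python helper above works without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    scores = session.run(None, {input_name: batch})[0]  # assumed shape: (batch, 4)
    return predicted_angle(list(scores[0]))
```

Calling `predicted_angle([0.1, 0.2, 0.6, 0.1])` picks the third class, which under the assumed label order corresponds to 180 degrees.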

The actual model weights live in the standalone repos below. Download all of them at once with scripts/download_offline_models.py in the GitHub repo.

Standalone model repos

| HF repo | Link |
|---|---|
| welcomyou/layoutlmv3-vn-admin-kie | layoutlmv3-vn-admin-kie |
| welcomyou/e5-small-vn-archive-mix50 | e5-small-vn-archive-mix50 |
| welcomyou/distilled-protonx-vn-correction-ct2 | distilled-protonx-vn-correction-ct2 |
| welcomyou/lightgbm-vn-page-splitter | lightgbm-vn-page-splitter |
| welcomyou/doclayout-yolo-onnx-dynamic | doclayout-yolo-onnx-dynamic |
| welcomyou/gmft-tatr-onnx | gmft-tatr-onnx |
| welcomyou/docling-tableformer-v1-onnx-stepcache | docling-tableformer-v1-onnx-stepcache |
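The supported route is scripts/download_offline_models.py, whose internals are not shown here. For orientation only, a minimal sketch of fetching the repos listed above with `huggingface_hub.snapshot_download` might look like the following. The one-folder-per-repo layout and the helper names `local_dir_for` and `download_all` are assumptions, not the script's actual behaviour.

```python
from pathlib import Path
from typing import List

# Repo ids copied from the list of standalone model repos.
STANDALONE_REPOS = [
    "welcomyou/layoutlmv3-vn-admin-kie",
    "welcomyou/e5-small-vn-archive-mix50",
    "welcomyou/distilled-protonx-vn-correction-ct2",
    "welcomyou/lightgbm-vn-page-splitter",
    "welcomyou/doclayout-yolo-onnx-dynamic",
    "welcomyou/gmft-tatr-onnx",
    "welcomyou/docling-tableformer-v1-onnx-stepcache",
]


def local_dir_for(repo_id: str, root: Path) -> Path:
    """Hypothetical layout: one folder per repo, named after the repo without its owner."""
    return root / repo_id.split("/", 1)[1]


def download_all(root: Path) -> List[Path]:
    """Download every standalone repo under `root` and return the target folders."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    targets = []
    for repo_id in STANDALONE_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        targets.append(target)
    return targets
```

Calling `download_all(Path("models"))` needs network access and the huggingface_hub package; the real script may use a different directory layout, so treat the paths here as placeholders.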

Not included (fetched at runtime from upstream)

  • Chrome ScreenAI OCR: scanindex.core.ocr.screen_ai_downloader pulls it directly from the Google CDN to honor the Chrome license.
  • BAAI/bge-reranker-v2-m3: sentence_transformers pulls it from upstream on first use of the Accurate search mode.
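The "on first use" behaviour described for the reranker is a lazy-loading pattern. The sketch below is an illustration of that pattern, not ScanIndex's actual code: the loader is wrapped so that the model, and therefore any upstream download, is only triggered the first time it is requested. `LazyModel` and `load_reranker` are hypothetical names; `CrossEncoder` is the sentence_transformers class commonly used to load rerankers, and it fetches weights from the Hub unless they are already cached.

```python
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load until the first call to get()."""

    def __init__(self, load_fn: Callable[[], Any]) -> None:
        self._load_fn = load_fn
        self._model: Optional[Any] = None
        self.loaded = False

    def get(self) -> Any:
        # Load exactly once; later calls reuse the cached object.
        if not self.loaded:
            self._model = self._load_fn()
            self.loaded = True
        return self._model


def load_reranker() -> Any:
    # Imported lazily so nothing heavy is pulled in until the model is needed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder("BAAI/bge-reranker-v2-m3")


# Constructing the wrapper downloads nothing; the first get() call does the work.
reranker = LazyModel(load_reranker)
```

A search mode that needs the reranker would call `reranker.get()` at the point of use, so users who never enable that mode never pay for the download.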

See also

The welcomyou/scanindex collection groups these models with their upstream lineage.