ScanIndex: runtime model bundle

Small companion repo for ScanIndex. Contains:

  • orientation/PP-LCNet_x1_0_doc_ori.onnx: PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
  • manifest.json: list of standalone model repos that complete the runtime
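As a rough illustration of what "4-way" means for the orientation classifier, the sketch below maps the four class scores to a rotation angle. The class order (0, 90, 180, 270 degrees), the single-output assumption, and the helper names are assumptions made for this example, not something this repo documents; image preprocessing is left to the caller because the expected input format is not stated here.

```python
# Assumed class order for the 4-way classifier; verify against the model's
# own label list before relying on it.
ANGLES = (0, 90, 180, 270)


def predicted_rotation(scores):
    """Return the angle (in degrees) whose class score is highest."""
    if len(scores) != len(ANGLES):
        raise ValueError(f"expected {len(ANGLES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return ANGLES[best]


def classify_orientation(model_path, batch):
    """Run the ONNX model on an already preprocessed batch (hypothetical helper).

    `batch` must already match the model's expected input shape and dtype.
    """
    # Imported lazily so predicted_rotation works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    # Assumes the first output holds one row of four class scores per image.
    scores = session.run(None, {input_name: batch})[0]
    return [predicted_rotation(list(row)) for row in scores]
```

A page whose highest score falls in the third class would, under the assumed order, be reported as rotated by 180 degrees.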

The actual model weights live in the standalone repos below. Download all of them at once with scripts/download_offline_models.py in the GitHub repo.
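For readers who want to see the general idea without opening the script, here is a minimal sketch of a manifest-driven download. It is not the contents of scripts/download_offline_models.py. The manifest schema used here (a "models" list whose entries carry a "repo_id" key) and both function names are hypothetical; only `huggingface_hub.snapshot_download` is a real library call.

```python
import json
from pathlib import Path


def repos_from_manifest(manifest_path):
    """Return the repo ids listed in a manifest file.

    Hypothetical schema: {"models": [{"repo_id": "owner/name"}, ...]}.
    The real manifest.json may be laid out differently.
    """
    data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    return [entry["repo_id"] for entry in data.get("models", [])]


def download_all(manifest_path, target_dir):
    """Snapshot every listed repo into target_dir/<repo name>."""
    # Imported lazily so the manifest can be parsed without the dependency.
    from huggingface_hub import snapshot_download

    target = Path(target_dir)
    for repo_id in repos_from_manifest(manifest_path):
        local_dir = target / repo_id.split("/")[-1]
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_all("manifest.json", "models")` would then place each repo in its own subdirectory of `models/`, assuming the manifest follows the schema above.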

Standalone model repos

HF repo                                           Link
welcomyou/layoutlmv3-vn-admin-kie                 layoutlmv3-vn-admin-kie
welcomyou/e5-small-vn-archive-mix50               e5-small-vn-archive-mix50
welcomyou/distilled-protonx-vn-correction-ct2     distilled-protonx-vn-correction-ct2
welcomyou/lightgbm-vn-page-splitter               lightgbm-vn-page-splitter
welcomyou/doclayout-yolo-onnx-dynamic             doclayout-yolo-onnx-dynamic
welcomyou/gmft-tatr-onnx                          gmft-tatr-onnx
welcomyou/docling-tableformer-v1-onnx-stepcache   docling-tableformer-v1-onnx-stepcache

Not included (fetched at runtime from upstream)

  • Chrome ScreenAI OCR: scanindex.core.ocr.screen_ai_downloader pulls directly from Google CDN to honor the Chrome license.
  • BAAI/bge-reranker-v2-m3: sentence_transformers pulls upstream on first use of the Accurate search mode.

See also

The welcomyou/scanindex collection groups these models with their upstream lineage.
