---

library_name: onnxruntime
license: apache-2.0
language:
  - vi
tags:
  - scanindex
  - ocr
  - kie
  - vietnamese
  - model-bundle
---


# ScanIndex — runtime model bundle

Small companion repo for [ScanIndex](https://github.com/welcomyou/scanindex). Contains:

- `orientation/PP-LCNet_x1_0_doc_ori.onnx` — PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
- `manifest.json` — list of standalone model repos that complete the runtime
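
As a rough illustration of what a 4-way orientation classifier's output means, the sketch below maps four class scores to a rotation in degrees. The label order (0, 90, 180, 270) is an assumption based on how PaddleOCR documents this model family, and `predicted_rotation` is a hypothetical helper, not part of ScanIndex; check the label order against the model's own config before relying on it.

```python
from typing import Sequence

# Assumption: class indices 0..3 mean the page is rotated by
# 0, 90, 180, and 270 degrees respectively.
ORIENTATION_DEGREES = (0, 90, 180, 270)


def predicted_rotation(scores: Sequence[float]) -> int:
    """Return the rotation (in degrees) of the highest-scoring class."""
    if len(scores) != len(ORIENTATION_DEGREES):
        raise ValueError(
            f"expected {len(ORIENTATION_DEGREES)} scores, got {len(scores)}"
        )
    # Plain argmax over the four class scores.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ORIENTATION_DEGREES[best]
```

The scores would come from running the ONNX file through `onnxruntime`; the preprocessing the model expects is not described here, so it is left out of the sketch.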

The remaining model weights live in the standalone repos below. Download all of them at once with `scripts/download_offline_models.py` in the GitHub repo.

## Standalone model repos

| HF repo | Link |
|---|---|
| `welcomyou/layoutlmv3-vn-admin-kie` | [layoutlmv3-vn-admin-kie](https://huggingface.co/welcomyou/layoutlmv3-vn-admin-kie) |
| `welcomyou/e5-small-vn-archive-mix50` | [e5-small-vn-archive-mix50](https://huggingface.co/welcomyou/e5-small-vn-archive-mix50) |
| `welcomyou/distilled-protonx-vn-correction-ct2` | [distilled-protonx-vn-correction-ct2](https://huggingface.co/welcomyou/distilled-protonx-vn-correction-ct2) |
| `welcomyou/lightgbm-vn-page-splitter` | [lightgbm-vn-page-splitter](https://huggingface.co/welcomyou/lightgbm-vn-page-splitter) |
| `welcomyou/doclayout-yolo-onnx-dynamic` | [doclayout-yolo-onnx-dynamic](https://huggingface.co/welcomyou/doclayout-yolo-onnx-dynamic) |
| `welcomyou/gmft-tatr-onnx` | [gmft-tatr-onnx](https://huggingface.co/welcomyou/gmft-tatr-onnx) |
| `welcomyou/docling-tableformer-v1-onnx-stepcache` | [docling-tableformer-v1-onnx-stepcache](https://huggingface.co/welcomyou/docling-tableformer-v1-onnx-stepcache) |
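
If the download script is not at hand, the repos in the table can also be fetched one by one. The following is a minimal sketch that assumes the `huggingface_hub` package and its `snapshot_download` function; `download_all` and the one-folder-per-repo layout are hypothetical and may differ from what `scripts/download_offline_models.py` actually does.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Repo IDs copied from the table above.
STANDALONE_REPOS = (
    "welcomyou/layoutlmv3-vn-admin-kie",
    "welcomyou/e5-small-vn-archive-mix50",
    "welcomyou/distilled-protonx-vn-correction-ct2",
    "welcomyou/lightgbm-vn-page-splitter",
    "welcomyou/doclayout-yolo-onnx-dynamic",
    "welcomyou/gmft-tatr-onnx",
    "welcomyou/docling-tableformer-v1-onnx-stepcache",
)


def download_all(
    target_dir: str,
    repos: Iterable[str] = STANDALONE_REPOS,
    fetch: Optional[Callable[..., str]] = None,
) -> Dict[str, Path]:
    """Download each repo into its own folder under target_dir.

    `fetch` can be swapped out (for example in tests); by default the
    function uses huggingface_hub.snapshot_download.
    """
    if fetch is None:
        # Imported lazily so the module loads without huggingface_hub.
        from huggingface_hub import snapshot_download

        fetch = snapshot_download

    root = Path(target_dir)
    paths: Dict[str, Path] = {}
    for repo_id in repos:
        # Hypothetical layout: <target_dir>/<repo name without the owner>.
        local_dir = root / repo_id.split("/", 1)[1]
        paths[repo_id] = Path(fetch(repo_id=repo_id, local_dir=str(local_dir)))
    return paths
```

Calling `download_all("models")` would then place each repo under `models/<repo-name>`; where ScanIndex expects to find the files is not stated here, so treat the target layout as a placeholder.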

## Not included (fetched at runtime from upstream)

- **Chrome ScreenAI OCR**: `scanindex.core.ocr.screen_ai_downloader` pulls directly from Google's CDN to honor the Chrome license.
- **`BAAI/bge-reranker-v2-m3`**: `sentence_transformers` pulls it from upstream on first use of the Accurate search mode.
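
The "fetched on first use" behavior can be pictured as a lazy loader. This is only a sketch of the pattern: `LazyModel` is a hypothetical class, and loading the reranker through `sentence_transformers.CrossEncoder` is an assumption about how ScanIndex wires it up.

```python
from typing import Any, Callable

RERANKER_ID = "BAAI/bge-reranker-v2-m3"


def _load_cross_encoder(model_id: str) -> Any:
    # Assumption: the reranker is loaded as a sentence_transformers
    # CrossEncoder, which downloads the weights if they are not cached.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id)


class LazyModel:
    """Defer loading (and therefore downloading) until first access."""

    def __init__(
        self,
        model_id: str,
        factory: Callable[[str], Any] = _load_cross_encoder,
    ) -> None:
        self._model_id = model_id
        self._factory = factory
        self._model: Any = None

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        # The factory runs only once; later calls reuse the instance.
        if self._model is None:
            self._model = self._factory(self._model_id)
        return self._model
```

With this shape, constructing `LazyModel(RERANKER_ID)` costs nothing, and the download happens only when a caller first invokes `get()`.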

## See also

The [`welcomyou/scanindex` collection](https://huggingface.co/collections/welcomyou/scanindex) groups these models with their upstream lineage.