docs: bundle README
Browse files
README.md
ADDED
|
@@ -0,0 +1,42 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
library_name: onnxruntime
|
| 3 |
+
license: apache-2.0
|
| 4 |
+
language:
|
| 5 |
+
- vi
|
| 6 |
+
tags:
|
| 7 |
+
- scanindex
|
| 8 |
+
- ocr
|
| 9 |
+
- kie
|
| 10 |
+
- vietnamese
|
| 11 |
+
- model-bundle
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
# ScanIndex — runtime model bundle
|
| 15 |
+
|
| 16 |
+
Small companion repo for [ScanIndex](https://github.com/welcomyou/scanindex). Contains:
|
| 17 |
+
|
| 18 |
+
- `orientation/PP-LCNet_x1_0_doc_ori.onnx` — PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
|
| 19 |
+
- `manifest.json` — list of standalone model repos that complete the runtime
|
| 20 |
+
|
| 21 |
+
The actual model weights live in the standalone repos below. Download all of them at once with `scripts/download_offline_models.py` in the GitHub repo.
|
| 22 |
+
|
| 23 |
+
## Standalone model repos
|
| 24 |
+
|
| 25 |
+
| HF repo | Link |
|
| 26 |
+
|---|---|
|
| 27 |
+
| `welcomyou/layoutlmv3-vn-admin-kie` | [layoutlmv3-vn-admin-kie](https://huggingface.co/welcomyou/layoutlmv3-vn-admin-kie) |
|
| 28 |
+
| `welcomyou/e5-small-vn-archive-mix50` | [e5-small-vn-archive-mix50](https://huggingface.co/welcomyou/e5-small-vn-archive-mix50) |
|
| 29 |
+
| `welcomyou/distilled-protonx-vn-correction-ct2` | [distilled-protonx-vn-correction-ct2](https://huggingface.co/welcomyou/distilled-protonx-vn-correction-ct2) |
|
| 30 |
+
| `welcomyou/lightgbm-vn-page-splitter` | [lightgbm-vn-page-splitter](https://huggingface.co/welcomyou/lightgbm-vn-page-splitter) |
|
| 31 |
+
| `welcomyou/doclayout-yolo-onnx-dynamic` | [doclayout-yolo-onnx-dynamic](https://huggingface.co/welcomyou/doclayout-yolo-onnx-dynamic) |
|
| 32 |
+
| `welcomyou/gmft-tatr-onnx` | [gmft-tatr-onnx](https://huggingface.co/welcomyou/gmft-tatr-onnx) |
|
| 33 |
+
| `welcomyou/docling-tableformer-v1-onnx-stepcache` | [docling-tableformer-v1-onnx-stepcache](https://huggingface.co/welcomyou/docling-tableformer-v1-onnx-stepcache) |
|
| 34 |
+
|
| 35 |
+
## Not included (fetched at runtime from upstream)
|
| 36 |
+
|
| 37 |
+
- **Chrome ScreenAI OCR** — `scanindex.core.ocr.screen_ai_downloader` pulls directly from Google CDN to honor the Chrome license.
|
| 38 |
+
- **`BAAI/bge-reranker-v2-m3`** — `sentence_transformers` pulls upstream on first use of the Accurate search mode.
|
| 39 |
+
|
| 40 |
+
## See also
|
| 41 |
+
|
| 42 |
+
[`welcomyou/scanindex` collection](https://huggingface.co/collections/welcomyou/scanindex) groups these models with their upstream lineage.
|