welcomyou commited on
Commit
68e66a8
·
verified ·
1 Parent(s): 6220c3a

docs: bundle README

Browse files
Files changed (1) hide show
  1. README.md +42 -0
README.md ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: onnxruntime
3
+ license: apache-2.0
4
+ language:
5
+ - vi
6
+ tags:
7
+ - scanindex
8
+ - ocr
9
+ - kie
10
+ - vietnamese
11
+ - model-bundle
12
+ ---
13
+
14
+ # ScanIndex — runtime model bundle
15
+
16
+ Small companion repo for [ScanIndex](https://github.com/welcomyou/scanindex). Contains:
17
+
18
+ - `orientation/PP-LCNet_x1_0_doc_ori.onnx` — PaddleOCR's 4-way page-orientation classifier (Apache-2.0; tiny, redistributed for offline-install convenience)
19
+ - `manifest.json` — list of standalone model repos that complete the runtime
20
+
21
+ The actual model weights live in the standalone repos below. Download all of them at once with `scripts/download_offline_models.py` in the GitHub repo.
22
+
23
+ ## Standalone model repos
24
+
25
+ | HF repo | Link |
26
+ |---|---|
27
+ | `welcomyou/layoutlmv3-vn-admin-kie` | [layoutlmv3-vn-admin-kie](https://huggingface.co/welcomyou/layoutlmv3-vn-admin-kie) |
28
+ | `welcomyou/e5-small-vn-archive-mix50` | [e5-small-vn-archive-mix50](https://huggingface.co/welcomyou/e5-small-vn-archive-mix50) |
29
+ | `welcomyou/distilled-protonx-vn-correction-ct2` | [distilled-protonx-vn-correction-ct2](https://huggingface.co/welcomyou/distilled-protonx-vn-correction-ct2) |
30
+ | `welcomyou/lightgbm-vn-page-splitter` | [lightgbm-vn-page-splitter](https://huggingface.co/welcomyou/lightgbm-vn-page-splitter) |
31
+ | `welcomyou/doclayout-yolo-onnx-dynamic` | [doclayout-yolo-onnx-dynamic](https://huggingface.co/welcomyou/doclayout-yolo-onnx-dynamic) |
32
+ | `welcomyou/gmft-tatr-onnx` | [gmft-tatr-onnx](https://huggingface.co/welcomyou/gmft-tatr-onnx) |
33
+ | `welcomyou/docling-tableformer-v1-onnx-stepcache` | [docling-tableformer-v1-onnx-stepcache](https://huggingface.co/welcomyou/docling-tableformer-v1-onnx-stepcache) |
34
+
35
+ ## Not included (fetched at runtime from upstream)
36
+
37
+ - **Chrome ScreenAI OCR** — `scanindex.core.ocr.screen_ai_downloader` pulls directly from Google CDN to honor the Chrome license.
38
+ - **`BAAI/bge-reranker-v2-m3`** — `sentence_transformers` pulls upstream on first use of the Accurate search mode.
39
+
40
+ ## See also
41
+
42
+ [`welcomyou/scanindex` collection](https://huggingface.co/collections/welcomyou/scanindex) groups these models with their upstream lineage.