emelryan committed on
Commit a2c14e2 · 1 Parent(s): b28505d

merged english and multilingual and updated model card

README.md CHANGED
@@ -40,8 +40,7 @@ This model is ready for commercial use.
40
  The use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), and the post-processing scripts are licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
41
 
42
  ### Release Date: <br>
43
- Hugging Face (this repo) [nvidia/nemotron-ocr-v2-multilingual](https://huggingface.co/nvidia/nemotron-ocr-v2-multilingual) <br>
44
- Collection / variant hub: [nvidia/nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) <br>
45
  Build.Nvidia.com 04/15/2026 via [https://build.nvidia.com/nvidia/nemotron-ocr-v2](https://build.nvidia.com/nvidia/nemotron-ocr-v2) <br>
46
  NGC 04/15/2026 via [https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/nemoretriever-ocr-v2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/nemoretriever-ocr-v2) <br>
47
 
@@ -59,8 +58,8 @@ Global
59
 
60
  Nemotron OCR v2 is available in two variants:
61
 
62
- - **v2_english** — Optimized for English-language OCR with a compact recognizer for lower latency.
63
- - **v2_multilingual** — Supports English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian with a larger recognizer to accommodate the expanded character set.
64
 
65
  Both variants share the same three-component architecture:
66
 
@@ -96,7 +95,7 @@ The two variants share an identical detector and relational architecture but dif
96
  | Relational model | 2,255,419 |
97
  | **Total** | **53,831,335** |
98
 
99
- **v2_multilingual** (this repository: `checkpoints/`):
100
 
101
  | Component | Parameters |
102
  |-------------------|-------------|
@@ -218,25 +217,26 @@ Output is saved next to your input image as `<name>-annotated.<ext>` on the host
218
 
219
  3. Run the model using the following code.
220
 
221
- Use `nemotron_ocr.inference.pipeline.NemotronOCR`. With no arguments, checkpoints are downloaded from Hugging Face: **by default** the **v2 multilingual** bundle ([`nvidia/nemotron-ocr-v2-multilingual`](https://huggingface.co/nvidia/nemotron-ocr-v2-multilingual), `checkpoints/`). Use `lang="en"` for the English-optimized v2 build (`nvidia/nemotron-ocr-v2` / `v2_english/`), or pass `model_dir` to load from disk (any complete checkpoint folder; `lang` is then ignored).
222
 
223
  ```python
224
- from nemotron_ocr.inference.pipeline import NemotronOCR
225
 
226
  # Default: Hugging Face v2 multilingual
227
- ocr = NemotronOCR()
228
 
229
- # English-optimized v2 (Hub)
230
- ocr_en = NemotronOCR(lang="en")
231
 
232
- # Multilingual v2 explicitly (same default as NemotronOCR())
233
- ocr_multi = NemotronOCR(lang="multi")
 
234
 
235
- # Local directory with detector.pth, recognizer.pth, relational.pth, charset.txt (this repo: ./checkpoints)
236
- ocr_local = NemotronOCR(model_dir="./checkpoints")
237
 
238
  # Legacy v1 weights from Hub (optional)
239
- ocr_v1 = NemotronOCR(lang="v1")
240
 
241
  predictions = ocr("ocr-example-input-1.png")
242
 
@@ -251,7 +251,7 @@ for pred in predictions:
251
  **Constructor rules**
252
 
253
  - **`model_dir`**: If it contains all four checkpoint files, that directory is used and **`lang` is ignored**.
254
- - **`lang`** (keyword only): When weights are fetched from the Hub — `None` or `"multi"` / `"multilingual"` → [nvidia/nemotron-ocr-v2-multilingual](https://huggingface.co/nvidia/nemotron-ocr-v2-multilingual) `checkpoints/` (default); `"en"` / `"english"` → `nvidia/nemotron-ocr-v2` / `v2_english/`; `"v1"` / `"legacy"` → original v1 layout on `nvidia/nemotron-ocr-v1`.
255
  - If `model_dir` is set but incomplete, the client falls back to a Hub download using **`lang`** (defaulting to v2 multilingual when `lang` is `None`).
256
 
257
  ### Software Integration
@@ -270,8 +270,8 @@ for pred in predictions:
270
 
271
  ## Model Version(s)
272
 
273
- * **This repository:** Nemotron OCR **v2 multilingual** (`checkpoints/`).
274
- * **Related:** [nvidia/nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) hosts the **v2 English** variant (`v2_english/`) and collection metadata.
275
 
276
  ## **Training and Evaluation Datasets:**
277
 
@@ -309,23 +309,23 @@ Tables below are **reference metrics** from NVIDIA’s benchmark runs (OmniDocBe
309
 
310
  Normalized Edit Distance (NED) sample_avg on OmniDocBench (lower = better). Results follow OmniDocBench methodology (empty predictions skipped). All models evaluated in crop mode. Speed measured on a single A100 GPU.
311
 
312
- | Model | crops/s | pages/s | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
313
- | :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
314
- | PaddleOCR v5 (server) | 20.6 | 1.2 | 0.027 | 0.037 | 0.041 | 0.031 | 0.035 | 0.064 | 0.031 | 0.116 | 0.897 | 0.027 |
315
- | OpenOCR (server) | 17.4 | 1.5 | 0.024 | 0.033 | 0.049 | 0.027 | 0.034 | 0.061 | 0.028 | 0.042 | 0.761 | 0.034 |
316
- | **Nemotron OCR v2(Multilingual)** | **68.1** | **21.8** | **0.048** | **0.072** | **0.142** | **0.061** | **0.049** | **0.117** | **0.062** | **0.109** | **0.332** | **0.372** |
317
- | *Nemotron OCR v2 (EN)* | *74.6* | *19.9* | *0.038* | *0.830* | *0.437* | *0.348* | *0.282* | *0.572* | *0.353* | *0.232* | *0.827* | *0.893* |
318
- | EasyOCR | 10.3 | 0.4 | 0.095 | 0.117 | 0.326 | 0.095 | 0.179 | 0.322 | 0.110 | 0.987 | 0.979 | 0.809 |
319
- | Tesseract-OCR | | | 0.096 | 0.551 | 0.250 | 0.439 | 0.328 | 0.331 | 0.426 | 0.117 | 0.969 | 0.984 |
320
- | *Nemotron OCR v1* | *61.1* | *21.4* | *0.038* | *0.876* | *0.436* | *0.472* | *0.434* | *0.715* | *0.482* | *0.358* | *0.871* | *0.979* |
321
 
322
- Column key: **crops/s** and **pages/s** are throughput using the v2 batched pipeline where measured; **EN** = English, **ZH** = Simplified Chinese, **Mixed** = English/Chinese mixed, **White/Single/Multi** = background type, **Normal/Rotate90/Rotate270/Horizontal** = text orientation.
323
 
324
  #### [SynthDoG](https://github.com/clovaai/donut/tree/master/synthdog) Generated Benchmark Data
325
 
326
  Normalized Edit Distance (NED) page_avg on [SynthDoG](https://github.com/clovaai/donut/tree/master/synthdog) generated benchmark data (lower = better):
327
 
328
- | Language | PaddleOCR (base) | PaddleOCR (specialized) | OpenOCR (server) | Nemotron OCR v1 | *Nemotron OCR v2 (EN)* | **Nemotron OCR v2** |
329
  | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
330
  | English | 0.117 | 0.096 | 0.105 | 0.078 | *0.079* | **0.069** |
331
  | Japanese | 0.201 | 0.201 | 0.586 | 0.723 | *0.765* | **0.046** |
 
40
  The use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), and the post-processing scripts are licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
41
 
42
  ### Release Date: <br>
43
+ Hugging Face (this repo): [nvidia/nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2) <br>
 
44
  Build.Nvidia.com 04/15/2026 via [https://build.nvidia.com/nvidia/nemotron-ocr-v2](https://build.nvidia.com/nvidia/nemotron-ocr-v2) <br>
45
  NGC 04/15/2026 via [https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/nemoretriever-ocr-v2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo-microservices/containers/nemoretriever-ocr-v2) <br>
46
 
 
58
 
59
  Nemotron OCR v2 is available in two variants:
60
 
61
+ - **v2_english** — Optimized for English-language OCR with word-level region handling.
62
+ - **v2_multilingual** — Supports English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian with line-level region handling for multilingual documents.
63
 
64
  Both variants share the same three-component architecture:
65
 
 
95
  | Relational model | 2,255,419 |
96
  | **Total** | **53,831,335** |
97
 
98
+ **v2_multilingual** (from `v2_multilingual/`):
99
 
100
  | Component | Parameters |
101
  |-------------------|-------------|
 
217
 
218
  3. Run the model using the following code.
219
 
220
+ Use `nemotron_ocr.inference.pipeline_v2.NemotronOCRV2`. With no arguments, checkpoints are downloaded from Hugging Face: **by default** the **v2 multilingual** bundle (`nvidia/nemotron-ocr-v2` / `v2_multilingual/`). Use `lang="en"` for the English v2 build (`nvidia/nemotron-ocr-v2` / `v2_english/`), or pass `model_dir` to load from disk (any complete checkpoint folder; `lang` is then ignored).
221
 
222
  ```python
223
+ from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2
224
 
225
  # Default: Hugging Face v2 multilingual
226
+ ocr = NemotronOCRV2()
227
 
228
+ # English v2 (Hub, word-level)
229
+ ocr_en = NemotronOCRV2(lang="en")
230
 
231
+ # Multilingual v2 explicitly (same default as NemotronOCRV2())
232
+ # Uses the line-level variant.
233
+ ocr_multi = NemotronOCRV2(lang="multi")
234
 
235
+ # Local directory with detector.pth, recognizer.pth, relational.pth, charset.txt
236
+ ocr_local = NemotronOCRV2(model_dir="./v2_multilingual")
237
 
238
  # Legacy v1 weights from Hub (optional)
239
+ ocr_v1 = NemotronOCRV2(lang="v1")
240
 
241
  predictions = ocr("ocr-example-input-1.png")
242
 
 
251
  **Constructor rules**
252
 
253
  - **`model_dir`**: If it contains all four checkpoint files, that directory is used and **`lang` is ignored**.
254
+ - **`lang`** (keyword only): When weights are fetched from the Hub — `None` or `"multi"` / `"multilingual"` → `nvidia/nemotron-ocr-v2` / `v2_multilingual/` (default); `"en"` / `"english"` → `nvidia/nemotron-ocr-v2` / `v2_english/`; `"v1"` / `"legacy"` → original v1 layout on `nvidia/nemotron-ocr-v1`.
255
  - If `model_dir` is set but incomplete, the client falls back to a Hub download using **`lang`** (defaulting to v2 multilingual when `lang` is `None`).
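
The constructor rules above can be sketched as a small resolution function. This is a minimal illustration, not the package's implementation: `resolve_checkpoint_source` is a hypothetical helper, the constants are copied from the `pipeline.py` hunk in this commit, and raising `ValueError` for an unknown `lang` is an assumption.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Constants copied from the pipeline.py hunk in this commit.
HF_REPO_ID = "nvidia/nemotron-ocr-v1"
HF_REPO_ID_V2 = "nvidia/nemotron-ocr-v2"
CHECKPOINT_FILES = ["detector.pth", "recognizer.pth", "relational.pth", "charset.txt"]
LANG_HUB_PATH: Dict[str, Tuple[str, str]] = {
    "en": (HF_REPO_ID_V2, "v2_english"),
    "english": (HF_REPO_ID_V2, "v2_english"),
    "multi": (HF_REPO_ID_V2, "v2_multilingual"),
    "multilingual": (HF_REPO_ID_V2, "v2_multilingual"),
    "v1": (HF_REPO_ID, "checkpoints"),
    "legacy": (HF_REPO_ID, "checkpoints"),
}
DEFAULT_LANG = "multi"


def resolve_checkpoint_source(
    model_dir: Optional[str] = None, *, lang: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper: return ("local", path) or ("hub", "repo_id/prefix")."""
    if model_dir is not None:
        root = Path(model_dir)
        # A complete local directory wins and `lang` is ignored.
        if all((root / name).is_file() for name in CHECKPOINT_FILES):
            return ("local", str(root))
    # Missing or incomplete directory: fall back to the Hub using `lang`.
    key = lang if lang is not None else DEFAULT_LANG
    if key not in LANG_HUB_PATH:
        # Assumption: the real client may handle unknown values differently.
        raise ValueError(f"unknown lang: {key!r}")
    repo_id, prefix = LANG_HUB_PATH[key]
    return ("hub", f"{repo_id}/{prefix}")
```

Calling `resolve_checkpoint_source()` with no arguments resolves to the `v2_multilingual/` folder of `nvidia/nemotron-ocr-v2`, which matches the default described in the rules.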
256
 
257
  ### Software Integration
 
270
 
271
  ## Model Version(s)
272
 
273
+ * **This repository:** Nemotron OCR v2 with both variants: `v2_english/` and `v2_multilingual/`.
274
+ * **Hugging Face Hub:** [nvidia/nemotron-ocr-v2](https://huggingface.co/nvidia/nemotron-ocr-v2).
275
 
276
  ## **Training and Evaluation Datasets:**
277
 
 
309
 
310
  Normalized Edit Distance (NED) sample_avg on OmniDocBench (lower = better). Results follow OmniDocBench methodology (empty predictions skipped). All models evaluated in crop mode. Speed measured on a single A100 GPU.
311
 
312
+ | Model | pages/s | EN | ZH | Mixed | White | Single | Multi | Normal | Rotate90 | Rotate270 | Horizontal |
313
+ | :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
314
+ | PaddleOCR v5 (server) | 1.2 | 0.027 | 0.037 | 0.041 | 0.031 | 0.035 | 0.064 | 0.031 | 0.116 | 0.897 | 0.027 |
315
+ | OpenOCR (server) | 1.5 | 0.024 | 0.033 | 0.049 | 0.027 | 0.034 | 0.061 | 0.028 | 0.042 | 0.761 | 0.034 |
316
+ | **Nemotron OCR v2 (multilingual)** | **21.8** | **0.048** | **0.072** | **0.142** | **0.061** | **0.049** | **0.117** | **0.062** | **0.109** | **0.332** | **0.372** |
317
+ | *Nemotron OCR v2 (EN)* | *19.9* | *0.038* | *0.830* | *0.437* | *0.348* | *0.282* | *0.572* | *0.353* | *0.232* | *0.827* | *0.893* |
318
+ | EasyOCR | 0.4 | 0.095 | 0.117 | 0.326 | 0.095 | 0.179 | 0.322 | 0.110 | 0.987 | 0.979 | 0.809 |
319
+ | Tesseract-OCR | | 0.096 | 0.551 | 0.250 | 0.439 | 0.328 | 0.331 | 0.426 | 0.117 | 0.969 | 0.984 |
320
+ | *Nemotron OCR v1* | *21.4* | *0.038* | *0.876* | *0.436* | *0.472* | *0.434* | *0.715* | *0.482* | *0.358* | *0.871* | *0.979* |
321
 
322
+ Column key: **pages/s** is throughput using the v2 batched pipeline where measured; **EN** = English, **ZH** = Simplified Chinese, **Mixed** = English/Chinese mixed, **White/Single/Multi** = background type, **Normal/Rotate90/Rotate270/Horizontal** = text orientation.
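
For readers unfamiliar with the metric, the following is a minimal sketch of a Normalized Edit Distance with a per-sample average. It assumes normalization by the longer of the two strings; OmniDocBench's exact definition may differ, and the function names are illustrative only. Skipping empty predictions mirrors the methodology note above.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


def sample_average(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean NED over (prediction, reference) pairs, skipping empty predictions."""
    scores = [normalized_edit_distance(p, r) for p, r in pairs if p]
    return sum(scores) / len(scores) if scores else 0.0
```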
323
 
324
  #### [SynthDoG](https://github.com/clovaai/donut/tree/master/synthdog) Generated Benchmark Data
325
 
326
  Normalized Edit Distance (NED) page_avg on [SynthDoG](https://github.com/clovaai/donut/tree/master/synthdog) generated benchmark data (lower = better):
327
 
328
+ | Language | PaddleOCR (base) | PaddleOCR (specialized) | OpenOCR (server) | Nemotron OCR v1 | *Nemotron OCR v2 (EN)* | **Nemotron OCR v2 (multilingual)** |
329
  | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
330
  | English | 0.117 | 0.096 | 0.105 | 0.078 | *0.079* | **0.069** |
331
  | Japanese | 0.201 | 0.201 | 0.586 | 0.723 | *0.765* | **0.046** |
nemotron-ocr/src/nemotron_ocr/inference/pipeline.py CHANGED
@@ -41,22 +41,19 @@ DEFAULT_MERGE_LEVEL = "paragraph"
41
 
42
  # HuggingFace repositories for downloading model weights
43
  HF_REPO_ID = "nvidia/nemotron-ocr-v1"
44
- # Monorepo with per-variant folders (English under ``v2_english/``)
45
  HF_REPO_ID_V2 = "nvidia/nemotron-ocr-v2"
46
- # Multilingual weights live in this repo under ``checkpoints/`` (see Hugging Face layout)
47
- HF_REPO_ID_V2_MULTILINGUAL = "nvidia/nemotron-ocr-v2-multilingual"
48
  CHECKPOINT_FILES = ["detector.pth", "recognizer.pth", "relational.pth", "charset.txt"]
49
 
50
- # User-facing ``lang`` (see NemotronOCR ``lang``) → (repo_id, path prefix inside repo)
51
  LANG_HUB_PATH: Dict[str, Tuple[str, str]] = {
52
  "en": (HF_REPO_ID_V2, "v2_english"),
53
  "english": (HF_REPO_ID_V2, "v2_english"),
54
- "multi": (HF_REPO_ID_V2_MULTILINGUAL, "checkpoints"),
55
- "multilingual": (HF_REPO_ID_V2_MULTILINGUAL, "checkpoints"),
56
  "v1": (HF_REPO_ID, "checkpoints"),
57
  "legacy": (HF_REPO_ID, "checkpoints"),
58
  }
59
- DEFAULT_LANG = "multi" # v2 multilingual checkpoint from HF_REPO_ID_V2_MULTILINGUAL
60
 
61
 
62
  class NemotronOCR:
@@ -65,7 +62,7 @@ class NemotronOCR:
65
 
66
  Model weights are automatically downloaded from Hugging Face Hub when no
67
  complete local checkpoint directory is provided. The default is Nemotron OCR
68
- **v2 multilingual** (``nvidia/nemotron-ocr-v2-multilingual`` / ``checkpoints``).
69
 
70
  Automatically detects model parameters from model_config.json if available,
71
  otherwise falls back to defaults for backwards compatibility.
@@ -78,10 +75,11 @@ class NemotronOCR:
78
  being fed to the detector. When None the value is read from
79
  ``model_config.json`` (key ``infer_length``), falling back to 1024.
80
  lang: Which checkpoint to fetch from Hugging Face when ``model_dir`` is
81
- missing or incomplete: ``"en"`` / ``"english"`` (v2 English), ``"multi"`` /
82
- ``"multilingual"`` (v2 multilingual, same as the default), or ``"v1"`` /
83
- ``"legacy"`` (original v1 Hub layout). When ``None``, **v2 multilingual**
84
- is downloaded.
 
85
  """
86
 
87
  def __init__(
 
41
 
42
  # HuggingFace repositories for downloading model weights
43
  HF_REPO_ID = "nvidia/nemotron-ocr-v1"
 
44
  HF_REPO_ID_V2 = "nvidia/nemotron-ocr-v2"
 
 
45
  CHECKPOINT_FILES = ["detector.pth", "recognizer.pth", "relational.pth", "charset.txt"]
46
 
47
+ # User-facing ``lang`` → (repo_id, path prefix inside repo)
48
  LANG_HUB_PATH: Dict[str, Tuple[str, str]] = {
49
  "en": (HF_REPO_ID_V2, "v2_english"),
50
  "english": (HF_REPO_ID_V2, "v2_english"),
51
+ "multi": (HF_REPO_ID_V2, "v2_multilingual"),
52
+ "multilingual": (HF_REPO_ID_V2, "v2_multilingual"),
53
  "v1": (HF_REPO_ID, "checkpoints"),
54
  "legacy": (HF_REPO_ID, "checkpoints"),
55
  }
56
+ DEFAULT_LANG = "multi"
57
 
58
 
59
  class NemotronOCR:
 
62
 
63
  Model weights are automatically downloaded from Hugging Face Hub when no
64
  complete local checkpoint directory is provided. The default is Nemotron OCR
65
+ **v2 multilingual** (``nvidia/nemotron-ocr-v2`` / ``v2_multilingual``).
66
 
67
  Automatically detects model parameters from model_config.json if available,
68
  otherwise falls back to defaults for backwards compatibility.
 
75
  being fed to the detector. When None the value is read from
76
  ``model_config.json`` (key ``infer_length``), falling back to 1024.
77
  lang: Which checkpoint to fetch from Hugging Face when ``model_dir`` is
78
+ missing or incomplete: ``"en"`` / ``"english"`` (v2 English from
79
+ ``nvidia/nemotron-ocr-v2`` / ``v2_english``), ``"multi"`` / ``"multilingual"``
80
+ (v2 multilingual from ``nvidia/nemotron-ocr-v2`` / ``v2_multilingual``, the
81
+ default), or ``"v1"`` / ``"legacy"`` (original v1 Hub layout).
82
+ When ``None``, **v2 multilingual** is downloaded.
83
  """
84
 
85
  def __init__(
v2_english/charset.txt ADDED
@@ -0,0 +1,857 @@
1
+ [
2
+ " ",
3
+ "!",
4
+ "\"",
5
+ "#",
6
+ "$",
7
+ "%",
8
+ "&",
9
+ "'",
10
+ "(",
11
+ ")",
12
+ "*",
13
+ "+",
14
+ ",",
15
+ "-",
16
+ ".",
17
+ "/",
18
+ "0",
19
+ "1",
20
+ "2",
21
+ "3",
22
+ "4",
23
+ "5",
24
+ "6",
25
+ "7",
26
+ "8",
27
+ "9",
28
+ ":",
29
+ ";",
30
+ "<",
31
+ "=",
32
+ ">",
33
+ "?",
34
+ "@",
35
+ "A",
36
+ "B",
37
+ "C",
38
+ "D",
39
+ "E",
40
+ "F",
41
+ "FI",
42
+ "G",
43
+ "H",
44
+ "I",
45
+ "İ",
46
+ "J",
47
+ "K",
48
+ "L",
49
+ "M",
50
+ "N",
51
+ "O",
52
+ "P",
53
+ "Q",
54
+ "R",
55
+ "S",
56
+ "SS",
57
+ "T",
58
+ "U",
59
+ "V",
60
+ "W",
61
+ "X",
62
+ "Y",
63
+ "Z",
64
+ "[",
65
+ "\\",
66
+ "]",
67
+ "^",
68
+ "_",
69
+ "`",
70
+ "a",
71
+ "b",
72
+ "c",
73
+ "d",
74
+ "e",
75
+ "f",
76
+ "fi",
77
+ "g",
78
+ "h",
79
+ "i",
80
+ "i̇",
81
+ "j",
82
+ "k",
83
+ "l",
84
+ "m",
85
+ "n",
86
+ "o",
87
+ "p",
88
+ "q",
89
+ "r",
90
+ "s",
91
+ "ss",
92
+ "t",
93
+ "u",
94
+ "v",
95
+ "w",
96
+ "x",
97
+ "y",
98
+ "z",
99
+ "{",
100
+ "|",
101
+ "}",
102
+ "~",
103
+ "²",
104
+ "³",
105
+ "µ",
106
+ "¹",
107
+ "º",
108
+ "À",
109
+ "Á",
110
+ "Â",
111
+ "Ã",
112
+ "Ä",
113
+ "Å",
114
+ "Æ",
115
+ "Ç",
116
+ "È",
117
+ "É",
118
+ "Ê",
119
+ "Ë",
120
+ "Ì",
121
+ "Í",
122
+ "Î",
123
+ "Ï",
124
+ "Ð",
125
+ "Ñ",
126
+ "Ò",
127
+ "Ó",
128
+ "Ô",
129
+ "Õ",
130
+ "Ö",
131
+ "Ø",
132
+ "Ù",
133
+ "Ú",
134
+ "Û",
135
+ "Ü",
136
+ "Ý",
137
+ "Þ",
138
+ "ß",
139
+ "à",
140
+ "á",
141
+ "â",
142
+ "ã",
143
+ "ä",
144
+ "å",
145
+ "æ",
146
+ "ç",
147
+ "è",
148
+ "é",
149
+ "ê",
150
+ "ë",
151
+ "ì",
152
+ "í",
153
+ "î",
154
+ "ï",
155
+ "ð",
156
+ "ñ",
157
+ "ò",
158
+ "ó",
159
+ "ô",
160
+ "õ",
161
+ "ö",
162
+ "ø",
163
+ "ù",
164
+ "ú",
165
+ "û",
166
+ "ü",
167
+ "ý",
168
+ "þ",
169
+ "ÿ",
170
+ "Ā",
171
+ "ā",
172
+ "Ă",
173
+ "ă",
174
+ "Ą",
175
+ "ą",
176
+ "Ć",
177
+ "ć",
178
+ "Č",
179
+ "č",
180
+ "Ď",
181
+ "ď",
182
+ "Đ",
183
+ "đ",
184
+ "Ē",
185
+ "ē",
186
+ "Ė",
187
+ "ė",
188
+ "Ę",
189
+ "ę",
190
+ "Ě",
191
+ "ě",
192
+ "Ğ",
193
+ "ğ",
194
+ "Ġ",
195
+ "ġ",
196
+ "Ħ",
197
+ "ħ",
198
+ "Ĩ",
199
+ "ĩ",
200
+ "Ī",
201
+ "ī",
202
+ "İ",
203
+ "ı",
204
+ "Ķ",
205
+ "ķ",
206
+ "Ľ",
207
+ "ľ",
208
+ "Ł",
209
+ "ł",
210
+ "Ń",
211
+ "ń",
212
+ "Ņ",
213
+ "ņ",
214
+ "Ň",
215
+ "ň",
216
+ "Ŋ",
217
+ "ŋ",
218
+ "Ō",
219
+ "ō",
220
+ "Ŏ",
221
+ "ŏ",
222
+ "Ő",
223
+ "ő",
224
+ "Œ",
225
+ "œ",
226
+ "Ř",
227
+ "ř",
228
+ "Ś",
229
+ "ś",
230
+ "Ş",
231
+ "ş",
232
+ "Š",
233
+ "š",
234
+ "Ţ",
235
+ "ţ",
236
+ "Ť",
237
+ "ť",
238
+ "Ũ",
239
+ "ũ",
240
+ "Ū",
241
+ "ū",
242
+ "Ŭ",
243
+ "ŭ",
244
+ "Ů",
245
+ "ů",
246
+ "Ų",
247
+ "ų",
248
+ "Ŵ",
249
+ "ŵ",
250
+ "Ŷ",
251
+ "ŷ",
252
+ "Ÿ",
253
+ "Ź",
254
+ "ź",
255
+ "Ż",
256
+ "ż",
257
+ "Ž",
258
+ "ž",
259
+ "Ɓ",
260
+ "Ɔ",
261
+ "Ɖ",
262
+ "Ɗ",
263
+ "Ə",
264
+ "Ɛ",
265
+ "Ƒ",
266
+ "ƒ",
267
+ "Ɣ",
268
+ "Ɨ",
269
+ "Ɯ",
270
+ "Ɲ",
271
+ "Ɵ",
272
+ "Ơ",
273
+ "ơ",
274
+ "Ʀ",
275
+ "Ʃ",
276
+ "Ʈ",
277
+ "Ư",
278
+ "ư",
279
+ "Ʊ",
280
+ "Ʋ",
281
+ "Ʒ",
282
+ "ǂ",
283
+ "Ǎ",
284
+ "ǎ",
285
+ "Ǐ",
286
+ "ǐ",
287
+ "Ǒ",
288
+ "ǒ",
289
+ "Ǔ",
290
+ "ǔ",
291
+ "Ǫ",
292
+ "ǫ",
293
+ "Ș",
294
+ "ș",
295
+ "Ț",
296
+ "ț",
297
+ "Ʌ",
298
+ "ɐ",
299
+ "ɑ",
300
+ "ɒ",
301
+ "ɓ",
302
+ "ɔ",
303
+ "ɕ",
304
+ "ɖ",
305
+ "ɗ",
306
+ "ə",
307
+ "ɛ",
308
+ "ɟ",
309
+ "ɡ",
310
+ "ɢ",
311
+ "ɣ",
312
+ "ɦ",
313
+ "ɧ",
314
+ "ɨ",
315
+ "ɪ",
316
+ "ɬ",
317
+ "ɯ",
318
+ "ɲ",
319
+ "ɴ",
320
+ "ɵ",
321
+ "ɸ",
322
+ "ɻ",
323
+ "ɾ",
324
+ "ʀ",
325
+ "ʁ",
326
+ "ʂ",
327
+ "ʃ",
328
+ "ʇ",
329
+ "ʈ",
330
+ "ʊ",
331
+ "ʋ",
332
+ "ʌ",
333
+ "ʍ",
334
+ "ʎ",
335
+ "ʒ",
336
+ "ʔ",
337
+ "ʕ",
338
+ "ʘ",
339
+ "ʝ",
340
+ "ʟ",
341
+ "ʰ",
342
+ "ʲ",
343
+ "ʷ",
344
+ "ʻ",
345
+ "ʼ",
346
+ "ʾ",
347
+ "ʿ",
348
+ "ˀ",
349
+ "ˁ",
350
+ "ˈ",
351
+ "ˌ",
352
+ "ː",
353
+ "ˠ",
354
+ "ˤ",
355
+ "Ά",
356
+ "Έ",
357
+ "Ί",
358
+ "Ό",
359
+ "Ύ",
360
+ "Ώ",
361
+ "Α",
362
+ "Α͂",
363
+ "Β",
364
+ "Γ",
365
+ "Δ",
366
+ "Ε",
367
+ "Ζ",
368
+ "Η",
369
+ "Η͂",
370
+ "Θ",
371
+ "Ι",
372
+ "Ι͂",
373
+ "Κ",
374
+ "Λ",
375
+ "Μ",
376
+ "Ν",
377
+ "Ξ",
378
+ "Ο",
379
+ "Π",
380
+ "Ρ",
381
+ "Σ",
382
+ "Τ",
383
+ "Υ",
384
+ "Υ̓",
385
+ "Υ͂",
386
+ "Φ",
387
+ "Χ",
388
+ "Ψ",
389
+ "Ω",
390
+ "Ω͂",
391
+ "Ω͂Ι",
392
+ "ά",
393
+ "έ",
394
+ "ί",
395
+ "α",
396
+ "ᾶ",
397
+ "β",
398
+ "γ",
399
+ "δ",
400
+ "ε",
401
+ "ζ",
402
+ "η",
403
+ "ῆ",
404
+ "θ",
405
+ "ι",
406
+ "ῖ",
407
+ "κ",
408
+ "λ",
409
+ "μ",
410
+ "ν",
411
+ "ξ",
412
+ "ο",
413
+ "π",
414
+ "ρ",
415
+ "ς",
416
+ "σ",
417
+ "τ",
418
+ "υ",
419
+ "ὐ",
420
+ "ῦ",
421
+ "φ",
422
+ "χ",
423
+ "ψ",
424
+ "ω",
425
+ "ῶ",
426
+ "ῶι",
427
+ "ό",
428
+ "ύ",
429
+ "ώ",
430
+ "ϕ",
431
+ "Ё",
432
+ "І",
433
+ "Ј",
434
+ "А",
435
+ "Б",
436
+ "В",
437
+ "Г",
438
+ "Д",
439
+ "Е",
440
+ "Ж",
441
+ "З",
442
+ "И",
443
+ "Й",
444
+ "К",
445
+ "Л",
446
+ "М",
447
+ "Н",
448
+ "О",
449
+ "П",
450
+ "Р",
451
+ "С",
452
+ "Т",
453
+ "У",
454
+ "Х",
455
+ "Ц",
456
+ "Ч",
457
+ "Ш",
458
+ "Ъ",
459
+ "Ы",
460
+ "Ь",
461
+ "Э",
462
+ "Ю",
463
+ "Я",
464
+ "а",
465
+ "б",
466
+ "в",
467
+ "г",
468
+ "д",
469
+ "е",
470
+ "ж",
471
+ "з",
472
+ "и",
473
+ "й",
474
+ "к",
475
+ "л",
476
+ "м",
477
+ "н",
478
+ "о",
479
+ "п",
480
+ "р",
481
+ "с",
482
+ "т",
483
+ "у",
484
+ "х",
485
+ "ц",
486
+ "ч",
487
+ "ш",
488
+ "ъ",
489
+ "ы",
490
+ "ь",
491
+ "э",
492
+ "ю",
493
+ "я",
494
+ "ё",
495
+ "і",
496
+ "ј",
497
+ "ֵ",
498
+ "ֶ",
499
+ "ּ",
500
+ "א",
501
+ "ב",
502
+ "ג",
503
+ "ד",
504
+ "ו",
505
+ "ח",
506
+ "י",
507
+ "ל",
508
+ "ם",
509
+ "מ",
510
+ "נ",
511
+ "ס",
512
+ "ע",
513
+ "צ",
514
+ "ר",
515
+ "ש",
516
+ "ת",
517
+ "ء",
518
+ "أ",
519
+ "إ",
520
+ "ا",
521
+ "ب",
522
+ "ة",
523
+ "ت",
524
+ "ج",
525
+ "ح",
526
+ "خ",
527
+ "د",
528
+ "ر",
529
+ "ز",
530
+ "س",
531
+ "ش",
532
+ "ص",
533
+ "ط",
534
+ "ع",
535
+ "غ",
536
+ "ف",
537
+ "ق",
538
+ "ك",
539
+ "ل",
540
+ "م",
541
+ "ن",
542
+ "ه",
543
+ "و",
544
+ "ي",
545
+ "ی",
546
+ "ं",
547
+ "अ",
548
+ "आ",
549
+ "उ",
550
+ "क",
551
+ "ग",
552
+ "ट",
553
+ "ड",
554
+ "त",
555
+ "द",
556
+ "न",
557
+ "प",
558
+ "ब",
559
+ "भ",
560
+ "म",
561
+ "य",
562
+ "र",
563
+ "ल",
564
+ "श",
565
+ "ष",
566
+ "स",
567
+ "ह",
568
+ "ा",
569
+ "ि",
570
+ "ी",
571
+ "े",
572
+ "ो",
573
+ "ক",
574
+ "ত",
575
+ "ল",
576
+ "া",
577
+ "ি",
578
+ "க",
579
+ "ன",
580
+ "ள",
581
+ "ข",
582
+ "ง",
583
+ "จ",
584
+ "ช",
585
+ "ฐ",
586
+ "ต",
587
+ "ท",
588
+ "น",
589
+ "ป",
590
+ "พ",
591
+ "ร",
592
+ "ว",
593
+ "ะ",
594
+ "ั",
595
+ "า",
596
+ "เ",
597
+ "แ",
598
+ "ᛃ",
599
+ "ᛋ",
600
+ "ᛟ",
601
+ "Ḍ",
602
+ "ḍ",
603
+ "Ḥ",
604
+ "ḥ",
605
+ "Ḷ",
606
+ "ḷ",
607
+ "Ḻ",
608
+ "ḻ",
609
+ "Ṃ",
610
+ "ṃ",
611
+ "Ṅ",
612
+ "ṅ",
613
+ "Ṇ",
614
+ "ṇ",
615
+ "Ṉ",
616
+ "ṉ",
617
+ "Ṛ",
618
+ "ṛ",
619
+ "Ṟ",
620
+ "ṟ",
621
+ "Ṣ",
622
+ "ṣ",
623
+ "Ṭ",
624
+ "ṭ",
625
+ "Ṯ",
626
+ "ṯ",
627
+ "Ạ",
628
+ "ạ",
629
+ "Ả",
630
+ "ả",
631
+ "Ấ",
632
+ "ấ",
633
+ "Ầ",
634
+ "ầ",
635
+ "Ẩ",
636
+ "ẩ",
637
+ "Ẫ",
638
+ "ẫ",
639
+ "Ậ",
640
+ "ậ",
641
+ "Ắ",
642
+ "ắ",
643
+ "Ẵ",
644
+ "ẵ",
645
+ "Ặ",
646
+ "ặ",
647
+ "Ẹ",
648
+ "ẹ",
649
+ "Ế",
650
+ "ế",
651
+ "Ể",
652
+ "ể",
653
+ "Ễ",
654
+ "ễ",
655
+ "Ệ",
656
+ "ệ",
657
+ "Ị",
658
+ "ị",
659
+ "Ọ",
660
+ "ọ",
661
+ "Ỏ",
662
+ "ỏ",
663
+ "Ố",
664
+ "ố",
665
+ "Ồ",
666
+ "ồ",
667
+ "Ổ",
668
+ "ổ",
669
+ "Ỗ",
670
+ "ỗ",
671
+ "Ộ",
672
+ "ộ",
673
+ "Ớ",
674
+ "ớ",
675
+ "Ờ",
676
+ "ờ",
677
+ "Ở",
678
+ "ở",
679
+ "Ợ",
680
+ "ợ",
681
+ "Ụ",
682
+ "ụ",
683
+ "Ủ",
684
+ "ủ",
685
+ "Ứ",
686
+ "ứ",
687
+ "Ừ",
688
+ "ừ",
689
+ "Ử",
690
+ "ử",
691
+ "Ữ",
692
+ "ữ",
693
+ "Ự",
694
+ "ự",
695
+ "Ỳ",
696
+ "ỳ",
697
+ "Ỵ",
698
+ "ỵ",
699
+ "Ỹ",
700
+ "ỹ",
701
+ "ἀ",
702
+ "ἄ",
703
+ "Ἀ",
704
+ "Ἄ",
705
+ "ἐ",
706
+ "ἕ",
707
+ "Ἐ",
708
+ "Ἕ",
709
+ "ἠ",
710
+ "ἡ",
711
+ "Ἠ",
712
+ "Ἡ",
713
+ "ἰ",
714
+ "ἱ",
715
+ "Ἰ",
716
+ "Ἱ",
717
+ "ὁ",
718
+ "ὄ",
719
+ "Ὁ",
720
+ "Ὄ",
721
+ "ὐ",
722
+ "ὑ",
723
+ "Ὑ",
724
+ "ὡ",
725
+ "Ὡ",
726
+ "ὰ",
727
+ "ὲ",
728
+ "ὴ",
729
+ "ὶ",
730
+ "ὸ",
731
+ "ὺ",
732
+ "ὼ",
733
+ "ᾶ",
734
+ "Ὰ",
735
+ "ῆ",
736
+ "Ὲ",
737
+ "Ὴ",
738
+ "ῖ",
739
+ "Ὶ",
740
+ "ῦ",
741
+ "Ὺ",
742
+ "ῶ",
743
+ "ῷ",
744
+ "Ὸ",
745
+ "Ὼ",
746
+ "₁",
747
+ "₂",
748
+ "₃",
749
+ "ℓ",
750
+ "①",
751
+ "②",
752
+ "④",
753
+ "Ɑ",
754
+ "Ɐ",
755
+ "Ɒ",
756
+ "い",
757
+ "ぅ",
758
+ "う",
759
+ "お",
760
+ "か",
761
+ "き",
762
+ "く",
763
+ "ぐ",
764
+ "こ",
765
+ "し",
766
+ "す",
767
+ "せ",
768
+ "た",
769
+ "つ",
770
+ "ど",
771
+ "の",
772
+ "ば",
773
+ "ぽ",
774
+ "よ",
775
+ "ら",
776
+ "ん",
777
+ "ァ",
778
+ "ア",
779
+ "ィ",
780
+ "イ",
781
+ "ウ",
782
+ "ェ",
783
+ "エ",
784
+ "ォ",
785
+ "オ",
786
+ "カ",
787
+ "ガ",
788
+ "ク",
789
+ "グ",
790
+ "コ",
791
+ "ゴ",
792
+ "サ",
793
+ "ザ",
794
+ "シ",
795
+ "ジ",
796
+ "ス",
797
+ "ズ",
798
+ "セ",
799
+ "ゼ",
800
+ "ソ",
801
+ "タ",
802
+ "チ",
803
+ "ッ",
804
+ "ツ",
805
+ "テ",
806
+ "デ",
807
+ "ト",
808
+ "ド",
809
+ "ナ",
810
+ "ニ",
811
+ "ノ",
812
+ "ハ",
813
+ "バ",
814
+ "パ",
815
+ "ヒ",
816
+ "ビ",
817
+ "フ",
818
+ "ブ",
819
+ "プ",
820
+ "ベ",
821
+ "ペ",
822
+ "ボ",
823
+ "マ",
824
+ "ミ",
825
+ "メ",
826
+ "ャ",
827
+ "ヤ",
828
+ "ュ",
829
+ "ユ",
830
+ "ラ",
831
+ "リ",
832
+ "ル",
833
+ "レ",
834
+ "ロ",
835
+ "ワ",
836
+ "ン",
837
+ "ヴ",
838
+ "ー",
839
+ "Ɦ",
840
+ "Ɡ",
841
+ "Ɬ",
842
+ "Ɪ",
843
+ "Ʇ",
844
+ "Ʝ",
845
+ "Ʂ",
846
+ "거",
847
+ "마",
848
+ "막",
849
+ "말",
850
+ "사",
851
+ "인",
852
+ "전",
853
+ "지",
854
+ "짓",
855
+ "투",
856
+ "fi"
857
+ ]
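
Despite its `.txt` extension, the file added above is a JSON array of strings, and some entries span more than one code point (for example `"FI"`, `"SS"`, and `"fi"`). The sketch below shows one way to load and inspect such a file. The helper names are hypothetical, and the idea that visually identical entries differ only in Unicode normalization is an assumption that the last helper lets you check rather than a fact stated in the commit.

```python
import json
import unicodedata
from collections import Counter
from typing import Dict, List


def load_charset(text: str) -> List[str]:
    """Parse charset text as a JSON array of strings."""
    entries = json.loads(text)
    if not isinstance(entries, list) or not all(isinstance(e, str) for e in entries):
        raise ValueError("charset must be a JSON array of strings")
    return entries


def multi_codepoint_entries(entries: List[str]) -> List[str]:
    """Entries made of more than one Unicode code point."""
    return [e for e in entries if len(e) > 1]


def nfc_collisions(entries: List[str]) -> Dict[str, int]:
    """Entries that become identical after NFC normalization, with counts."""
    counts = Counter(unicodedata.normalize("NFC", e) for e in entries)
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative sample only, not the real file contents.
sample = '[" ", "F", "FI", "SS", "fi", "\\u0130", "I\\u0307"]'
charset = load_charset(sample)
print(len(charset), multi_codepoint_entries(charset), nfc_collisions(charset))
```

In practice you would pass `Path("v2_english/charset.txt").read_text(encoding="utf-8")` to `load_charset`.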
v2_english/detector.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:064950b565833dfa15eaa6406a7ec9a8adc2ae159eaef9e0856f657dc0e92d2b
3
+ size 181974624
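
The three added lines are a Git LFS pointer, not the weights themselves: the pointer records the spec version, a SHA-256 object ID, and the byte size of the real file. The following sketch parses a pointer and checks a downloaded file against it. The function names are hypothetical, and it handles only the three keys shown here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Union[str, Path], pointer: Dict[str, Union[str, int]]) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    data_path = Path(path)
    if data_path.stat().st_size != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with data_path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]
```

A size or digest mismatch usually means the file on disk is still the pointer text, for example after cloning without Git LFS installed.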
v2_english/model_config.json ADDED
@@ -0,0 +1,18 @@
1
+ {
2
+ "num_tokens": 858,
3
+ "max_width": 32,
4
+ "sequence_length": 32,
5
+ "scope": 2048,
6
+ "coordinate_mode": "RBOX",
7
+ "backbone": "regnet_x_8gf",
8
+ "charset_size": 855,
9
+ "recognizer_variant": "prenorm",
10
+ "has_pre_norm": false,
11
+ "has_tx_norm": true,
12
+ "norm_first": true,
13
+ "depth": 128,
14
+ "num_layers": 3,
15
+ "nhead": 8,
16
+ "dim_feedforward": 1024,
17
+ "feature_depth": 256
18
+ }
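
One quick sanity check on this config is that `charset_size` (855) matches the number of entries in `charset.txt`, and that `num_tokens` (858) is at least as large. The sketch below performs that check. The helper name is hypothetical, and the purpose of the three extra token IDs is not stated in this commit; treating them as reserved or special tokens is an assumption.

```python
import json
from typing import List

# Subset of v2_english/model_config.json as added in this commit.
CONFIG_TEXT = """
{
  "num_tokens": 858,
  "charset_size": 855,
  "recognizer_variant": "prenorm",
  "has_pre_norm": false,
  "has_tx_norm": true
}
"""


def extra_token_count(config: dict, charset: List[str]) -> int:
    """Validate charset length against the config and return the surplus token IDs."""
    if len(charset) != config["charset_size"]:
        raise ValueError(
            f"charset has {len(charset)} entries, config expects {config['charset_size']}"
        )
    extra = config["num_tokens"] - config["charset_size"]
    if extra < 0:
        raise ValueError("num_tokens is smaller than charset_size")
    # Assumption: surplus IDs are reserved tokens; the commit does not say.
    return extra


config = json.loads(CONFIG_TEXT)
placeholder_charset = ["?"] * 855  # stand-in for the real charset entries
print(extra_token_count(config, placeholder_charset))
```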
v2_english/recognizer.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:366af771f46bfe31cfd4876e28eeab04b4ee6b11b9c9f9b6de49f7b58799f728
3
+ size 24550133
v2_english/relational.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:763c18416f8b5285187d90e8d71f9c11b394f7a0dfd67f3e4b529a16bf583816
3
+ size 9044661
{checkpoints → v2_multilingual}/charset.txt RENAMED
File without changes
{checkpoints → v2_multilingual}/detector.pth RENAMED
File without changes
{checkpoints → v2_multilingual}/model_config.json RENAMED
File without changes
{checkpoints → v2_multilingual}/recognizer.pth RENAMED
File without changes
{checkpoints → v2_multilingual}/relational.pth RENAMED
File without changes