---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- text-recognition
- apple-silicon
- apple-neural-engine
- swift
---

# Nemotron OCR v2 CoreML

CoreML conversion of the English neural stages from
[NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).

SwiftPM package:
[github.com/mweinbach/OCRCoreML](https://github.com/mweinbach/OCRCoreML)

## Included Models

| stage | file | input | outputs |
|---|---|---|---|
| detector | `DetectorGPUInt8_768.mlpackage` | `image: Float32[1, 3, 768, 768]` | `prob`, `rboxes`, `features` |
| recognizer | `RecognizerFeaturesInt8.mlpackage` | `regions: Float32[128, 128, 8, 32]` | `logits`, `features` |
| relational | `RelationalInt8.mlpackage` | rectified regions, original quads, recognizer features, valid count | `words`, `lines`, `line_log_var` |

The recognizer emits transformer `features` alongside its `logits`. The
relational model requires those features, so this bundle covers the full
neural OCR pipeline rather than detector-only inference.
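
To make the detector's `Float32[1, 3, 768, 768]` input shape concrete, here is a minimal sketch of packing interleaved RGBA bytes into a planar channel-first buffer. The function name `packCHW`, the RGB channel order, and the `[0, 1]` scaling are assumptions for illustration; this card does not state the expected normalization, so check `model_config.json` and the conversion configs before relying on it.

```swift
/// Packs interleaved RGBA8 pixels into a planar [1, 3, side, side] Float32 buffer.
/// Assumptions (not confirmed by this model card): RGB channel order, [0, 1] scaling.
func packCHW(rgba: [UInt8], side: Int = 768) -> [Float] {
    precondition(rgba.count == side * side * 4, "expected side * side RGBA pixels")
    let plane = side * side
    var out = [Float](repeating: 0, count: 3 * plane)
    for pixel in 0..<plane {
        // Copy R, G, B into their own planes; the alpha byte is skipped.
        for channel in 0..<3 {
            out[channel * plane + pixel] = Float(rgba[pixel * 4 + channel]) / 255
        }
    }
    return out
}
```

The leading batch dimension of 1 adds no elements, so the flat array of `3 * side * side` values can be copied directly into a tensor of that shape.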

## Pipeline Boundary

The original Python package uses CUDA/C++ helpers for the non-neural stages:
rotated-box NMS, `rboxes` to quads, quad rectification, feature-map grid
sampling, sequence decoding, relation-graph decoding, and reading-order
formatting. None of these operations are included as CoreML models, so apps
that integrate this bundle on Apple platforms must port or replace them.
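
As an illustration of the kind of geometry that has to be ported, the sketch below converts one rotated box into a four-corner quad. The `RotatedBox` layout (center, size, angle in radians) is hypothetical: this card does not document the actual `rboxes` tensor layout, so treat the struct and the function `quad(from:)` as placeholders rather than the upstream method.

```swift
import Foundation

struct Point: Equatable { var x: Double; var y: Double }

/// Hypothetical rotated-box layout; the real `rboxes` tensor may be encoded differently.
struct RotatedBox { var cx, cy, width, height, angle: Double }

/// Returns corners in the order top-left, top-right, bottom-right, bottom-left
/// (before rotation), rotated by `angle` radians around the box center.
func quad(from box: RotatedBox) -> [Point] {
    let c = cos(box.angle), s = sin(box.angle)
    let hw = box.width / 2, hh = box.height / 2
    let local: [(Double, Double)] = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
    return local.map { d in
        // Standard 2D rotation followed by translation to the box center.
        Point(x: box.cx + d.0 * c - d.1 * s, y: box.cy + d.0 * s + d.1 * c)
    }
}
```

Rotated-box NMS and quad rectification would build on the same corner representation.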

The linked SwiftPM package includes wrappers for all three CoreML models and a
greedy recognizer decoder. Until the geometric and graph post-processing is
ported, it exposes raw tensors rather than a complete image-to-text OCR API.
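
For readers unfamiliar with greedy decoding, here is a minimal sketch of the idea: take the highest-scoring class at each step and map it to a character. It assumes a CTC-style output in which class 0 is a blank symbol and repeated classes are collapsed; this card does not confirm that scheme, and the function `greedyDecode` is illustrative, not the decoder shipped in the SwiftPM package.

```swift
/// Greedy decode for one region. `logits` is indexed as [timeStep][classIndex].
/// Assumption: CTC-style output where class 0 is "blank" and class k maps to charset[k - 1].
func greedyDecode(logits: [[Float]], charset: [Character]) -> String {
    var text = ""
    var previous = 0
    for step in logits {
        // Argmax over the class scores for this time step.
        guard let best = step.indices.max(by: { step[$0] < step[$1] }) else { continue }
        // Emit only non-blank classes that differ from the previous step.
        if best != 0 && best != previous && best - 1 < charset.count {
            text.append(charset[best - 1])
        }
        previous = best
    }
    return text
}
```

In this sketch a blank between two identical classes is what allows a doubled letter to be emitted.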

## Files

| file | purpose |
|---|---|
| `DetectorGPUInt8_768.mlpackage/` | detector CoreML package |
| `RecognizerFeaturesInt8.mlpackage/` | recognizer CoreML package with logits and features |
| `RelationalInt8.mlpackage/` | relational CoreML package |
| `charset.txt` | English checkpoint charset |
| `model_config.json` | English checkpoint config |
| `configs/` | conversion configs used for the three packages |
| `benchmarks/` | local CoreML benchmark results |
| `parity/` | PyTorch-vs-CoreML parity reports |
| `checksums.sha256` | SHA-256 checksums for package files |
| `LICENSE`, `NOTICE` | license terms and redistribution notice |

## Performance

Median latencies after warmup, measured locally (raw results are in
`benchmarks/`):

| stage | GPU/ALL median | CPU+ANE median | CPU median |
|---|---:|---:|---:|
| detector | 10.65 ms | 50.46 ms | 157.71 ms |
| recognizer + features | 4.53 ms | 11.04 ms | 47.58 ms |
| relational | 1.72 ms | 6.38 ms | 34.53 ms |

On the test machine, the GPU path (CoreML `ALL`) gives the lowest single-shot
latency. CPU+ANE is useful when the GPU needs to stay free for rendering or
other workloads.
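
To reproduce this kind of measurement on your own hardware, a timing harness along the following lines may help. It is a sketch, not the script that produced the numbers above; the names `median` and `medianLatencyMs` and the default iteration counts are assumptions.

```swift
import Foundation

/// Returns the median of `samples`, averaging the two middle values for even counts.
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

/// Runs `work` untimed for `warmup` iterations, then returns the median latency in milliseconds.
func medianLatencyMs(warmup: Int = 5, iterations: Int = 50,
                     _ work: () throws -> Void) rethrows -> Double {
    for _ in 0..<warmup { try work() }
    var samples: [Double] = []
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        try work()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)  // ns to ms
    }
    return median(samples)
}
```

Wrapping a single model call in the closure, for example the detector prediction, gives a per-stage figure comparable to one cell of the table.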

## Swift Usage

```swift
import OCRCoreML

// Load all three models; `.cpuAndGPU` runs them on CPU and GPU.
let pipeline = try OCRPipeline(computeUnits: .cpuAndGPU)

// Stage 1: detector. `cgImage` is the input image supplied by the app.
let detectorPrediction = try pipeline.detect(image: cgImage)

// Stage 2: recognizer. `regions` and `detectedRegionCount` must come from the
// app's own post-processing of the detector outputs (see Pipeline Boundary).
let recognizerPrediction = try pipeline.recognize(regions: regions)
let decoded = try pipeline.recognizer.decode(
    logits: recognizerPrediction.output.logits,
    count: detectedRegionCount
)

// Stage 3: relational model. `relationalRegionFeatures` and `originalQuads`
// are likewise produced by the app's post-processing.
let relationalPrediction = try pipeline.relate(
    rectifiedQuads: relationalRegionFeatures,
    originalQuads: originalQuads,
    recognizerFeatures: recognizerPrediction.output.features,
    numValid: detectedRegionCount
)
```

See the SwiftPM package documentation for app integration details:
<https://github.com/mweinbach/OCRCoreML>

## License

The converted model weights inherit the
[NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
The upstream source code and helper scripts are licensed under Apache 2.0.
See `LICENSE` and `NOTICE`.