---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- apple-silicon
- apple-neural-engine
- swift
---

# OCR CoreML Detector 224

`Detector224.mlpackage` is a batch-1 CoreML conversion of the detector stage
from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).
It is intended for Apple-device OCR pipelines that need a packaged text
detection model and will implement their own post-processing, recognition, and
layout stages.

SwiftPM package:
[github.com/mweinbach/OCRCoreMLDetector](https://github.com/mweinbach/OCRCoreMLDetector)

## What This Is

- Source model: `nvidia/nemotron-ocr-v2`, `v2_english` detector
- CoreML artifact: `Detector224.mlpackage`
- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
- Input size: `768 x 768`
- Batch size: `1`
- Weight quantization: int8 per-channel linear symmetric
- Compute precision: fp16
- Minimum deployment target used during conversion: iOS 18

This is not a complete OCR system. The package returns detector tensors only.
Downstream code still needs thresholding, rotated-box decoding,
non-maximum suppression, crop/rectify, recognition, and reading-order/layout
logic.
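
Of the downstream steps listed above, non-maximum suppression is the easiest to illustrate in isolation. The sketch below is a simplification: it runs greedy NMS over axis-aligned boxes, whereas this detector produces rotated boxes, so a real pipeline would need a rotated-polygon overlap test instead. `ScoredBox`, `iou`, `nonMaximumSuppression`, and the `iouThreshold` value are hypothetical names and settings chosen for illustration and are not part of this package.

```swift
/// Axis-aligned box with a confidence score.
/// Assumption: a stand-in for decoded rotated boxes, for illustration only.
struct ScoredBox {
    let minX: Float, minY: Float, maxX: Float, maxY: Float
    let score: Float
    var area: Float { max(0, maxX - minX) * max(0, maxY - minY) }
}

/// Intersection-over-union of two axis-aligned boxes.
func iou(_ a: ScoredBox, _ b: ScoredBox) -> Float {
    let w = max(0, min(a.maxX, b.maxX) - max(a.minX, b.minX))
    let h = max(0, min(a.maxY, b.maxY) - max(a.minY, b.minY))
    let inter = w * h
    let union = a.area + b.area - inter
    return union > 0 ? inter / union : 0
}

/// Greedy NMS: visit boxes from highest to lowest score and keep a box
/// only if it does not overlap an already-kept box above `iouThreshold`.
func nonMaximumSuppression(_ boxes: [ScoredBox], iouThreshold: Float) -> [ScoredBox] {
    var kept: [ScoredBox] = []
    for box in boxes.sorted(by: { $0.score > $1.score }) {
        if kept.allSatisfy({ iou($0, box) <= iouThreshold }) {
            kept.append(box)
        }
    }
    return kept
}
```

The pairwise check is quadratic in the number of candidates, which is usually acceptable once thresholding has reduced the candidate set.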

## Files

| file | purpose |
|---|---|
| `Detector224.mlpackage/` | CoreML model package |
| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
| `bench.md` | local CoreML latency results |
| `parity.json` | PyTorch-vs-CoreML parity summary |
| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
| `NOTICE` | redistribution attribution notice |

## Input Contract

The model expects one CoreML input named `image`:

- shape: `Float32[1, 3, 768, 768]`
- layout: RGB planar
- normalization: pixel values in `[0, 1]`

The SwiftPM wrapper linked above includes a helper that converts a `CGImage` to
this tensor shape.
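
For pipelines that build the tensor by hand, the layout above can be sketched as follows. This is a minimal illustration, not the wrapper's actual helper: it assumes the image has already been resized to the model's input size and decoded into interleaved 8-bit RGBA bytes in row-major order, and `packPlanarRGB` is a hypothetical name.

```swift
/// Packs interleaved 8-bit RGBA pixels into a planar RGB float buffer
/// (all R values, then all G, then all B), scaled to [0, 1].
/// Assumption: `rgba` holds width * height * 4 bytes in row-major order.
func packPlanarRGB(rgba: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(rgba.count == width * height * 4, "unexpected buffer size")
    let planeSize = width * height
    var tensor = [Float](repeating: 0, count: 3 * planeSize)
    for pixel in 0..<planeSize {
        let base = pixel * 4
        tensor[pixel] = Float(rgba[base]) / 255                      // R plane
        tensor[planeSize + pixel] = Float(rgba[base + 1]) / 255      // G plane
        tensor[2 * planeSize + pixel] = Float(rgba[base + 2]) / 255  // B plane
        // The alpha byte at base + 3 is dropped.
    }
    return tensor
}
```

For a `768 x 768` input the result holds `3 * 768 * 768 = 1,769,472` values, matching the `[1, 3, 768, 768]` shape once the batch dimension is added.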

## Output Contract

The model returns:

| output | shape | meaning |
|---|---:|---|
| `prob` | `Float32[1, 192, 192]` | text probability map |
| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
| `features` | `Float32[1, 128, 192, 192]` | detector feature map |
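
The `prob` map is 192 cells on a side for a 768-pixel input, so each cell covers a `768 / 192 = 4` pixel block. A first post-processing step might look like the sketch below, which thresholds the map and maps surviving cells back to input coordinates. It assumes the map has been copied into a row-major `[Float]`, that cell centers align with the input at stride 4, and that the caller picks the threshold. `TextCell`, `candidateCells`, and `inputCenter` are hypothetical names, and the meaning of the five `rboxes` values is not covered here.

```swift
/// A grid cell whose text probability passed the threshold.
struct TextCell {
    let row: Int
    let col: Int
    let score: Float
}

/// Returns the cells of a row-major `rows x cols` probability map
/// whose score is at least `threshold`.
func candidateCells(prob: [Float], rows: Int, cols: Int, threshold: Float) -> [TextCell] {
    precondition(prob.count == rows * cols, "unexpected map size")
    var cells: [TextCell] = []
    for row in 0..<rows {
        for col in 0..<cols {
            let score = prob[row * cols + col]
            if score >= threshold {
                cells.append(TextCell(row: row, col: col, score: score))
            }
        }
    }
    return cells
}

/// Maps a cell to the center of the input-pixel block it covers.
/// Assumption: the map is aligned with the input at the given stride.
func inputCenter(of cell: TextCell, stride: Float = 4) -> (x: Float, y: Float) {
    ((Float(cell.col) + 0.5) * stride, (Float(cell.row) + 0.5) * stride)
}
```

With the real output, `rows` and `cols` would both be 192; the surviving cells are the ones whose `rboxes` entries are worth decoding.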

## Local Performance

Measured on the bundled sample image after warmup:

| compute units | median prediction latency |
|---|---:|
| `ALL` | 13.53 ms |
| `CPU_AND_GPU` | 13.65 ms |
| `CPU_AND_NE` | 54.51 ms |
| `CPU_ONLY` | 298.28 ms |

For the lowest single-image latency in the current test environment, use
`CPU_AND_GPU` or `ALL`; the two are within about 0.1 ms of each other.
`CPU_AND_NE` is available but roughly 4x slower for this detector, and
`CPU_ONLY` is slowest by a wide margin.
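
To reproduce this kind of measurement on other hardware, a warmup-then-median timing loop can be sketched as below. This is not necessarily how the numbers in `bench.md` were produced: `medianLatencyMs` is a hypothetical helper, and the warmup and run counts are left to the caller.

```swift
import Dispatch

/// Runs `body` `warmup` times untimed, then `runs` times timed,
/// and returns the median wall-clock duration in milliseconds.
func medianLatencyMs(warmup: Int, runs: Int, _ body: () throws -> Void) rethrows -> Double {
    precondition(warmup >= 0, "warmup count must not be negative")
    precondition(runs > 0, "need at least one timed run")
    for _ in 0..<warmup { try body() }

    var samples: [Double] = []
    samples.reserveCapacity(runs)
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)  // ns to ms
    }

    samples.sort()
    let mid = samples.count / 2
    // Odd count: middle sample. Even count: mean of the two middle samples.
    return samples.count % 2 == 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2
}
```

Wrapping a single `detector.prediction(for:)` call in the closure, once per compute-unit setting, gives figures comparable in kind to the table above. The median is used rather than the mean so that occasional slow runs do not skew the result.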

## Use From Swift

Add the Swift package:

```swift
.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
```

Then:

```swift
import CoreML
import OCRCoreMLDetector

let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features
```

## License

The converted model weights inherit the
[NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
The upstream source code and helper scripts are Apache 2.0. See `LICENSE` and
`NOTICE` for redistribution terms and attribution.