---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- apple-silicon
- apple-neural-engine
- swift
---
# OCR CoreML Detector 224
`Detector224.mlpackage` is a batch-1 CoreML conversion of the detector stage
from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).
It is intended for Apple-device OCR pipelines that need a packaged text
detection model and will implement their own post-processing, recognition, and
layout stages.
SwiftPM package:
[github.com/mweinbach/OCRCoreMLDetector](https://github.com/mweinbach/OCRCoreMLDetector)
## What This Is
- Source model: `nvidia/nemotron-ocr-v2`, `v2_english` detector
- CoreML artifact: `Detector224.mlpackage`
- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
- Input size: `768 x 768`
- Batch size: `1`
- Weight quantization: int8 per-channel linear symmetric
- Compute precision: fp16
- Minimum deployment target used during conversion: iOS 18
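
To make the "int8 per-channel linear symmetric" entry above concrete, here is a minimal sketch of what that scheme usually means: each output channel gets its own scale, the zero point is fixed at 0, and weights map onto `[-127, 127]`. This is an illustration only, not the code used for the conversion; `quantizeChannel` and `dequantizeChannel` are hypothetical names.

```swift
// Illustrative only: not the conversion pipeline's code.
// Function names are hypothetical and exist just for this sketch.

/// Quantizes one output channel's weights with a single scale and zero point 0.
func quantizeChannel(_ weights: [Float]) -> (values: [Int8], scale: Float) {
    // Symmetric: the range [-maxAbs, +maxAbs] is mapped onto [-127, 127].
    let maxAbs = weights.map { abs($0) }.max() ?? 0
    guard maxAbs > 0 else {
        // All-zero channel: any scale works, so use 1.
        return (values: [Int8](repeating: 0, count: weights.count), scale: 1)
    }
    let scale = maxAbs / 127
    let values = weights.map { w -> Int8 in
        let q = (w / scale).rounded()
        return Int8(max(-127, min(127, q)))
    }
    return (values: values, scale: scale)
}

/// Linear dequantization: w ≈ scale * q.
func dequantizeChannel(_ values: [Int8], scale: Float) -> [Float] {
    values.map { Float($0) * scale }
}

// Per-channel: every row (one output channel) is quantized with its own scale.
let exampleChannels: [[Float]] = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
let exampleQuantized = exampleChannels.map { quantizeChannel($0) }
```

Because the second channel has a much smaller range than the first, giving it its own scale preserves more precision than a single per-tensor scale would.
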
This is not a complete OCR system. The package returns detector tensors only.
Downstream code still needs thresholding, rotated-box decoding,
non-maximum suppression, crop/rectify, recognition, and reading-order/layout
logic.
## Files
| file | purpose |
|---|---|
| `Detector224.mlpackage/` | CoreML model package |
| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
| `bench.md` | local CoreML latency results |
| `parity.json` | PyTorch-vs-CoreML parity summary |
| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
| `NOTICE` | redistribution attribution notice |
## Input Contract
The model expects one CoreML input named `image`:
- shape: `Float32[1, 3, 768, 768]`
- layout: planar RGB, channels first (NCHW)
- normalization: pixel values in `[0, 1]`
The SwiftPM wrapper linked above includes a helper that converts a `CGImage` to
this tensor shape.
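
For pipelines that do not use the wrapper, the input contract above can be sketched as a plain conversion from interleaved RGBA bytes to a planar RGB float buffer. This is a minimal sketch, not the wrapper's helper: `planarRGB(fromRGBA:width:height:)` is a hypothetical name, the image is assumed to be already resized to `768 x 768`, and dividing by 255 is assumed to be the intended mapping into `[0, 1]`.

```swift
/// Converts interleaved 8-bit RGBA pixels to a planar RGB Float array in [0, 1].
/// Output is [3, height, width] flattened: all R values, then all G, then all B.
/// Assumes the caller has already resized the image to the model's input size.
func planarRGB(fromRGBA pixels: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(pixels.count == width * height * 4, "expected 4 bytes per pixel")
    let planeSize = width * height
    var tensor = [Float](repeating: 0, count: 3 * planeSize)
    for i in 0..<planeSize {
        let base = i * 4
        tensor[i] = Float(pixels[base]) / 255                      // R plane
        tensor[planeSize + i] = Float(pixels[base + 1]) / 255      // G plane
        tensor[2 * planeSize + i] = Float(pixels[base + 2]) / 255  // B plane
        // The alpha byte at pixels[base + 3] is dropped.
    }
    return tensor
}
```

For a `768 x 768` image the result holds `3 * 768 * 768` values, which matches the `Float32[1, 3, 768, 768]` shape once the batch dimension is added.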
## Output Contract
The model returns:
| output | shape | meaning |
|---|---:|---|
| `prob` | `Float32[1, 192, 192]` | text probability map |
| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
| `features` | `Float32[1, 128, 192, 192]` | detector feature map |
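
The `192 x 192` maps are a quarter of the input resolution (`768 / 192 = 4` input pixels per cell). As a starting point for the thresholding step, the sketch below scans a probability map and reports which cells pass a threshold, along with an approximate position in input pixels. Several things here are assumptions rather than documented behavior: the map is taken to be row-major, cell centers are mapped with `(index + 0.5) * 4`, and the `0.5` threshold is a placeholder. `TextCell` and `candidateCells` are hypothetical names. The sketch does not decode `rboxes`, since the meaning of its five components is not specified here.

```swift
/// A grid cell whose text probability passed the threshold.
struct TextCell {
    let row: Int
    let col: Int
    let score: Float
    /// Cell center in input-image pixels (assumed mapping, see lead-in).
    let centerX: Float
    let centerY: Float
}

/// Scans a row-major probability map and returns cells at or above `threshold`.
func candidateCells(prob: [Float], gridSize: Int = 192, inputSize: Int = 768,
                    threshold: Float = 0.5) -> [TextCell] {
    precondition(prob.count == gridSize * gridSize, "prob must be gridSize x gridSize")
    let cellSize = Float(inputSize) / Float(gridSize) // 768 / 192 = 4 pixels per cell
    var cells: [TextCell] = []
    for row in 0..<gridSize {
        for col in 0..<gridSize {
            let score = prob[row * gridSize + col]
            if score >= threshold {
                cells.append(TextCell(row: row, col: col, score: score,
                                      centerX: (Float(col) + 0.5) * cellSize,
                                      centerY: (Float(row) + 0.5) * cellSize))
            }
        }
    }
    return cells
}
```

The surviving cells are the ones whose geometry would then be read from `rboxes` and passed to non-maximum suppression.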
## Local Performance
Measured on the bundled sample image after warmup:
| compute units | median prediction latency |
|---|---:|
| `ALL` | 13.53 ms |
| `CPU_AND_GPU` | 13.65 ms |
| `CPU_AND_NE` | 54.51 ms |
| `CPU_ONLY` | 298.28 ms |
For the lowest single-image latency in the current test environment, use
`CPU_AND_GPU` or `ALL`. `CPU_AND_NE` works but is roughly four times slower for
this detector (54.51 ms vs. 13.53 ms).
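
To reproduce this kind of measurement on your own hardware, a small timing harness is enough. The sketch below is not the script behind `bench.md`; `median` and `medianLatencyMs` are hypothetical helpers, and the warmup and iteration counts are placeholder defaults rather than the values used for the table.

```swift
import Foundation

/// Median of a list of samples; averages the two middle values for even counts.
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

/// Runs `body` `warmup` times untimed, then `iterations` times timed,
/// and returns the median wall-clock latency in milliseconds.
func medianLatencyMs(warmup: Int = 5, iterations: Int = 50,
                     _ body: () throws -> Void) rethrows -> Double {
    precondition(iterations > 0, "need at least one timed iteration")
    for _ in 0..<warmup { try body() }
    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        try body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000) // ns -> ms
    }
    return median(samples)
}
```

With the wrapper's API, a call could look like `try medianLatencyMs { _ = try detector.prediction(for: cgImage) }`, repeated once per compute-unit setting.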
## Use From Swift
Add the Swift package:
```swift
.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
```
Then load the detector and run a prediction on a `CGImage`:
```swift
import CoreML
import OCRCoreMLDetector

// `cgImage` is a CGImage you have already loaded elsewhere.
let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

// Raw detector tensors; see Output Contract for shapes.
let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features
```
## License
The converted model weights inherit the
[NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
The upstream source code and helper scripts are Apache 2.0. See `LICENSE` and
`NOTICE` for redistribution terms and attribution.