---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- apple-silicon
- apple-neural-engine
- swift
---

# OCR CoreML Detector 224

`Detector224.mlpackage` is a batch-1 CoreML conversion of the detector stage from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2). It is intended for Apple-device OCR pipelines that need a packaged text detection model and will implement their own post-processing, recognition, and layout stages.

SwiftPM package: [github.com/mweinbach/OCRCoreMLDetector](https://github.com/mweinbach/OCRCoreMLDetector)

## What This Is

- Source model: `nvidia/nemotron-ocr-v2`, `v2_english` detector
- CoreML artifact: `Detector224.mlpackage`
- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
- Input size: `768 x 768`
- Batch size: `1`
- Weight quantization: int8 per-channel linear symmetric
- Compute precision: fp16
- Minimum deployment target used during conversion: iOS 18

This is not a complete OCR system. The package returns detector tensors only. Downstream code still needs thresholding, rotated-box decoding, non-maximum suppression, crop/rectify, recognition, and reading-order/layout logic.
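As a rough illustration of the first downstream step (thresholding), the sketch below scans a probability map and collects the cells that look like text. It is not part of the SwiftPM package: `textCells` is a hypothetical helper, the map is assumed to be flattened in row-major order into a plain `[Float]`, and the threshold is a placeholder to tune on your own data.

```swift
// Hypothetical helper, not part of OCRCoreMLDetector.
// Returns the (row, col) cells whose text probability meets `threshold`.
// Assumes `prob` is the detector probability map flattened in row-major order.
func textCells(prob: [Float], width: Int, height: Int, threshold: Float) -> [(row: Int, col: Int)] {
    precondition(prob.count == width * height, "prob must hold width * height values")
    var cells: [(row: Int, col: Int)] = []
    for row in 0..<height {
        for col in 0..<width {
            if prob[row * width + col] >= threshold {
                cells.append((row: row, col: col))
            }
        }
    }
    return cells
}
```

With the 768 x 768 input and a 192 x 192 probability map, each cell would correspond to a 4 x 4 block of input pixels, assuming uniform downsampling. Rotated-box decoding and non-maximum suppression would then operate on the cells this step keeps.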

## Files

| file | purpose |
|---|---|
| `Detector224.mlpackage/` | CoreML model package |
| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
| `bench.md` | local CoreML latency results |
| `parity.json` | PyTorch-vs-CoreML parity summary |
| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
| `NOTICE` | redistribution attribution notice |

## Input Contract

The model expects one CoreML input named `image`:

- shape: `Float32[1, 3, 768, 768]`
- layout: RGB planar
- normalization: pixel values in `[0, 1]`

The SwiftPM wrapper linked above includes a helper that converts a `CGImage` to this tensor shape.

## Output Contract

The model returns:

| output | shape | meaning |
|---|---:|---|
| `prob` | `Float32[1, 192, 192]` | text probability map |
| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
| `features` | `Float32[1, 128, 192, 192]` | detector feature map |

## Local Performance

Measured on the bundled sample image after warmup:

| compute units | median prediction latency |
|---|---:|
| `ALL` | 13.53 ms |
| `CPU_AND_GPU` | 13.65 ms |
| `CPU_AND_NE` | 54.51 ms |
| `CPU_ONLY` | 298.28 ms |

For the lowest single-image latency in this test environment, use `CPU_AND_GPU` or `ALL`. `CPU_AND_NE` works but is roughly four times slower for this detector.
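
Returning to the input contract, the sketch below shows what "RGB planar, values in `[0, 1]`" means in terms of memory layout. It is not the wrapper's own helper: `planarRGB` is a hypothetical function, and it assumes the image has already been resized to the model's input size and decoded into an interleaved RGBA8 byte buffer.

```swift
// Hypothetical helper, not the conversion routine shipped in OCRCoreMLDetector.
// Packs interleaved RGBA8 pixels into planar RGB floats in [0, 1]:
// all red values first, then all green, then all blue.
func planarRGB(fromRGBA rgba: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(rgba.count == width * height * 4, "expected 4 bytes per pixel")
    let planeSize = width * height
    var out = [Float](repeating: 0, count: 3 * planeSize)
    for pixel in 0..<planeSize {
        for channel in 0..<3 {
            // Channel 0 = R, 1 = G, 2 = B; the alpha byte is skipped.
            out[channel * planeSize + pixel] = Float(rgba[pixel * 4 + channel]) / 255
        }
    }
    return out
}
```

For a 768 x 768 image the result holds `3 * 768 * 768` values in the same order as a `[1, 3, 768, 768]` tensor, which would still need to be copied into the `image` input.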

## Use From Swift

Add the Swift package:

```swift
.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
```

Then:

```swift
import CoreML
import OCRCoreMLDetector

// `cgImage` is a CGImage supplied by the caller.
let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

// Raw detector tensors; see Output Contract for shapes.
let prob = prediction.output.prob
let rboxes = prediction.output.rboxes
let features = prediction.output.features
```

## License

The converted model weights inherit the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). The upstream source code and helper scripts are Apache 2.0. See `LICENSE` and `NOTICE` for redistribution terms and attribution.