---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- apple-silicon
- apple-neural-engine
- swift
---

# OCR CoreML Detector 224

`Detector224.mlpackage` is a batch-1 CoreML conversion of the detector stage
from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).
It is intended for Apple-device OCR pipelines that need a packaged text
detection model and will implement their own post-processing, recognition, and
layout stages.

SwiftPM package:
[github.com/mweinbach/OCRCoreMLDetector](https://github.com/mweinbach/OCRCoreMLDetector)

## What This Is

- Source model: `nvidia/nemotron-ocr-v2`, `v2_english` detector
- CoreML artifact: `Detector224.mlpackage`
- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
- Input size: `768 x 768`
- Batch size: `1`
- Weight quantization: int8 per-channel linear symmetric
- Compute precision: fp16
- Minimum deployment target used during conversion: iOS 18
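The weight-quantization line can be illustrated with a small sketch of generic per-channel symmetric int8 quantization. Each output channel gets one scale, assumed here to be `max|w| / 127` with a zero point of 0, and each weight is stored as `round(w / scale)` clamped to `[-127, 127]`. This shows the general scheme only: the exact rounding and integer range coremltools used for this artifact may differ, and `quantizePerChannel` / `dequantizePerChannel` are hypothetical helpers. CoreML handles dequantization itself, so app code does not need to do this; the sketch only shows what the stored numbers mean.

```swift
// Illustrative per-channel linear symmetric int8 quantization (hypothetical helpers).
// Assumption: one scale per output channel, scale = max|w| / 127, zero point = 0.

/// Quantizes each row (one output channel) of `weights` with its own scale.
func quantizePerChannel(_ weights: [[Float]]) -> (q: [[Int8]], scales: [Float]) {
    var q: [[Int8]] = []
    var scales: [Float] = []
    for row in weights {
        let maxAbs = row.map { abs($0) }.max() ?? 0
        // Avoid dividing by zero for an all-zero channel.
        let scale: Float = maxAbs > 0 ? maxAbs / 127 : 1
        scales.append(scale)
        q.append(row.map { w in
            let r = (w / scale).rounded()
            // Clamp to the symmetric range before converting, so Int8(_:) cannot trap.
            return Int8(max(-127, min(127, r)))
        })
    }
    return (q, scales)
}

/// Reconstructs approximate weights: w ≈ scale[c] * q[c][i].
func dequantizePerChannel(_ q: [[Int8]], scales: [Float]) -> [[Float]] {
    zip(q, scales).map { pair in pair.0.map { Float($0) * pair.1 } }
}

// Two toy channels with very different magnitudes each get their own scale.
let weights: [[Float]] = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
let (q, scales) = quantizePerChannel(weights)
let restored = dequantizePerChannel(q, scales: scales)
```

Per-channel scales keep small-magnitude channels (like the second row) from losing all their precision to a single tensor-wide scale set by the largest weight.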

This is not a complete OCR system. The package returns detector tensors only.
Downstream code still needs thresholding, rotated-box decoding,
non-maximum suppression, crop/rectify, recognition, and reading-order/layout
logic.
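Of the downstream steps listed above, non-maximum suppression is the easiest to sketch in isolation. The version below is a simplification: it runs greedy NMS on axis-aligned boxes, assuming rotated boxes have already been decoded and reduced to their bounding rectangles. A production pipeline would more likely compute IoU on the rotated polygons. `Box`, `nonMaxSuppression`, and the 0.5 IoU threshold are illustrative choices, not part of this package.

```swift
// Simplified greedy NMS over axis-aligned boxes (hypothetical types and defaults).
// Assumption: rotated boxes were already decoded and converted to their bounds.

struct Box {
    var minX, minY, maxX, maxY: Float
    var score: Float
    var area: Float { max(0, maxX - minX) * max(0, maxY - minY) }
}

/// Intersection-over-union of two axis-aligned boxes.
func iou(_ a: Box, _ b: Box) -> Float {
    let w = max(0, min(a.maxX, b.maxX) - max(a.minX, b.minX))
    let h = max(0, min(a.maxY, b.maxY) - max(a.minY, b.minY))
    let inter = w * h
    let union = a.area + b.area - inter
    return union > 0 ? inter / union : 0
}

/// Visits boxes from highest to lowest score and keeps a box only if it does
/// not overlap any already-kept box by more than `iouThreshold`.
func nonMaxSuppression(_ boxes: [Box], iouThreshold: Float = 0.5) -> [Box] {
    var kept: [Box] = []
    for box in boxes.sorted(by: { $0.score > $1.score }) {
        if kept.allSatisfy({ iou($0, box) <= iouThreshold }) {
            kept.append(box)
        }
    }
    return kept
}
```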

## Files

| file | purpose |
|---|---|
| `Detector224.mlpackage/` | CoreML model package |
| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
| `bench.md` | local CoreML latency results |
| `parity.json` | PyTorch-vs-CoreML parity summary |
| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
| `NOTICE` | redistribution attribution notice |

## Input Contract

The model expects one CoreML input named `image`:

- shape: `Float32[1, 3, 768, 768]` (N, C, H, W)
- layout: planar RGB, one full channel plane after another in R, G, B order
- normalization: pixel values in `[0, 1]`

The SwiftPM wrapper linked above includes a helper that converts a `CGImage` to
this tensor shape.
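Without the SwiftPM helper, the input contract can be met by hand. The sketch below assumes the image has already been resized to 768 x 768 and rendered into tightly packed 8-bit RGBA (for example with a `CGContext`), and it ignores alpha; `planarRGB(fromRGBA:width:height:)` is a hypothetical helper. It de-interleaves the pixels into three contiguous planes in R, G, B order and divides by 255 to land in `[0, 1]`.

```swift
// Hypothetical helper: interleaved RGBA8 -> planar RGB Float32 in [0, 1].
// Assumptions: rows are tightly packed (bytesPerRow == 4 * width) and
// alpha is ignored; resizing to 768 x 768 happens before this step.

func planarRGB(fromRGBA pixels: [UInt8], width: Int, height: Int) -> [Float] {
    precondition(pixels.count == width * height * 4, "expected tightly packed RGBA")
    let plane = width * height
    var out = [Float](repeating: 0, count: 3 * plane)
    for y in 0..<height {
        for x in 0..<width {
            let src = (y * width + x) * 4   // interleaved RGBA offset
            let dst = y * width + x         // offset within one channel plane
            // Channel c occupies the contiguous range [c * plane, (c + 1) * plane).
            out[dst] = Float(pixels[src]) / 255               // R
            out[plane + dst] = Float(pixels[src + 1]) / 255   // G
            out[2 * plane + dst] = Float(pixels[src + 2]) / 255 // B
        }
    }
    return out
}
```

The resulting array has `3 * 768 * 768` elements and would still need to be copied into an `MLMultiArray` of shape `[1, 3, 768, 768]` before prediction. If the context uses premultiplied alpha, semi-transparent pixels come out darker; for opaque document images this makes no difference.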

## Output Contract

The model returns:

| output | shape | meaning |
|---|---:|---|
| `prob` | `Float32[1, 192, 192]` | text probability map |
| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
| `features` | `Float32[1, 128, 192, 192]` | detector feature map |
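All three outputs share a 192 x 192 grid, a downsampling factor of 4 relative to the 768 x 768 input. The sketch below shows the flat offsets implied by each shape, assuming the outputs have been copied into contiguous row-major `[Float]` buffers (an `MLMultiArray` can have padded strides, so check its `strides` property before indexing it directly), plus a first thresholding pass over `prob`. The helper names and the 0.5 threshold are hypothetical, and the sketch does not decode the five `rboxes` values because their exact meaning is not documented here.

```swift
// Hypothetical indexing helpers for contiguous row-major copies of the outputs.

enum DetectorGrid {
    static let size = 192        // spatial size of every output
    static let inputSize = 768   // spatial size of the `image` input
    static let cellStride = DetectorGrid.inputSize / DetectorGrid.size  // 4 px per cell
    static let rboxChannels = 5
}

/// Offset of cell (y, x) in `prob`, shape [1, 192, 192].
func probIndex(y: Int, x: Int) -> Int {
    y * DetectorGrid.size + x
}

/// Offset of geometry value k at cell (y, x) in `rboxes`, shape [1, 192, 192, 5].
func rboxIndex(y: Int, x: Int, k: Int) -> Int {
    (y * DetectorGrid.size + x) * DetectorGrid.rboxChannels + k
}

/// Offset of channel c at cell (y, x) in `features`, shape [1, 128, 192, 192].
func featureIndex(c: Int, y: Int, x: Int) -> Int {
    (c * DetectorGrid.size + y) * DetectorGrid.size + x
}

struct TextCell {
    let row: Int, col: Int
    let pixelX: Int, pixelY: Int  // approximate top-left of the cell in input pixels
}

/// Returns every grid cell whose text probability is above `threshold`.
func textCells(prob: [Float], threshold: Float = 0.5) -> [TextCell] {
    precondition(prob.count == DetectorGrid.size * DetectorGrid.size)
    var cells: [TextCell] = []
    for y in 0..<DetectorGrid.size {
        for x in 0..<DetectorGrid.size where prob[probIndex(y: y, x: x)] > threshold {
            cells.append(TextCell(row: y, col: x,
                                  pixelX: x * DetectorGrid.cellStride,
                                  pixelY: y * DetectorGrid.cellStride))
        }
    }
    return cells
}
```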

## Local Performance

Median prediction latency on the bundled sample image, measured locally after warmup (full results in `bench.md`):

| compute units | median prediction latency |
|---|---:|
| `ALL` | 13.53 ms |
| `CPU_AND_GPU` | 13.65 ms |
| `CPU_AND_NE` | 54.51 ms |
| `CPU_ONLY` | 298.28 ms |

For the lowest single-image latency in this test environment, use `CPU_AND_GPU` or
`ALL` (`.cpuAndGPU` / `.all` in Swift). `CPU_AND_NE` works but was roughly 4x
slower than `ALL` for this detector (54.51 ms vs 13.53 ms).
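The latency table reports a median after warmup. A harness for reproducing that style of measurement might look like the sketch below; `medianLatencyMs` is a hypothetical helper, and its warmup and iteration counts are arbitrary defaults, not the settings recorded in `bench.md`.

```swift
import Dispatch

/// Runs `predict` `warmup` times untimed, then `iterations` times timed,
/// and returns the median wall-clock latency in milliseconds.
func medianLatencyMs(warmup: Int = 5, iterations: Int = 50,
                     _ predict: () throws -> Void) rethrows -> Double {
    precondition(iterations > 0)
    // Warmup runs absorb one-time costs such as model specialization and caching.
    for _ in 0..<warmup { try predict() }
    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        try predict()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)
    }
    samples.sort()
    let mid = samples.count / 2
    return samples.count % 2 == 0 ? (samples[mid - 1] + samples[mid]) / 2 : samples[mid]
}
```

With the Swift wrapper, a call such as `try medianLatencyMs { _ = try detector.prediction(for: cgImage) }` would time one configuration; repeating it with a detector created for each compute-unit setting gives a table comparable in form to the one above.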

## Use From Swift

Add the Swift package to the `dependencies` of your `Package.swift`:

```swift
.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
```

Then run the detector on a `CGImage`:

```swift
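import CoreGraphics
import CoreML
import OCRCoreMLDetector

// `cgImage` is an existing CGImage, e.g. a decoded page or camera frame.
// .cpuAndGPU was among the fastest options in the Local Performance table.
let detector = try OCRDetector(computeUnits: .cpuAndGPU)
let prediction = try detector.prediction(for: cgImage)

let prob = prediction.output.prob          // text probability map, [1, 192, 192]
let rboxes = prediction.output.rboxes      // rotated-box geometry, [1, 192, 192, 5]
let features = prediction.output.features  // detector feature map, [1, 128, 192, 192]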
```

## License

The converted model weights inherit the
[NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
The upstream source code and helper scripts are Apache 2.0. See `LICENSE` and
`NOTICE` for redistribution terms and attribution.