---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/nemotron-ocr-v2
library_name: coreml
tags:
- coreml
- ocr
- text-detection
- text-recognition
- apple-silicon
- apple-neural-engine
- swift
---

# Nemotron OCR v2 CoreML

CoreML conversion of the English neural stages from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).

SwiftPM package: [github.com/mweinbach/OCRCoreML](https://github.com/mweinbach/OCRCoreML)

## Included Models

| stage | file | input | outputs |
|---|---|---|---|
| detector | `DetectorGPUInt8_768.mlpackage` | `image: Float32[1, 3, 768, 768]` | `prob`, `rboxes`, `features` |
| recognizer | `RecognizerFeaturesInt8.mlpackage` | `regions: Float32[128, 128, 8, 32]` | `logits`, `features` |
| relational | `RelationalInt8.mlpackage` | rectified regions, original quads, recognizer features, valid count | `words`, `lines`, `line_log_var` |

The recognizer emits transformer `features`; those are required by the relational model, so this bundle covers the full neural OCR pipeline rather than detector-only inference.

## Pipeline Boundary

The original Python package uses CUDA/C++ helpers for the non-neural stages: rotated-box NMS, `rboxes` to quads, quad rectification, feature-map grid sampling, sequence decoding, relation-graph decoding, and reading-order formatting. Those operations are not CoreML models. Apple apps integrating this bundle must port or replace those post-processing steps.

The linked SwiftPM package includes wrappers for all three CoreML models and a greedy recognizer decoder. It exposes raw tensors rather than claiming complete image-to-text OCR until the geometric and graph post-processing is ported.

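To give a feel for what one of the post-processing steps named above involves, here is a minimal sketch of greedy sequence decoding for a single region. It is not the decoder shipped in the SwiftPM package. It assumes CTC-style logits (one score row per time step, with class 0 reserved as a blank symbol) and that the remaining classes map in order onto the entries of `charset.txt`; neither assumption is confirmed by this card. The function name `greedyDecode` and its parameters are hypothetical.

```swift
// Hypothetical helper, not part of OCRCoreML.
// ASSUMPTION: `logits` holds one row of class scores per time step and the
// model uses a CTC-style blank class (index 0 by default).
func greedyDecode(logits: [[Float]], charset: [Character], blankIndex: Int = 0) -> String {
    var text = ""
    var previous = blankIndex
    for step in logits {
        // Pick the highest-scoring class at this time step; skip empty rows.
        guard let best = step.indices.max(by: { step[$0] < step[$1] }) else { continue }
        // Emit a character only if the class is not blank and differs from
        // the class chosen at the previous step (collapses repeats).
        if best != blankIndex && best != previous {
            // ASSUMPTION: the charset omits the blank, so classes after the
            // blank index shift down by one.
            let charIndex = best > blankIndex ? best - 1 : best
            if charset.indices.contains(charIndex) {
                text.append(charset[charIndex])
            }
        }
        previous = best
    }
    return text
}
```

A blank between two identical classes is what lets the same character appear twice in a row: the class sequence `1, 1, 0, 1` decodes to two copies of the first charset entry, not one.
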
## Files

| file | purpose |
|---|---|
| `DetectorGPUInt8_768.mlpackage/` | detector CoreML package |
| `RecognizerFeaturesInt8.mlpackage/` | recognizer CoreML package with logits and features |
| `RelationalInt8.mlpackage/` | relational CoreML package |
| `charset.txt` | English checkpoint charset |
| `model_config.json` | English checkpoint config |
| `configs/` | conversion configs used for the three packages |
| `benchmarks/` | local CoreML benchmark results |
| `parity/` | PyTorch-vs-CoreML parity reports |
| `checksums.sha256` | SHA-256 checksums for package files |
| `LICENSE`, `NOTICE` | license terms and redistribution notice |

## Performance

Local median latencies after warmup:

| stage | GPU/ALL median | CPU+ANE median | CPU median |
|---|---:|---:|---:|
| detector | 10.65 ms | 50.46 ms | 157.71 ms |
| recognizer + features | 4.53 ms | 11.04 ms | 47.58 ms |
| relational | 1.72 ms | 6.38 ms | 34.53 ms |

GPU/CoreML `ALL` is the best single-shot latency path on the test machine. CPU+ANE is useful when GPU time needs to be reserved for rendering or other workloads.

## Swift Usage

```swift
import OCRCoreML

// `cgImage` is the input image. `regions`, `relationalRegionFeatures`,
// `originalQuads`, and `detectedRegionCount` come from the app's own
// geometric post-processing (see Pipeline Boundary).
let pipeline = try OCRPipeline(computeUnits: .cpuAndGPU)

let detectorPrediction = try pipeline.detect(image: cgImage)
let recognizerPrediction = try pipeline.recognize(regions: regions)

// Greedy decoding of the recognizer logits for the valid regions.
let decoded = try pipeline.recognizer.decode(
    logits: recognizerPrediction.output.logits,
    count: detectedRegionCount
)

let relationalPrediction = try pipeline.relate(
    rectifiedQuads: relationalRegionFeatures,
    originalQuads: originalQuads,
    recognizerFeatures: recognizerPrediction.output.features,
    numValid: detectedRegionCount
)
```

See the documentation in the SwiftPM package linked above for exact app integration notes.

## License

The converted model weights inherit the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). The upstream source code and helper scripts are Apache 2.0. See `LICENSE` and `NOTICE`.