	---
	license: other
	license_name: nvidia-open-model-license
	license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
	base_model: nvidia/nemotron-ocr-v2
	library_name: coreml
	tags:
	- coreml
	- ocr
	- text-detection
	- apple-silicon
	- apple-neural-engine
	- swift
	---

	# OCR CoreML Detector 224

	`Detector224.mlpackage` is a batch-1 CoreML conversion of the detector stage
	from [NVIDIA Nemotron OCR v2](https://huggingface.co/nvidia/nemotron-ocr-v2).
	It is intended for Apple-device OCR pipelines that need a packaged text
	detection model and will implement their own post-processing, recognition, and
	layout stages.

	SwiftPM package:
	[github.com/mweinbach/OCRCoreMLDetector](https://github.com/mweinbach/OCRCoreMLDetector)

	## What This Is

	- Source model: `nvidia/nemotron-ocr-v2`, `v2_english` detector
	- CoreML artifact: `Detector224.mlpackage`
	- Conversion config: `experiments/224_detector_ane_decomposed_int8_768`
	- Input size: `768 x 768`
	- Batch size: `1`
	- Weight quantization: int8 per-channel linear symmetric
	- Compute precision: fp16
	- Minimum deployment target used during conversion: iOS 18

	This is not a complete OCR system. The package returns detector tensors only.
	Downstream code still needs thresholding, rotated-box decoding,
	non-maximum suppression, crop/rectify, recognition, and reading-order/layout
	logic.
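
	As a minimal sketch of the first of those steps, assuming the `prob` output has already been copied into a flat, row-major `[Float]` of 192 x 192 values: the helper below keeps cells above a threshold and maps them back to input pixels. `TextCandidate`, `thresholdCandidates`, the `0.5` threshold, and the cell-centre mapping are illustrative assumptions, not part of the package.

	```swift
	// Hypothetical helper, not part of OCRCoreMLDetector.
	struct TextCandidate {
	    let gridX: Int
	    let gridY: Int
	    let score: Float
	    // Approximate centre of the grid cell in 768 x 768 input pixels.
	    let inputX: Float
	    let inputY: Float
	}

	func thresholdCandidates(
	    prob: [Float],
	    gridSize: Int = 192,
	    inputSize: Int = 768,
	    threshold: Float = 0.5   // assumed value; tune for your data
	) -> [TextCandidate] {
	    precondition(prob.count == gridSize * gridSize, "expected a gridSize x gridSize map")
	    let cellSize = Float(inputSize) / Float(gridSize)   // 768 / 192 = 4
	    var candidates: [TextCandidate] = []
	    for y in 0..<gridSize {
	        for x in 0..<gridSize {
	            let score = prob[y * gridSize + x]
	            if score >= threshold {
	                candidates.append(TextCandidate(
	                    gridX: x, gridY: y, score: score,
	                    inputX: (Float(x) + 0.5) * cellSize,
	                    inputY: (Float(y) + 0.5) * cellSize))
	            }
	        }
	    }
	    // Highest-scoring cells first, a common starting order for NMS.
	    return candidates.sorted { $0.score > $1.score }
	}
	```

	Rotated-box decoding and non-maximum suppression would then run over these candidates.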

	## Files

	| file | purpose |
	|---|---|
	| `Detector224.mlpackage/` | CoreML model package |
	| `conversion_config.yaml` | conversion/benchmark config used to create the artifact |
	| `bench.md` | local CoreML latency results |
	| `parity.json` | PyTorch-vs-CoreML parity summary |
	| `checksums.sha256` | SHA-256 checksums for the CoreML package files |
	| `LICENSE` | NVIDIA Open Model License plus Apache 2.0 source license text from upstream |
	| `NOTICE` | redistribution attribution notice |

	## Input Contract

	The model expects one CoreML input named `image`:

	- shape: `Float32[1, 3, 768, 768]`
	- layout: RGB planar (channels first: `[batch, channel, height, width]`)
	- normalization: pixel values in `[0, 1]`

	The SwiftPM wrapper linked above includes a helper that converts a `CGImage` to
	this tensor shape.
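
	To illustrate the layout only, here is a rough sketch of the conversion for pixels that are already resized to 768 x 768 and stored as interleaved 8-bit RGBA. `planarRGB(fromRGBA:width:height:)` is a hypothetical function, not the wrapper's helper, and the RGBA source format is an assumption.

	```swift
	// Hypothetical helper: convert interleaved 8-bit RGBA pixels (row-major)
	// into planar RGB floats in [0, 1], laid out as [3, height, width].
	func planarRGB(fromRGBA rgba: [UInt8], width: Int = 768, height: Int = 768) -> [Float] {
	    precondition(rgba.count == width * height * 4, "expected 4 bytes per pixel")
	    let planeSize = width * height
	    var out = [Float](repeating: 0, count: 3 * planeSize)
	    for i in 0..<planeSize {
	        out[i] = Float(rgba[i * 4]) / 255                      // R plane
	        out[planeSize + i] = Float(rgba[i * 4 + 1]) / 255      // G plane
	        out[2 * planeSize + i] = Float(rgba[i * 4 + 2]) / 255  // B plane
	        // The alpha byte at rgba[i * 4 + 3] is ignored.
	    }
	    return out
	}
	```

	With the batch size fixed at 1, the returned array has the same element order as a contiguous `Float32[1, 3, 768, 768]` tensor.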

	## Output Contract

	The model returns:

	| output | shape | meaning |
	|---|---:|---|
	| `prob` | `Float32[1, 192, 192]` | text probability map |
	| `rboxes` | `Float32[1, 192, 192, 5]` | rotated-box geometry |
	| `features` | `Float32[1, 128, 192, 192]` | detector feature map |
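
	Each output is 192 x 192 spatially, one cell per 4 x 4 block of input pixels (768 / 192). As a sketch of how the shapes above map to flat offsets, assuming each tensor has been copied into a contiguous row-major buffer (an `MLMultiArray` can carry its own strides, so check them before relying on this):

	```swift
	let grid = 192          // output height and width
	let boxValues = 5       // values per cell in `rboxes`
	let channels = 128      // channels in `features`

	// Flat offset of the probability at grid cell (y, x) in `prob` [1, 192, 192].
	func probIndex(y: Int, x: Int) -> Int { y * grid + x }

	// Flat offset of value k (0..<5) for cell (y, x) in `rboxes` [1, 192, 192, 5].
	func rboxIndex(y: Int, x: Int, k: Int) -> Int { (y * grid + x) * boxValues + k }

	// Flat offset of channel c for cell (y, x) in `features` [1, 128, 192, 192].
	func featureIndex(c: Int, y: Int, x: Int) -> Int { (c * grid + y) * grid + x }
	```

	The function names are hypothetical. Note that `rboxes` is channels-last while `features` is channels-first, so the two need different index arithmetic.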

	## Local Performance

	Measured locally on the bundled sample image after warmup (see `bench.md`):

	| compute units | median prediction latency |
	|---|---:|
	| `ALL` | 13.53 ms |
	| `CPU_AND_GPU` | 13.65 ms |
	| `CPU_AND_NE` | 54.51 ms |
	| `CPU_ONLY` | 298.28 ms |

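	For the lowest single-image latency in the current test environment, use
	`CPU_AND_GPU` or `ALL`; the two are within about 0.1 ms of each other.
	`CPU_AND_NE` is available but roughly 4x slower for this detector, and
	`CPU_ONLY` is roughly 22x slower.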
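
	To reproduce this kind of measurement on your own device, a rough sketch of a median-latency timer follows. `medianLatencyMs` and its default warmup and run counts are assumptions, not the harness that produced the table above.

	```swift
	import Dispatch

	// Hypothetical timing helper: runs `work` a few times untimed, then returns
	// the median wall-clock time of the timed runs in milliseconds.
	func medianLatencyMs(
	    warmup: Int = 5,
	    runs: Int = 50,
	    _ work: () throws -> Void
	) rethrows -> Double {
	    precondition(warmup >= 0 && runs > 0, "need at least one timed run")
	    for _ in 0..<warmup { try work() }
	    var samples: [Double] = []
	    samples.reserveCapacity(runs)
	    for _ in 0..<runs {
	        let start = DispatchTime.now().uptimeNanoseconds
	        try work()
	        let end = DispatchTime.now().uptimeNanoseconds
	        samples.append(Double(end - start) / 1_000_000)
	    }
	    samples.sort()
	    let mid = samples.count / 2
	    // For an even number of samples, average the two middle values.
	    return samples.count % 2 == 0 ? (samples[mid - 1] + samples[mid]) / 2 : samples[mid]
	}
	```

	Wrapping a call such as `try detector.prediction(for: cgImage)` in the closure, once per compute-unit setting, gives numbers comparable in kind to the table.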

	## Use From Swift

	Add the Swift package:

	```swift
	.package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
	```
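
	In a full manifest, that dependency line might sit as in the sketch below. `MyOCRApp` is a placeholder, the product name is assumed to match the imported module, and the platform versions are assumed from the iOS 18 deployment target above; check the package's own `Package.swift` for the real values.

	```swift
	// swift-tools-version: 6.0
	import PackageDescription

	let package = Package(
	    name: "MyOCRApp",                       // hypothetical package name
	    platforms: [.iOS(.v18), .macOS(.v15)],  // assumed; confirm against the package
	    dependencies: [
	        .package(url: "https://github.com/mweinbach/OCRCoreMLDetector.git", from: "0.1.0")
	    ],
	    targets: [
	        .target(
	            name: "MyOCRApp",
	            dependencies: [
	                // Product name assumed to match the `OCRCoreMLDetector` module.
	                .product(name: "OCRCoreMLDetector", package: "OCRCoreMLDetector")
	            ]
	        )
	    ]
	)
	```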

	Then:

	```swift
	import CoreML
	import OCRCoreMLDetector

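	// `cgImage` is a `CGImage` supplied by the caller.
	let detector = try OCRDetector(computeUnits: .cpuAndGPU)
	let prediction = try detector.prediction(for: cgImage)

	// Raw detector tensors; shapes are listed under Output Contract.
	let prob = prediction.output.prob
	let rboxes = prediction.output.rboxes
	let features = prediction.output.features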
	```

	## License

	The converted model weights inherit the
	[NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
	The upstream source code and helper scripts are licensed under Apache 2.0. See
	`LICENSE` and `NOTICE` for redistribution terms and attribution.