# Quickstart

## Prerequisites

- **Python 3.12** (the package requires `>=3.12,<3.13`)
- **CUDA toolkit** with `nvcc` on `PATH` (the package compiles a CUDA C++ extension at install time)
- **A CUDA GPU** (or set `TORCH_CUDA_ARCH_LIST` to cross-compile, e.g. `TORCH_CUDA_ARCH_LIST="8.0 9.0"`)

The CUDA toolkit version must share the same **major version** as the CUDA bindings in your PyTorch install (e.g. toolkit 12.4 with `torch+cu128` is fine; toolkit 12.4 with `torch+cu130` will fail).

On Slurm clusters, run the install on a GPU node or load the CUDA module first:

```bash
module load cuda12.4/toolkit/12.4.1   # example; adjust for your cluster
export CUDA_HOME=/usr/local/cuda      # or wherever the toolkit lives
```

## Installation

Install PyTorch **first** with bindings matching your CUDA toolkit, then install this package with `--no-build-isolation` so it builds the C++ extension against your existing PyTorch:

```bash
# 1. Install PyTorch (adjust the index URL for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

# 2. Install nemotron-ocr
cd nemotron-ocr
pip install --no-build-isolation -v .
```

> **Why `--no-build-isolation`?** Without it, pip creates a temporary build
> environment and installs the latest PyTorch from PyPI. That PyTorch's CUDA
> version may not match your system's `nvcc`, causing the C++ extension build
> to fail with a CUDA version mismatch error.
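The major-version rule from the prerequisites can be checked before starting the build. The following is a minimal sketch, not part of this package: the helper names (`parse_nvcc_release`, `same_major`, `check_cuda_match`) are hypothetical, and it assumes `nvcc --version` prints a `release X.Y` token and that PyTorch, if installed, reports its CUDA bindings through `torch.version.cuda`.

```python
from __future__ import annotations

import re
import shutil
import subprocess


def parse_nvcc_release(version_output: str) -> str | None:
    """Extract the 'X.Y' release from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def same_major(toolkit_version: str, torch_cuda_version: str) -> bool:
    """True when both version strings share the same major number."""
    return toolkit_version.split(".")[0] == torch_cuda_version.split(".")[0]


def check_cuda_match() -> str:
    """Return a human-readable status line; does not raise on a missing tool."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return "nvcc not found on PATH"
    output = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    toolkit = parse_nvcc_release(output)
    if toolkit is None:
        return "could not parse nvcc --version output"
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if torch.version.cuda is None:
        return "PyTorch build has no CUDA bindings (CPU-only wheel)"
    status = "match" if same_major(toolkit, torch.version.cuda) else "MISMATCH"
    return f"toolkit {toolkit}, torch CUDA {torch.version.cuda}: {status}"


if __name__ == "__main__":
    print(check_cuda_match())
```

Running this in the environment where you plan to install shows whether the toolkit and the PyTorch bindings disagree on the major version, which is the case the note on `--no-build-isolation` warns about.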

Verify the installation (the C++ extension must load without errors):

```bash
python -c "from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2; print('OK')"
```

## Usage

`NemotronOCRV2` is the recommended entry point for OCR inference:

```python
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2

ocr = NemotronOCRV2()
predictions = ocr("ocr-example-input-1.png")

for pred in predictions:
    print(f"  - Text: '{pred['text']}', Confidence: {pred['confidence']:.2f}")
```

The level of detection merging can be adjusted with `merge_level`:

```python
ocr(image_path, merge_level="word")       # individual words
ocr(image_path, merge_level="sentence")   # merged into sentences
ocr(image_path, merge_level="paragraph")  # merged into paragraphs (default)
```

### Inference modes

```python
# Detector only: bounding boxes, no text (fastest, lowest memory)
ocr_det = NemotronOCRV2(detector_only=True)

# Skip relational: per-word text, no reading-order grouping
ocr_fast = NemotronOCRV2(skip_relational=True)

# Profiling: per-phase CUDA-synced timing in logs
ocr_profile = NemotronOCRV2(verbose_post=True)
```

### Example script

```bash
python example.py ocr-example-input-1.png
python example.py ocr-example-input-1.png --merge-level word
python example.py ocr-example-input-1.png --detector-only
python example.py ocr-example-input-1.png --skip-relational
```
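
The predictions shown under Usage expose `text` and `confidence` keys, so low-confidence detections can be dropped before the text is used. The sketch below assumes only those two keys; the helper names, the `0.5` threshold, and the sample list are hypothetical placeholders for illustration, not package API or real model output.

```python
from __future__ import annotations

from typing import Iterable


def filter_by_confidence(predictions: Iterable[dict], min_confidence: float = 0.5) -> list[dict]:
    """Keep predictions whose 'confidence' is at least min_confidence."""
    return [pred for pred in predictions if pred["confidence"] >= min_confidence]


def join_text(predictions: Iterable[dict], separator: str = "\n") -> str:
    """Concatenate the 'text' field of each prediction, preserving input order."""
    return separator.join(pred["text"] for pred in predictions)


# Hand-written placeholder data shaped like the predictions in the Usage example.
sample = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "T0tal", "confidence": 0.31},
    {"text": "42.00", "confidence": 0.87},
]

kept = filter_by_confidence(sample, min_confidence=0.5)
print(join_text(kept, separator=" "))  # prints "Invoice 42.00"
```

With a real pipeline, the same helpers could be applied to the return value of `ocr(...)`, provided each item carries the two keys used here.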