# CONTEXT.md — Technical Reference for autolabel

> Keep this file up to date as the project evolves. Read this first when
> resuming work after a break.

---

## What this project does

Uses **OWLv2** (open-vocabulary object detection) and **SAM2** (segment anything) to auto-label images via text prompts, then exports a COCO dataset for fine-tuning a detection or segmentation model.

**Current phase:** labeling — two modes available:

- **Detection** — OWLv2 only; produces bounding boxes.
- **Segmentation** — OWLv2 → boxes → SAM2 → pixel masks + COCO polygons.

**Future phase:** fine-tune OWLv2 on the exported COCO dataset using `scripts/finetune_owlv2.py` (code is ready, not yet in active use).

---

## Architecture

### Primary interface — `app.py` (Gradio web UI)

Two-tab UI; all artifacts are written to a session temp dir (nothing in the project tree):

| Tab | What it does |
|-----|--------------|
| 🧪 Test | Single image → instant annotated preview. Dial in prompts and threshold before a batch run. |
| 📂 Batch | Multiple images → annotated gallery + downloadable ZIP (resized images + `coco_export.json`). |
### CLI scripts (`scripts/`)

Independent entry points for headless / automation use:

| Script | Purpose |
|--------|---------|
| `run_detection.py` | Batch detect → `data/detections/` |
| `export_coco.py` | Build COCO JSON from `data/labeled/` |
| `finetune_owlv2.py` | Fine-tune OWLv2 (future) |

### `autolabel/` package

| Module | Responsibility |
|--------|----------------|
| `config.py` | Pydantic settings singleton, auto device detection |
| `detect.py` | OWLv2 inference — `infer()` (PIL, shared) + `detect_image()` (file) + `run_detection()` (batch CLI) |
| `segment.py` | SAM2 integration — `load_sam2()`, `segment_with_boxes()`, `_mask_to_polygon()` |
| `export.py` | COCO JSON builder (no pycocotools); supports both bbox-only and segmentation |
| `finetune.py` | Training loop, loss, dataset, scheduler |
| `utils.py` | `collect_images`, `save_json`, `load_json`, `setup_logging` |

**Key design:** `detect.infer()` is the single OWLv2 inference implementation. `app.py` chains SAM2 on top when mode == "Segmentation" — no duplication.

---

## Device strategy

| Platform | Device | dtype |
|----------|--------|-------|
| Apple Silicon | `mps` | `float32` |
| Windows/Linux GPU | `cuda` | `float16` |
| CPU fallback | `cpu` | `float32` |

`PYTORCH_ENABLE_MPS_FALLBACK=1` must be set before torch is imported on MPS (`.env` handles this). Without it, some OWLv2 ops raise `NotImplementedError`.

---

## OWLv2 model

Default: `google/owlv2-large-patch14-finetuned` (~700 MB, cached in `~/.cache/huggingface` after the first download).
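Both environment variables mentioned in this file are loaded from `.env` via `python-dotenv` before torch is imported. A minimal `.env` sketch — the exact file contents here are an assumption, so defer to the project's actual `.env` (or `.env.example`, if one exists):

```
# Required on Apple Silicon: must be set before torch is imported
PYTORCH_ENABLE_MPS_FALLBACK=1

# Optional: override the default OWLv2 checkpoint
AUTOLABEL_MODEL=google/owlv2-base-patch16
```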
Override via env var: `AUTOLABEL_MODEL=google/owlv2-base-patch16`

| Variant | Size | Notes |
|---------|------|-------|
| `owlv2-base-patch16` | ~300 MB | Faster, lower accuracy |
| `owlv2-large-patch14` | ~700 MB | Good balance |
| `owlv2-large-patch14-finetuned` | ~700 MB | Default — pre-trained on LVIS/Objects365 |

---

## Dependency decisions

| Package | Why kept |
|---------|----------|
| `torch` / `torchvision` | OWLv2 + SAM2 inference |
| `transformers>=4.45` | OWLv2 and SAM2 models & processors |
| `pillow` | Image I/O and annotation drawing |
| `numpy` | Gradio image array interchange; mask arrays |
| `opencv-python` | `cv2.findContours` for mask → COCO polygon (SAM2) |
| `pydantic` / `pydantic-settings` | Type-safe config with env-var loading |
| `click` | CLI option parsing |
| `tqdm` | Progress bars in CLI batch runner |
| `python-dotenv` | Load `.env` before torch (MPS fallback) |
| `gradio` | Web UI |

Removed: `supervision` (unused), `matplotlib` (fine-tune charts gone), `requests` (Label Studio gone).

---

## Inference flow

```
PIL image
  ↓
detect.infer(image, processor, model, prompts, threshold, device, dtype)
  ↓
list[{label, score, box_xyxy}]
  │
  ├─ Detection mode ──────────────────────────────────────────────
  │    ↓ used by app.py directly
  │    ↓ (CLI: wrapped by detect_image → JSON)
  │    ↓ export.build_coco → coco_export.json (bbox only, segmentation: [])
  │
  └─ Segmentation mode ───────────────────────────────────────────
       ↓ segment.segment_with_boxes(image, detections, sam2_processor, sam2_model)
       ↓ list[{label, score, box_xyxy, mask (np.ndarray), segmentation (polygons)}]
       ↓ mask used for visualization overlay; dropped before JSON serialisation
       ↓ export.build_coco → coco_export.json (bbox + segmentation polygons)
```

---

## Batch export ZIP structure

```
autolabel_export.zip
├── coco_export.json      # COCO format, dimensions match images below
└── images/
    ├── photo1.jpg        # resized to chosen training size (e.g. 640×640)
    └── photo2.jpg
```

COCO bounding boxes are in the coordinate space of the resized images.

---

## Known limitations

- OWLv2 is detection-only — bounding boxes, no masks.
- Objects < 32×32 px are often missed at default resolution.
- MPS inference is slower than CUDA but fast enough for development.
- Threshold default is 0.1 (intentionally low — it is easier to discard false positives than to recover missed objects).

---

## Fine-tuning (future)

The fine-tuning infrastructure is complete (`autolabel/finetune.py`, `scripts/finetune_owlv2.py`) but not in active use. Workflow when ready:

1. Use the Batch tab to generate a labeled `coco_export.json`
2. Run `make finetune` (or `uv run python scripts/finetune_owlv2.py --help`)
3. Evaluate the fine-tuned model in the Test tab
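---

The `coco_export.json` produced by the batch export can be illustrated with a stdlib-only sketch. This is *not* the project's `export.build_coco` (whose real signature lives in `autolabel/export.py`) — the function name, arguments, and ID scheme below are hypothetical, but the output shape follows the COCO convention described above: `bbox` as `[x, y, width, height]` in resized-image coordinates, and `segmentation: []` in Detection mode.

```python
import json

def build_coco_sketch(images, detections_per_image, categories):
    """Hypothetical illustration of a bbox-only COCO export.

    images: list of (file_name, width, height) for the *resized* images
    detections_per_image: one detection list per image; each detection is a
        dict with "label" and "box_xyxy" (resized-image coordinates)
    categories: ordered list of label names
    """
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [
            {"id": i + 1, "file_name": fn, "width": w, "height": h}
            for i, (fn, w, h) in enumerate(images)
        ],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
        "annotations": [],
    }
    ann_id = 1
    for img_idx, dets in enumerate(detections_per_image):
        for det in dets:
            x1, y1, x2, y2 = det["box_xyxy"]
            w, h = x2 - x1, y2 - y1
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_idx + 1,
                "category_id": cat_ids[det["label"]],
                "bbox": [x1, y1, w, h],  # COCO convention: [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                "segmentation": [],      # empty in Detection mode; polygons in Segmentation mode
            })
            ann_id += 1
    return coco

coco = build_coco_sketch(
    images=[("photo1.jpg", 640, 640)],
    detections_per_image=[[{"label": "cat", "box_xyxy": [100, 120, 300, 360]}]],
    categories=["cat"],
)
print(json.dumps(coco, indent=2))
```

A box `[100, 120, 300, 360]` in xyxy form becomes `bbox: [100, 120, 200, 240]` — a frequent source of off-by-conversion bugs when consuming OWLv2 output, which is xyxy.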