CONTEXT.md — Technical Reference for autolabel

Keep this file up to date as the project evolves. Read this first when resuming work after a break.


What this project does

Uses OWLv2 (open-vocabulary object detection) and SAM2 (segment anything) to auto-label images via text prompts, then exports a COCO dataset for fine-tuning a detection or segmentation model.

Current phase: labeling — two modes available:

  • Detection — OWLv2 only; produces bounding boxes.
  • Segmentation — OWLv2 → boxes → SAM2 → pixel masks + COCO polygons.

Future phase: fine-tune OWLv2 on the exported COCO dataset using scripts/finetune_owlv2.py (code is ready, not yet in active use).


Architecture

Primary interface — app.py (Gradio web UI)

Two-tab UI; all artifacts are written to a per-session temp dir (nothing is written inside the project tree):

| Tab | What it does |
| --- | --- |
| 🧪 Test | Single image → instant annotated preview. Dial in prompts and threshold before a batch run. |
| 📂 Batch | Multiple images → annotated gallery + downloadable ZIP (resized images + coco_export.json). |
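The per-session scratch directory mentioned above can be sketched with the stdlib. The helper name `make_session_dir` is hypothetical; the point is that every artifact (annotated previews, resized images, the export ZIP) lives under a disposable temp dir, never under the project tree:

```python
import tempfile
from pathlib import Path

def make_session_dir(prefix: str = "autolabel_") -> Path:
    """Create a throwaway directory for one UI session's artifacts."""
    return Path(tempfile.mkdtemp(prefix=prefix))

session = make_session_dir()
zip_path = session / "autolabel_export.zip"  # where the Batch tab's ZIP would land
```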

CLI scripts (scripts/)

Independent entry points for headless / automation use:

| Script | Purpose |
| --- | --- |
| run_detection.py | Batch detect → data/detections/ |
| export_coco.py | Build COCO JSON from data/labeled/ |
| finetune_owlv2.py | Fine-tune OWLv2 (future) |

autolabel/ package

| Module | Responsibility |
| --- | --- |
| config.py | Pydantic settings singleton, auto device detection |
| detect.py | OWLv2 inference — infer() (PIL, shared) + detect_image() (file) + run_detection() (batch CLI) |
| segment.py | SAM2 integration — load_sam2(), segment_with_boxes(), _mask_to_polygon() |
| export.py | COCO JSON builder (no pycocotools); supports both bbox-only and segmentation |
| finetune.py | Training loop, loss, dataset, scheduler |
| utils.py | collect_images, save_json, load_json, setup_logging |

Key design: detect.infer() is the single OWLv2 inference implementation. app.py chains SAM2 on top when mode == "Segmentation" — no duplication.


Device strategy

| Platform | Device | dtype |
| --- | --- | --- |
| Apple Silicon | mps | float32 |
| Windows/Linux GPU | cuda | float16 |
| CPU fallback | cpu | float32 |
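The device table reduces to a small pure function. This is a sketch of the mapping only; the real config.py presumably probes availability via torch (e.g. torch.cuda.is_available()), which is omitted here:

```python
def pick_device_and_dtype(has_cuda: bool, has_mps: bool) -> tuple[str, str]:
    """CUDA gets half precision; MPS and CPU stay in float32 (see table above)."""
    if has_cuda:
        return "cuda", "float16"
    if has_mps:
        return "mps", "float32"
    return "cpu", "float32"
```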

PYTORCH_ENABLE_MPS_FALLBACK=1 must be set before torch is imported on MPS (.env handles this). Without it, some OWLv2 ops raise NotImplementedError.
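The ordering constraint matters because torch reads the variable at import time. The project handles this via .env + python-dotenv; a manual equivalent looks like this sketch:

```python
import os

# Must happen BEFORE `import torch`, or unsupported MPS ops will raise
# NotImplementedError instead of falling back to CPU.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

# import torch  # import torch only AFTER the variable is set
```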


OWLv2 model

Default: google/owlv2-large-patch14-finetuned (~700 MB, cached in `~/.cache/huggingface` after the first download).

Override via env var: AUTOLABEL_MODEL=google/owlv2-base-patch16

| Variant | Size | Notes |
| --- | --- | --- |
| owlv2-base-patch16 | ~300 MB | Faster, lower accuracy |
| owlv2-large-patch14 | ~700 MB | Good balance |
| owlv2-large-patch14-finetuned | ~700 MB | Default — pre-trained on LVIS/Objects365 |
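The effective behaviour of the AUTOLABEL_MODEL override is an environment lookup with a default. A minimal sketch (the real project reads this through its pydantic-settings singleton, not os.environ directly):

```python
import os

DEFAULT_MODEL = "google/owlv2-large-patch14-finetuned"

def resolve_model_id() -> str:
    """Environment override wins; otherwise fall back to the default checkpoint."""
    return os.environ.get("AUTOLABEL_MODEL", DEFAULT_MODEL)
```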

Dependency decisions

| Package | Why kept |
| --- | --- |
| torch / torchvision | OWLv2 + SAM2 inference |
| transformers>=4.45 | OWLv2 and SAM2 models & processors |
| pillow | Image I/O and annotation drawing |
| numpy | Gradio image array interchange; mask arrays |
| opencv-python | cv2.findContours for mask → COCO polygon (SAM2) |
| pydantic / pydantic-settings | Type-safe config with env-var loading |
| click | CLI option parsing |
| tqdm | Progress bars in CLI batch runner |
| python-dotenv | Load .env before torch (MPS fallback) |
| gradio | Web UI |

Removed: supervision (unused), matplotlib (fine-tune charts removed), requests (Label Studio integration removed).


Inference flow

```
PIL image
    ↓
detect.infer(image, processor, model, prompts, threshold, device, dtype)
    ↓
list[{label, score, box_xyxy}]
    │
    ├─ Detection mode ──────────────────────────────────────────────────
    │   ↓ used by app.py directly
    │   ↓ (CLI: wrapped by detect_image → JSON)
    │   ↓ export.build_coco → coco_export.json  (bbox only, segmentation:[])
    │
    └─ Segmentation mode ───────────────────────────────────────────────
        ↓
        segment.segment_with_boxes(image, detections, sam2_processor, sam2_model)
        ↓
        list[{label, score, box_xyxy, mask (np.ndarray), segmentation (polygons)}]
        ↓ mask used for visualization overlay; dropped before JSON serialisation
        ↓ export.build_coco → coco_export.json  (bbox + segmentation polygons)
```
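The "mask dropped before JSON serialisation" step is worth spelling out, since np.ndarray is not JSON-serialisable while the COCO polygon list is plain Python. A sketch (helper name hypothetical):

```python
import json

def to_serialisable(detections: list[dict]) -> list[dict]:
    """Strip the per-instance mask array; keep label, score, box, and polygons."""
    return [{k: v for k, v in det.items() if k != "mask"} for det in detections]

dets = [{"label": "cat", "score": 0.91,
         "box_xyxy": [10, 20, 110, 220],
         "mask": object(),  # stand-in for the np.ndarray overlay mask
         "segmentation": [[10, 20, 110, 20, 110, 220, 10, 220]]}]

clean = to_serialisable(dets)
json.dumps(clean)  # now safe: no ndarray left in the payload
```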

Batch export ZIP structure

```
autolabel_export.zip
├── coco_export.json          # COCO format, dimensions match images below
└── images/
    ├── photo1.jpg            # resized to chosen training size (e.g. 640×640)
    └── photo2.jpg
```

COCO bounding boxes are in the coordinate space of the resized images.
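Because the export resizes images, a box detected on the original image has to be mapped into the resized coordinate space and converted to COCO's [x, y, width, height] layout. A minimal sketch of that arithmetic (function names hypothetical):

```python
def scale_box_xyxy(box, src_wh, dst_wh):
    """Map a box from original-image coordinates into the resized image's
    coordinate space, so COCO boxes line up with the exported images."""
    sx, sy = dst_wh[0] / src_wh[0], dst_wh[1] / src_wh[1]
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]

def xyxy_to_coco(box):
    """COCO stores boxes as [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

# e.g. a 1280x720 original resized to 640x640:
box = scale_box_xyxy([100, 50, 300, 250], (1280, 720), (640, 640))
coco_box = xyxy_to_coco(box)
```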


Known limitations

  • OWLv2 is detection-only — bounding boxes, no masks.
  • Objects smaller than 32×32 px are often missed at the default resolution.
  • MPS inference is slower than CUDA but fast enough for development.
  • The threshold default is 0.1 — intentionally low, since it is easier to discard false positives than to recover missed objects.
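The low-threshold rationale implies a post-hoc filtering step: run inference once at the permissive 0.1 cut-off, then tighten without re-running the model. A sketch:

```python
def filter_detections(detections, min_score):
    """Keep only detections at or above min_score (no re-inference needed)."""
    return [d for d in detections if d["score"] >= min_score]

dets = [{"label": "cat", "score": 0.92},
        {"label": "cat", "score": 0.11},   # likely false positive at 0.1
        {"label": "dog", "score": 0.34}]

kept = filter_detections(dets, 0.3)
```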

Fine-tuning (future)

The fine-tuning infrastructure is complete (autolabel/finetune.py, scripts/finetune_owlv2.py) but not in active use. Workflow when ready:

  1. Use the Batch tab to generate a labeled coco_export.json
  2. Run make finetune (or uv run python scripts/finetune_owlv2.py --help)
  3. Evaluate the fine-tuned model in the Test tab