# CONTEXT.md – Technical Reference for autolabel

Keep this file up to date as the project evolves. Read this first when resuming work after a break.
## What this project does

Uses OWLv2 (open-vocabulary object detection) and SAM2 (segment anything) to auto-label images via text prompts, then exports a COCO dataset for fine-tuning a detection or segmentation model.

Current phase: labeling – two modes available:

- Detection – OWLv2 only; produces bounding boxes.
- Segmentation – OWLv2 → boxes → SAM2 → pixel masks + COCO polygons.

Future phase: fine-tune OWLv2 on the exported COCO dataset using scripts/finetune_owlv2.py (code is ready, not yet in active use).
## Architecture

### Primary interface – app.py (Gradio web UI)

Two-tab UI, all artifacts written to a session temp dir (nothing in the project):

| Tab | What it does |
|---|---|
| Test | Single image → instant annotated preview. Dial in prompts and threshold before a batch run. |
| Batch | Multiple images → annotated gallery + downloadable ZIP (resized images + coco_export.json). |
### CLI scripts (scripts/)

Independent entry points for headless / automation use:

| Script | Purpose |
|---|---|
| run_detection.py | Batch detect → data/detections/ |
| export_coco.py | Build COCO JSON from data/labeled/ |
| finetune_owlv2.py | Fine-tune OWLv2 (future) |
### autolabel/ package

| Module | Responsibility |
|---|---|
| config.py | Pydantic settings singleton, auto device detection |
| detect.py | OWLv2 inference – infer() (PIL, shared) + detect_image() (file) + run_detection() (batch CLI) |
| segment.py | SAM2 integration – load_sam2(), segment_with_boxes(), _mask_to_polygon() |
| export.py | COCO JSON builder (no pycocotools); supports both bbox-only and segmentation |
| finetune.py | Training loop, loss, dataset, scheduler |
| utils.py | collect_images, save_json, load_json, setup_logging |
Key design: detect.infer() is the single OWLv2 inference implementation.
app.py chains SAM2 on top when mode == "Segmentation" – no duplication.
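The chaining idea can be sketched with injected callables. This is an illustrative stand-in, not the actual app.py code; the function name `annotate` and its signature are assumptions, but the detection dict shape matches the one documented below.

```python
def annotate(image, prompts, mode, infer_fn, segment_fn):
    """Run the shared OWLv2 inference, optionally chaining SAM2 on top."""
    # infer_fn returns list[{label, score, box_xyxy}] for the given prompts.
    detections = infer_fn(image, prompts)
    if mode == "Segmentation":
        # SAM2 enriches each detection with mask + polygon fields;
        # detection mode returns the boxes untouched.
        detections = segment_fn(image, detections)
    return detections
```

Because both modes share `infer_fn`, any fix to OWLv2 inference automatically applies to segmentation as well.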
## Device strategy

| Platform | Device | dtype |
|---|---|---|
| Apple Silicon | mps | float32 |
| Windows/Linux GPU | cuda | float16 |
| CPU fallback | cpu | float32 |
PYTORCH_ENABLE_MPS_FALLBACK=1 must be set before torch is imported on MPS
(.env handles this). Without it, some OWLv2 ops raise NotImplementedError.
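The table above can be expressed as a small selection helper. A minimal sketch, not the actual config.py code – the function name and the boolean-flag interface are assumptions (in the real project the flags would come from torch.cuda.is_available() / torch.backends.mps.is_available()):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> tuple[str, str]:
    """Map platform capabilities to (device, dtype) per the table above."""
    if cuda_available:
        return "cuda", "float16"   # Windows/Linux GPU: half precision
    if mps_available:
        return "mps", "float32"    # Apple Silicon: float16 unreliable on MPS
    return "cpu", "float32"        # CPU fallback
```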
## OWLv2 model

Default: google/owlv2-large-patch14-finetuned (~700 MB, cached in
`~/.cache/huggingface` after first download).

Override via env var: AUTOLABEL_MODEL=google/owlv2-base-patch16
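A minimal sketch of the override lookup using plain `os.environ` (the project actually routes this through pydantic-settings; the helper name `resolve_model_name` is an assumption):

```python
import os

DEFAULT_MODEL = "google/owlv2-large-patch14-finetuned"

def resolve_model_name() -> str:
    # AUTOLABEL_MODEL, when set, overrides the default checkpoint.
    return os.environ.get("AUTOLABEL_MODEL", DEFAULT_MODEL)
```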
| Variant | Size | Notes |
|---|---|---|
| owlv2-base-patch16 | ~300 MB | Faster, lower accuracy |
| owlv2-large-patch14 | ~700 MB | Good balance |
| owlv2-large-patch14-finetuned | ~700 MB | Default – pre-trained on LVIS/Objects365 |
## Dependency decisions

| Package | Why kept |
|---|---|
| torch / torchvision | OWLv2 + SAM2 inference |
| transformers>=4.45 | OWLv2 and SAM2 models & processors |
| pillow | Image I/O and annotation drawing |
| numpy | Gradio image array interchange; mask arrays |
| opencv-python | cv2.findContours for mask → COCO polygon (SAM2) |
| pydantic / pydantic-settings | Type-safe config with env-var loading |
| click | CLI option parsing |
| tqdm | Progress bars in CLI batch runner |
| python-dotenv | Load .env before torch (MPS fallback) |
| gradio | Web UI |
Removed: supervision (unused), matplotlib (fine-tune charts gone),
requests (Label Studio gone).
## Inference flow

```
PIL image
   │
detect.infer(image, processor, model, prompts, threshold, device, dtype)
   │
list[{label, score, box_xyxy}]
   │
   ├─ Detection mode ──────────────────────────────────────────────────
   │    │ used by app.py directly
   │    │ (CLI: wrapped by detect_image → JSON)
   │    └ export.build_coco → coco_export.json (bbox only, segmentation: [])
   │
   └─ Segmentation mode ───────────────────────────────────────────────
        │
        segment.segment_with_boxes(image, detections, sam2_processor, sam2_model)
        │
        list[{label, score, box_xyxy, mask (np.ndarray), segmentation (polygons)}]
        │   mask used for visualization overlay; dropped before JSON serialisation
        └ export.build_coco → coco_export.json (bbox + segmentation polygons)
```
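The export step can be sketched as a plain-dict COCO builder in the spirit of export.build_coco (an illustrative reimplementation with assumed parameter names, not the project's actual code). It shows the two details that matter: boxes arrive as xyxy but COCO wants xywh, and detection-mode records carry an empty segmentation list:

```python
def build_coco(image_entries, detections_per_image, category_names):
    """Assemble a minimal COCO dict from per-image detection lists."""
    cat_ids = {name: i + 1 for i, name in enumerate(category_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for img_id, (entry, dets) in enumerate(
        zip(image_entries, detections_per_image), start=1
    ):
        # entry supplies file_name, width, height for the resized image.
        coco["images"].append({"id": img_id, **entry})
        for det in dets:
            x1, y1, x2, y2 = det["box_xyxy"]
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[det["label"]],
                "bbox": [x1, y1, x2 - x1, y2 - y1],   # COCO xywh
                "area": (x2 - x1) * (y2 - y1),
                # Detection mode: []; Segmentation mode: polygon lists.
                "segmentation": det.get("segmentation", []),
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

Building the dict by hand is what lets the project drop pycocotools from its dependencies.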
## Batch export ZIP structure

```
autolabel_export.zip
├── coco_export.json   # COCO format, dimensions match images below
└── images/
    ├── photo1.jpg     # resized to chosen training size (e.g. 640×640)
    └── photo2.jpg
```

COCO bounding boxes are in the coordinate space of the resized images.
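Because the exported boxes live in the resized coordinate space, anything measured on the original image must be rescaled before comparison or export. A minimal sketch (the helper name is assumed, not part of the project's API):

```python
def scale_box_xyxy(box, src_size, dst_size):
    """Rescale an (x1, y1, x2, y2) box from src (w, h) to dst (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]
```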
## Known limitations

- OWLv2 itself is detection-only – bounding boxes, no masks (masks come from SAM2 in Segmentation mode).
- Objects < 32×32 px are often missed at default resolution.
- MPS inference is slower than CUDA but fast enough for development.
- Threshold default is 0.1 (intentionally low – easier to discard false positives than recover missed objects).
## Fine-tuning (future)

The fine-tuning infrastructure is complete (autolabel/finetune.py,
scripts/finetune_owlv2.py) but not in active use. Workflow when ready:

- Use the Batch tab to generate a labeled coco_export.json
- Run make finetune (or uv run python scripts/finetune_owlv2.py --help)
- Evaluate the fine-tuned model in the Test tab