# CONTEXT.md – Technical Reference for autolabel

Keep this file up to date as the project evolves. Read this first when resuming work after a break.
## What this project does

Uses OWLv2 (open-vocabulary object detection) and SAM2 (segment anything) to auto-label images via text prompts, then exports a COCO dataset for fine-tuning a detection or segmentation model.

Current phase: labeling – two modes available:

- Detection – OWLv2 only; produces bounding boxes.
- Segmentation – OWLv2 → boxes → SAM2 → pixel masks + COCO polygons.

Future phase: fine-tune OWLv2 on the exported COCO dataset using scripts/finetune_owlv2.py (code is ready, not yet in active use).
## Architecture

### Primary interface – app.py (Gradio web UI)

Two-tab UI, all artifacts written to a session temp dir (nothing in the project):

| Tab | What it does |
|---|---|
| Test | Single image → instant annotated preview. Dial in prompts and threshold before a batch run. |
| Batch | Multiple images → annotated gallery + downloadable ZIP (resized images + coco_export.json). |
### CLI scripts (scripts/)

Independent entry points for headless / automation use:

| Script | Purpose |
|---|---|
| run_detection.py | Batch detect → data/detections/ |
| export_coco.py | Build COCO JSON from data/labeled/ |
| finetune_owlv2.py | Fine-tune OWLv2 (future) |
### autolabel/ package

| Module | Responsibility |
|---|---|
| config.py | Pydantic settings singleton, auto device detection |
| detect.py | OWLv2 inference – infer() (PIL, shared) + detect_image() (file) + run_detection() (batch CLI) |
| segment.py | SAM2 integration – load_sam2(), segment_with_boxes(), _mask_to_polygon() |
| export.py | COCO JSON builder (no pycocotools); supports both bbox-only and segmentation |
| finetune.py | Training loop, loss, dataset, scheduler |
| utils.py | collect_images, save_json, load_json, setup_logging |
Key design: detect.infer() is the single OWLv2 inference implementation.
app.py chains SAM2 on top when mode == "Segmentation" – no duplication.
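The chaining idea can be sketched with injected callables. This is an illustrative stand-in, not the actual app.py code; the function name `annotate` and its signature are assumptions, but the detection dict shape matches the one documented below.

```python
def annotate(image, prompts, mode, infer_fn, segment_fn):
    """Run the shared OWLv2 inference, optionally chaining SAM2 on top."""
    # infer_fn returns list[{label, score, box_xyxy}] for the given prompts.
    detections = infer_fn(image, prompts)
    if mode == "Segmentation":
        # SAM2 enriches each detection with mask + polygon fields;
        # detection mode returns the boxes untouched.
        detections = segment_fn(image, detections)
    return detections
```

Because both modes share `infer_fn`, any fix to OWLv2 inference automatically applies to segmentation as well.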
## Device strategy

| Platform | Device | dtype |
|---|---|---|
| Apple Silicon | mps | float32 |
| Windows/Linux GPU | cuda | float16 |
| CPU fallback | cpu | float32 |
PYTORCH_ENABLE_MPS_FALLBACK=1 must be set before torch is imported on MPS
(.env handles this). Without it, some OWLv2 ops raise NotImplementedError.
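The table above can be expressed as a small selection helper. A minimal sketch, not the actual config.py code – the function name and the boolean-flag interface are assumptions (in the real project the flags would come from torch.cuda.is_available() / torch.backends.mps.is_available()):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> tuple[str, str]:
    """Map platform capabilities to (device, dtype) per the table above."""
    if cuda_available:
        return "cuda", "float16"   # Windows/Linux GPU: half precision
    if mps_available:
        return "mps", "float32"    # Apple Silicon: float16 unreliable on MPS
    return "cpu", "float32"        # CPU fallback
```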
## OWLv2 model

Default: google/owlv2-large-patch14-finetuned (~700 MB, cached in
`~/.cache/huggingface` after first download).

Override via env var: AUTOLABEL_MODEL=google/owlv2-base-patch16
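A minimal sketch of the override lookup using plain `os.environ` (the project actually routes this through pydantic-settings; the helper name `resolve_model_name` is an assumption):

```python
import os

DEFAULT_MODEL = "google/owlv2-large-patch14-finetuned"

def resolve_model_name() -> str:
    # AUTOLABEL_MODEL, when set, overrides the default checkpoint.
    return os.environ.get("AUTOLABEL_MODEL", DEFAULT_MODEL)
```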
| Variant | Size | Notes |
|---|---|---|
| owlv2-base-patch16 | ~300 MB | Faster, lower accuracy |
| owlv2-large-patch14 | ~700 MB | Good balance |
| owlv2-large-patch14-finetuned | ~700 MB | Default – pre-trained on LVIS/Objects365 |
## Dependency decisions

| Package | Why kept |
|---|---|
| torch / torchvision | OWLv2 + SAM2 inference |
| transformers>=4.45 | OWLv2 and SAM2 models & processors |
| pillow | Image I/O and annotation drawing |
| numpy | Gradio image array interchange; mask arrays |
| opencv-python | cv2.findContours for mask → COCO polygon (SAM2) |
| pydantic / pydantic-settings | Type-safe config with env-var loading |
| click | CLI option parsing |
| tqdm | Progress bars in CLI batch runner |
| python-dotenv | Load .env before torch (MPS fallback) |
| gradio | Web UI |
Removed: supervision (unused), matplotlib (fine-tune charts gone),
requests (Label Studio gone).
## Inference flow

```
PIL image
   │
detect.infer(image, processor, model, prompts, threshold, device, dtype)
   │
list[{label, score, box_xyxy}]
   │
   ├─ Detection mode ──────────────────────────────────────────────────
   │    │ used by app.py directly
   │    │ (CLI: wrapped by detect_image → JSON)
   │    └ export.build_coco → coco_export.json (bbox only, segmentation: [])
   │
   └─ Segmentation mode ───────────────────────────────────────────────
        │
        segment.segment_with_boxes(image, detections, sam2_processor, sam2_model)
        │
        list[{label, score, box_xyxy, mask (np.ndarray), segmentation (polygons)}]
        │   mask used for visualization overlay; dropped before JSON serialisation
        └ export.build_coco → coco_export.json (bbox + segmentation polygons)
```
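The export step can be sketched as a plain-dict COCO builder in the spirit of export.build_coco (an illustrative reimplementation with assumed parameter names, not the project's actual code). It shows the two details that matter: boxes arrive as xyxy but COCO wants xywh, and detection-mode records carry an empty segmentation list:

```python
def build_coco(image_entries, detections_per_image, category_names):
    """Assemble a minimal COCO dict from per-image detection lists."""
    cat_ids = {name: i + 1 for i, name in enumerate(category_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for img_id, (entry, dets) in enumerate(
        zip(image_entries, detections_per_image), start=1
    ):
        # entry supplies file_name, width, height for the resized image.
        coco["images"].append({"id": img_id, **entry})
        for det in dets:
            x1, y1, x2, y2 = det["box_xyxy"]
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[det["label"]],
                "bbox": [x1, y1, x2 - x1, y2 - y1],   # COCO xywh
                "area": (x2 - x1) * (y2 - y1),
                # Detection mode: []; Segmentation mode: polygon lists.
                "segmentation": det.get("segmentation", []),
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

Building the dict by hand is what lets the project drop pycocotools from its dependencies.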
## Batch export ZIP structure

```
autolabel_export.zip
├── coco_export.json   # COCO format, dimensions match images below
└── images/
    ├── photo1.jpg     # resized to chosen training size (e.g. 640×640)
    └── photo2.jpg
```

COCO bounding boxes are in the coordinate space of the resized images.
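Because the exported boxes live in the resized coordinate space, anything measured on the original image must be rescaled before comparison or export. A minimal sketch (the helper name is assumed, not part of the project's API):

```python
def scale_box_xyxy(box, src_size, dst_size):
    """Rescale an (x1, y1, x2, y2) box from src (w, h) to dst (w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]
```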
## Known limitations

- OWLv2 itself is detection-only – bounding boxes, no masks (masks come from SAM2 in Segmentation mode).
- Objects < 32×32 px are often missed at default resolution.
- MPS inference is slower than CUDA but fast enough for development.
- Threshold default is 0.1 (intentionally low – easier to discard false positives than recover missed objects).
## Fine-tuning (future)

The fine-tuning infrastructure is complete (autolabel/finetune.py,
scripts/finetune_owlv2.py) but not in active use. Workflow when ready:

- Use the Batch tab to generate a labeled coco_export.json
- Run make finetune (or uv run python scripts/finetune_owlv2.py --help)
- Evaluate the fine-tuned model in the Test tab