# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Reusable video analysis base combining object detection, segmentation, depth estimation, and multi-object tracking. Deployed as a Hugging Face Space (Docker SDK). Designed for multi-GPU inference with async job processing and live MJPEG streaming.

## Development Commands

```bash
# Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Docker (production / HF Spaces)
docker build -t detection_base . && docker run -p 7860:7860 detection_base

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"
```

No test suite exists. Verify changes by running the server and testing through the UI at `http://localhost:7860`.

## Architecture

### Request Flow

```
index.html → POST /detect/async → app.py
  ├─ process_first_frame()              # Fast preview (~1-2s)
  ├─ Return job_id + URLs immediately
  └─ Background: process_video_async()
       ├─ run_inference()                   # Detection mode
       └─ run_grounded_sam2_tracking()      # Segmentation mode
```

The async pipeline returns immediately with a `job_id`. The frontend polls `/detect/status/{job_id}` and streams live frames via `/detect/stream/{job_id}` (MJPEG).
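The poll loop the frontend runs against `/detect/status/{job_id}` can be sketched in Python. This is a hypothetical helper, not code from the repo: `wait_for_job` and the stubbed `responses` iterator stand in for real HTTP calls (e.g. `requests.get(...).json()`), and the status strings assume the lowercase form of the `JobStatus` values.

```python
import time

def wait_for_job(get_status, poll_interval=1.0, timeout=300):
    """Poll a job-status callable until the job leaves PROCESSING.

    `get_status` stands in for `GET /detect/status/{job_id}`; in a real
    client it would be e.g. `lambda: requests.get(url).json()`.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status["status"] != "processing":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")

# Stubbed status endpoint: reports PROCESSING twice, then COMPLETED.
responses = iter(
    [{"status": "processing"}] * 2
    + [{"status": "completed", "video_url": "/detect/video/abc123"}]
)
result = wait_for_job(lambda: next(responses), poll_interval=0.0)
print(result["status"])  # completed
```

In the real UI the loop would run alongside the MJPEG `<img>` stream from `/detect/stream/{job_id}`, then fetch the finished video once the status flips.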
### API Endpoints (app.py)

**Core:** `POST /detect` (sync), `POST /detect/async` (async with streaming)

**Job management:** `GET /detect/status/{job_id}`, `DELETE /detect/job/{job_id}`, `GET /detect/video/{job_id}`, `GET /detect/stream/{job_id}`

**Per-frame data:** `GET /detect/tracks/{job_id}/{frame_idx}`, `GET /detect/first-frame/{job_id}`, `GET /detect/first-frame-depth/{job_id}`, `GET /detect/depth-video/{job_id}`

**Benchmarking:** `POST /benchmark`, `POST /benchmark/profile`, `POST /benchmark/analysis`, `GET /gpu-monitor`, `GET /benchmark/hardware`

### Model Registries

All models use a registry + factory pattern with `@lru_cache` for singleton loading. Use `load_*_on_device(name, device)` for multi-GPU (no cache).

**Detectors** (`models/model_loader.py`):

| Key | Model | Vocabulary |
|-----|-------|------------|
| `yolo11` (default) | YOLO11m | COCO classes only |
| `detr_resnet50` | DETR | COCO classes only |
| `grounding_dino` | Grounding DINO | Open-vocabulary (arbitrary text) |
| `drone_yolo` | Drone YOLO | Specialized UAV detection |

All implement `ObjectDetector.predict(frame, queries)` → `DetectionResult(boxes, scores, labels, label_names)` from `models/detectors/base.py`.

**Segmenters** (`models/segmenters/model_loader.py`):

- `GSAM2-S/B/L` — Grounded SAM2 (small/base/large) backed by `grounding_dino`
- `YSAM2-S/B/L` — YOLO-SAM2 (small/base/large) backed by `yolo11`

**Depth** (`models/depth_estimators/model_loader.py`):

- `depth` — DepthAnythingV2

### Inference Pipeline (inference.py)

Three public entry points:

- **`process_first_frame()`** — Extract + detect on frame 0 only. Returns the processed frame plus detections.
- **`run_inference()`** — Full detection pipeline: multi-GPU data parallelism with worker threads per GPU, a reorder buffer for out-of-order completion, ByteTracker for object tracking, optional depth.
- **`run_grounded_sam2_tracking()`** — SAM2 segmentation with temporal coherence.
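The registry + factory + `@lru_cache` pattern described above can be sketched as follows. Only the names `_REGISTRY` and the `load_*_on_device` convention come from this doc; `FakeDetector` and the loader bodies are illustrative stand-ins, not the actual `models/model_loader.py` code.

```python
from functools import lru_cache

class FakeDetector:
    """Stand-in for a real detector class (e.g. YOLO11, Grounding DINO)."""
    def __init__(self, name, device):
        self.name, self.device = name, device

# Registry maps a string key to a factory callable.
_REGISTRY = {
    "yolo11": lambda device: FakeDetector("yolo11", device),
    "grounding_dino": lambda device: FakeDetector("grounding_dino", device),
}

@lru_cache(maxsize=None)
def load_detector(name):
    """Cached loader: one singleton instance per model name."""
    return _REGISTRY[name]("cuda:0")

def load_detector_on_device(name, device):
    """Uncached variant for multi-GPU: a fresh instance per device."""
    return _REGISTRY[name](device)

# Cached loader returns the same object; per-device loader does not.
assert load_detector("yolo11") is load_detector("yolo11")
a = load_detector_on_device("yolo11", "cuda:0")
b = load_detector_on_device("yolo11", "cuda:1")
assert a is not b and a.device != b.device
```

The uncached path matters because `@lru_cache` would otherwise hand every GPU worker the same model instance pinned to one device.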
Uses `SharedFrameStore` (in-memory decoded frames, 12 GiB budget) or falls back to JPEG extraction. The `step` parameter controls the keyframe interval.

### Async Job System (jobs/)

- **`jobs/models.py`** — `JobInfo` dataclass, `JobStatus` enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
- **`jobs/storage.py`** — Thread-safe in-memory storage at `/tmp/detection_jobs/{job_id}/`. Auto-cleanup every 10 minutes.
- **`jobs/background.py`** — `process_video_async()` dispatches to the correct inference function and updates job status.
- **`jobs/streaming.py`** — Event-driven MJPEG frame publishing. Non-blocking (drops frames if the consumer is slow). Frames are pre-resized to 640px width.

### Concurrency Model

- Per-model `RLock` for GPU serialization (`inference.py:_get_model_lock`)
- Multi-GPU workers use separate model instances per device
- `AsyncVideoReader` prefetches frames in a background thread to prevent GPU starvation

### Frontend (index.html)

Single HTML page with vanilla JS. Upload a video, pick mode/model, view the first frame, watch the live MJPEG stream, download the processed video, and inspect the detection JSON.

## Adding a New Detector

1. Create a class in `models/detectors/` implementing `ObjectDetector` from `base.py`
2. Register it in the `_REGISTRY` in `models/model_loader.py`
3. Add an option to the detector dropdown in `index.html`

## Dual Remotes

- `hf` → Hugging Face Space (deployment)
- `github` → GitHub (version control)