# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Reusable video analysis base combining object detection, segmentation, depth estimation, and multi-object tracking. Deployed as a Hugging Face Space (Docker SDK). Designed for multi-GPU inference with async job processing and live MJPEG streaming.

## Development Commands

```bash
# Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Docker (production / HF Spaces)
docker build -t detection_base . && docker run -p 7860:7860 detection_base

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"
```

No test suite exists. Verify changes by running the server and testing through the UI at `http://localhost:7860`.

## Architecture

### Request Flow

```
index.html → POST /detect/async → app.py
  ├─ process_first_frame()              # Fast preview (~1-2s)
  ├─ Return job_id + URLs immediately
  └─ Background: process_video_async()
       ├─ run_inference()                   # Detection mode
       └─ run_grounded_sam2_tracking()      # Segmentation mode
```

The async pipeline returns immediately with a `job_id`. The frontend polls `/detect/status/{job_id}` and streams live frames via `/detect/stream/{job_id}` (MJPEG).
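The poll loop the frontend runs against `/detect/status/{job_id}` can be sketched in Python. This is a hypothetical helper, not code from the repo: `wait_for_job` and the stubbed `responses` iterator stand in for real HTTP calls (e.g. `requests.get(...).json()`), and the status strings assume the lowercase form of the `JobStatus` values.

```python
import time

def wait_for_job(get_status, poll_interval=1.0, timeout=300):
    """Poll a job-status callable until the job leaves PROCESSING.

    `get_status` stands in for `GET /detect/status/{job_id}`; in a real
    client it would be e.g. `lambda: requests.get(url).json()`.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status()
        if status["status"] != "processing":
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")

# Stubbed status endpoint: reports PROCESSING twice, then COMPLETED.
responses = iter(
    [{"status": "processing"}] * 2
    + [{"status": "completed", "video_url": "/detect/video/abc123"}]
)
result = wait_for_job(lambda: next(responses), poll_interval=0.0)
print(result["status"])  # completed
```

In the real UI the loop would run alongside the MJPEG `<img>` stream from `/detect/stream/{job_id}`, then fetch the finished video once the status flips.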
### API Endpoints (app.py)

**Core:** `POST /detect` (sync), `POST /detect/async` (async with streaming)

**Job management:** `GET /detect/status/{job_id}`, `DELETE /detect/job/{job_id}`, `GET /detect/video/{job_id}`, `GET /detect/stream/{job_id}`

**Per-frame data:** `GET /detect/tracks/{job_id}/{frame_idx}`, `GET /detect/first-frame/{job_id}`, `GET /detect/first-frame-depth/{job_id}`, `GET /detect/depth-video/{job_id}`

**Benchmarking:** `POST /benchmark`, `POST /benchmark/profile`, `POST /benchmark/analysis`, `GET /gpu-monitor`, `GET /benchmark/hardware`

### Model Registries

All models use a registry + factory pattern with `@lru_cache` for singleton loading. Use `load_*_on_device(name, device)` for multi-GPU (no cache).

**Detectors** (`models/model_loader.py`):

| Key | Model | Vocabulary |
|-----|-------|------------|
| `yolo11` (default) | YOLO11m | COCO classes only |
| `detr_resnet50` | DETR | COCO classes only |
| `grounding_dino` | Grounding DINO | Open-vocabulary (arbitrary text) |
| `drone_yolo` | Drone YOLO | Specialized UAV detection |

All implement `ObjectDetector.predict(frame, queries)` → `DetectionResult(boxes, scores, labels, label_names)` from `models/detectors/base.py`.

**Segmenters** (`models/segmenters/model_loader.py`):

- `GSAM2-S/B/L` — Grounded SAM2 (small/base/large) backed by `grounding_dino`
- `YSAM2-S/B/L` — YOLO-SAM2 (small/base/large) backed by `yolo11`

**Depth** (`models/depth_estimators/model_loader.py`):

- `depth` — DepthAnythingV2

### Inference Pipeline (inference.py)

Three public entry points:

- **`process_first_frame()`** — Extract + detect on frame 0 only. Returns the processed frame plus detections.
- **`run_inference()`** — Full detection pipeline: multi-GPU data parallelism with worker threads per GPU, a reorder buffer for out-of-order completion, ByteTracker for object tracking, optional depth.
- **`run_grounded_sam2_tracking()`** — SAM2 segmentation with temporal coherence.
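The registry + factory + `@lru_cache` pattern described above can be sketched as follows. Only the names `_REGISTRY` and the `load_*_on_device` convention come from this doc; `FakeDetector` and the loader bodies are illustrative stand-ins, not the actual `models/model_loader.py` code.

```python
from functools import lru_cache

class FakeDetector:
    """Stand-in for a real detector class (e.g. YOLO11, Grounding DINO)."""
    def __init__(self, name, device):
        self.name, self.device = name, device

# Registry maps a string key to a factory callable.
_REGISTRY = {
    "yolo11": lambda device: FakeDetector("yolo11", device),
    "grounding_dino": lambda device: FakeDetector("grounding_dino", device),
}

@lru_cache(maxsize=None)
def load_detector(name):
    """Cached loader: one singleton instance per model name."""
    return _REGISTRY[name]("cuda:0")

def load_detector_on_device(name, device):
    """Uncached variant for multi-GPU: a fresh instance per device."""
    return _REGISTRY[name](device)

# Cached loader returns the same object; per-device loader does not.
assert load_detector("yolo11") is load_detector("yolo11")
a = load_detector_on_device("yolo11", "cuda:0")
b = load_detector_on_device("yolo11", "cuda:1")
assert a is not b and a.device != b.device
```

The uncached path matters because `@lru_cache` would otherwise hand every GPU worker the same model instance pinned to one device.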
Uses `SharedFrameStore` (in-memory decoded frames, 12 GiB budget) or falls back to JPEG extraction. The `step` parameter controls the keyframe interval.

### Async Job System (jobs/)

- **`jobs/models.py`** — `JobInfo` dataclass, `JobStatus` enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
- **`jobs/storage.py`** — Thread-safe in-memory storage at `/tmp/detection_jobs/{job_id}/`. Auto-cleanup every 10 minutes.
- **`jobs/background.py`** — `process_video_async()` dispatches to the correct inference function and updates job status.
- **`jobs/streaming.py`** — Event-driven MJPEG frame publishing. Non-blocking (drops frames if the consumer is slow). Frames are pre-resized to 640px width.

### Concurrency Model

- Per-model `RLock` for GPU serialization (`inference.py:_get_model_lock`)
- Multi-GPU workers use separate model instances per device
- `AsyncVideoReader` prefetches frames in a background thread to prevent GPU starvation

### Frontend (index.html)

Single HTML page with vanilla JS. Upload a video, pick mode/model, view the first frame, watch the live MJPEG stream, download the processed video, and inspect the detection JSON.

## Adding a New Detector

1. Create a class in `models/detectors/` implementing `ObjectDetector` from `base.py`
2. Register it in the `_REGISTRY` in `models/model_loader.py`
3. Add an option to the detector dropdown in `index.html`

## Dual Remotes

- `hf` → Hugging Face Space (deployment)
- `github` → GitHub (version control)