Zhen Ye and Claude Opus 4.6 (1M context) committed
Commit 2e2a601 · 1 Parent(s): 165863b

chore: clean up dead code, stale comments, and misleading names

- Rename gpt_data -> metadata on STrack (tracker.py)
- Remove dead inject_metadata method and phantom METADATA_SYNC_KEYS
- Fix _sync_data to sync depth_rel (the key actually written)
- Remove dead _build_display_label and gpt_distance_m checks
- Remove no-op add_no_cache_header middleware
- Remove unused get_segmenter_detector import
- Cache index.html at module load instead of reading per request
- Extract _parse_queries helper to deduplicate query parsing
- Remove empty set_track_data calls in segmentation writer loop
- Remove sentence-transformers semantic matching from coco_classes
- Clean stale GPT/mission references from docstrings
- Update CLAUDE.md to reflect stripped-down architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CLAUDE.md CHANGED
@@ -4,251 +4,105 @@ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
 ## Project Overview
 
-Simple video object detection system with three modes:
-- **Object Detection**: Detect custom objects using text queries (fully functional)
-- **Segmentation**: Mask overlays using SAM3
-- **Drone Detection**: (Coming Soon) Specialized UAV detection
-
-## Core Architecture
-
-### Simple Detection Flow
-
-```
-User → demo.html → POST /detect → inference.py → detector → processed video
-```
-
-1. User selects mode and uploads video via web interface
-2. Frontend sends video + mode + queries to `/detect` endpoint
-3. Backend runs detection inference with selected model
-4. Returns processed video with bounding boxes
-
-### Available Detectors
-
-The system includes 4 pre-trained object detection models:
-
-| Detector | Key | Type | Best For |
-|----------|-----|------|----------|
-| **OWLv2** | `owlv2_base` | Open-vocabulary | Custom text queries (default) |
-| **YOLOv8** | `hf_yolov8` | COCO classes | Fast real-time detection |
-| **DETR** | `detr_resnet50` | COCO classes | Transformer-based detection |
-| **Grounding DINO** | `grounding_dino` | Open-vocabulary | Text-grounded detection |
-
-All detectors implement the `ObjectDetector` interface in `models/detectors/base.py` with a single `predict()` method.
+Reusable video analysis base combining object detection, segmentation, depth estimation, and multi-object tracking. Deployed as a Hugging Face Space (Docker SDK). Designed for multi-GPU inference with async job processing and live MJPEG streaming.
 
 ## Development Commands
 
-### Setup
 ```bash
-python -m venv .venv
-source .venv/bin/activate  # or `.venv/bin/activate` on macOS/Linux
+# Setup
+python -m venv .venv && source .venv/bin/activate
 pip install -r requirements.txt
-```
-
-### Running the Server
-```bash
-# Development
+
+# Run dev server
 uvicorn app:app --host 0.0.0.0 --port 7860 --reload
 
-# Production (Docker)
-docker build -t object_detectors .
-docker run -p 7860:7860 object_detectors
-```
-
-### Testing the API
-```bash
-# Test object detection
-curl -X POST http://localhost:7860/detect \
+# Docker (production / HF Spaces)
+docker build -t detection_base . && docker run -p 7860:7860 detection_base
+
+# Test async detection
+curl -X POST http://localhost:7860/detect/async \
   -F "video=@sample.mp4" \
   -F "mode=object_detection" \
-  -F "queries=person,car,dog" \
-  -F "detector=owlv2_base" \
-  --output processed.mp4
-
-# Test placeholder modes (returns JSON)
-curl -X POST http://localhost:7860/detect \
-  -F "video=@sample.mp4" \
-  -F "mode=segmentation"
-```
-
-## Key Implementation Details
-
-### API Endpoint: `/detect`
-
-**Parameters:**
-- `video` (file): Video file to process
-- `mode` (string): Detection mode - `object_detection`, `segmentation`, or `drone_detection`
-- `queries` (string): Comma-separated object classes (for object_detection mode)
-- `detector` (string): Model key (default: `owlv2_base`)
-
-**Returns:**
-- For `object_detection`: MP4 video with bounding boxes
-- For `segmentation`: MP4 video with mask overlays
-- For `drone_detection`: JSON with `{"status": "coming_soon", "message": "..."}`
-
-### Inference Pipeline
-
-The `run_inference()` function in `inference.py` follows these steps:
-
-1. **Extract Frames**: Decode video using OpenCV
-2. **Parse Queries**: Split comma-separated text into list (defaults to common objects if empty)
-3. **Select Detector**: Load detector by key (cached via `@lru_cache`)
-4. **Process Frames**: Run detection on each frame
-   - Call `detector.predict(frame, queries)`
-   - Draw green bounding boxes on detections
-5. **Write Video**: Encode processed frames back to MP4
-
-Default queries (if none provided): `["person", "car", "truck", "motorcycle", "bicycle", "bus", "train", "airplane"]`
-
-### Detector Loading
-
-Detectors are registered in `models/model_loader.py`:
-
-```python
-_REGISTRY: Dict[str, Callable[[], ObjectDetector]] = {
-    "owlv2_base": Owlv2Detector,
-    "hf_yolov8": HuggingFaceYoloV8Detector,
-    "detr_resnet50": DetrDetector,
-    "grounding_dino": GroundingDinoDetector,
-}
+  -F "queries=person,car" \
+  -F "detector=yolo11"
 ```
 
-Loaded via `load_detector(name)` which caches instances for performance.
+No test suite exists. Verify changes by running the server and testing through the UI at `http://localhost:7860`.
 
-### Detection Result Format
+## Architecture
 
-All detectors return a `DetectionResult` namedtuple:
-```python
-DetectionResult(
-    boxes: np.ndarray,                     # Nx4 array [x1, y1, x2, y2]
-    scores: Sequence[float],               # Confidence scores
-    labels: Sequence[int],                 # Class indices
-    label_names: Optional[Sequence[str]]   # Human-readable names
-)
-```
-
-## File Structure
+### Request Flow
 
 ```
-.
-├─ app.py             # FastAPI server with /detect endpoint
-├─ inference.py       # Video processing and detection pipeline
-├─ demo.html          # Web interface with mode selector
-├─ requirements.txt   # Python dependencies
-├─ models/
-│  ├── model_loader.py  # Detector registry and loading
-│  └── detectors/
-│     ├── base.py             # ObjectDetector interface
-│     ├── owlv2.py            # OWLv2 implementation
-│     ├── yolov8.py           # YOLOv8 implementation
-│     ├── detr.py             # DETR implementation
-│     └── grounding_dino.py   # Grounding DINO implementation
-├── utils/
-│  └── video.py       # Video encoding/decoding utilities
-└── coco_classes.py   # COCO dataset class definitions
+index.html → POST /detect/async → app.py
+  ├─ process_first_frame()   # Fast preview (~1-2s)
+  ├─ Return job_id + URLs immediately
+  └─ Background: process_video_async()
+       ├─ run_inference()                  # Detection mode
+       └─ run_grounded_sam2_tracking()     # Segmentation mode
 ```
 
-## Adding New Detectors
-
-To add a new detector:
-
-1. **Create detector class** in `models/detectors/`:
-```python
-from .base import ObjectDetector, DetectionResult
-
-class MyDetector(ObjectDetector):
-    name = "my_detector"
-
-    def predict(self, frame, queries):
-        # Your detection logic
-        return DetectionResult(boxes, scores, labels, label_names)
-```
-
-2. **Register in model_loader.py**:
-```python
-_REGISTRY = {
-    ...
-    "my_detector": MyDetector,
-}
-```
-
-3. **Update frontend** `demo.html` detector dropdown:
-```html
-<option value="my_detector">My Detector</option>
-```
-
-## Adding New Detection Modes
-
-To implement additional modes such as drone detection:
-
-1. **Create specialized detector** (if needed):
-   - For segmentation: Extend `SegmentationResult` to include masks
-   - For drone detection: Create `DroneDetector` with specialized filtering
-
-2. **Update `/detect` endpoint** in `app.py`:
-```python
-if mode == "segmentation":
-    # Run segmentation inference
-    # Return video with masks rendered
-```
-
-3. **Update frontend** to remove "disabled" class from mode card
-
-4. **Update inference.py** if needed to handle new output types
-
-## Common Patterns
-
-### Query Processing
-Queries are parsed from comma-separated strings:
-```python
-queries = [q.strip() for q in "person, car, dog".split(",") if q.strip()]
-# Result: ["person", "car", "dog"]
-```
-
-### Frame Processing Loop
-Standard pattern for processing video frames:
-```python
-processed_frames = []
-for idx, frame in enumerate(frames):
-    processed_frame, detections = infer_frame(frame, queries, detector_name)
-    processed_frames.append(processed_frame)
-```
-
-### Temporary File Management
-FastAPI's `BackgroundTasks` cleans up temp files after response:
-```python
-_schedule_cleanup(background_tasks, input_path)
-_schedule_cleanup(background_tasks, output_path)
-```
-
-## Performance Notes
-
-- **Detector Caching**: Models are loaded once and cached via `@lru_cache`
-- **Default Resolution**: Videos processed at original resolution
-- **Frame Limit**: Use `max_frames` parameter in `run_inference()` for testing
-- **Memory Usage**: Entire video is loaded into memory (frames list)
-
-## Troubleshooting
-
-### "No module named 'fastapi'"
-Install dependencies: `pip install -r requirements.txt`
-
-### "Video decoding failed"
-Check video codec compatibility. System expects MP4/H.264.
-
-### "Detector not found"
-Verify detector key exists in `model_loader._REGISTRY`
-
-### Slow processing
-- Try faster detector: YOLOv8 (`hf_yolov8`)
-- Reduce video resolution before uploading
-- Use `max_frames` parameter for testing
-
-## Dependencies
-
-Core packages:
-- `fastapi` + `uvicorn`: Web server
-- `torch` + `transformers`: Deep learning models
-- `opencv-python-headless`: Video processing
-- `ultralytics`: YOLOv8 implementation
-- `huggingface-hub`: Model downloading
-- `pillow`, `scipy`, `accelerate`, `timm`: Supporting libraries
+The async pipeline returns instantly with a `job_id`. The frontend polls `/detect/status/{job_id}` and streams live frames via `/detect/stream/{job_id}` (MJPEG).
+
+### API Endpoints (app.py)
+
+**Core:** `POST /detect` (sync), `POST /detect/async` (async with streaming)
+**Job management:** `GET /detect/status/{job_id}`, `DELETE /detect/job/{job_id}`, `GET /detect/video/{job_id}`, `GET /detect/stream/{job_id}`
+**Per-frame data:** `GET /detect/tracks/{job_id}/{frame_idx}`, `GET /detect/first-frame/{job_id}`, `GET /detect/first-frame-depth/{job_id}`, `GET /detect/depth-video/{job_id}`
+**Benchmarking:** `POST /benchmark`, `POST /benchmark/profile`, `POST /benchmark/analysis`, `GET /gpu-monitor`, `GET /benchmark/hardware`
+
+### Model Registries
+
+All models use a registry + factory pattern with `@lru_cache` for singleton loading. Use `load_*_on_device(name, device)` for multi-GPU (no cache).
+
+**Detectors** (`models/model_loader.py`):
+| Key | Model | Vocabulary |
+|-----|-------|-----------|
+| `yolo11` (default) | YOLO11m | COCO classes only |
+| `detr_resnet50` | DETR | COCO classes only |
+| `grounding_dino` | Grounding DINO | Open-vocabulary (arbitrary text) |
+| `drone_yolo` | Drone YOLO | Specialized UAV detection |
+
+All implement `ObjectDetector.predict(frame, queries)` → `DetectionResult(boxes, scores, labels, label_names)` from `models/detectors/base.py`.
+
+**Segmenters** (`models/segmenters/model_loader.py`):
+- `GSAM2-S/B/L` — Grounded SAM2 (small/base/large) backed by `grounding_dino`
+- `YSAM2-S/B/L` — YOLO-SAM2 (small/base/large) backed by `yolo11`
+
+**Depth** (`models/depth_estimators/model_loader.py`):
+- `depth` — DepthAnythingV2
+
+### Inference Pipeline (inference.py)
+
+Three public entry points:
+- **`process_first_frame()`** — Extract + detect on frame 0 only. Returns processed frame + detections.
+- **`run_inference()`** — Full detection pipeline. Multi-GPU data parallelism with worker threads per GPU, reorder buffer for out-of-order completion, ByteTracker for object tracking, optional depth.
+- **`run_grounded_sam2_tracking()`** — SAM2 segmentation with temporal coherence. Uses `SharedFrameStore` (in-memory decoded frames, 12 GiB budget) or falls back to JPEG extraction. `step` parameter controls keyframe interval.
+
+### Async Job System (jobs/)
+
+- **`jobs/models.py`** — `JobInfo` dataclass, `JobStatus` enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
+- **`jobs/storage.py`** — Thread-safe in-memory storage at `/tmp/detection_jobs/{job_id}/`. Auto-cleanup every 10 minutes.
+- **`jobs/background.py`** — `process_video_async()` dispatches to the correct inference function, updates job status.
+- **`jobs/streaming.py`** — Event-driven MJPEG frame publishing. Non-blocking (drops if consumer is slow). Frames pre-resized to 640px width.
+
+### Concurrency Model
+
+- Per-model `RLock` for GPU serialization (`inference.py:_get_model_lock`)
+- Multi-GPU workers use separate model instances per device
+- `AsyncVideoReader` prefetches frames in a background thread to prevent GPU starvation
+
+### Frontend (index.html)
+
+Single HTML page with vanilla JS. Upload video, pick mode/model, view first frame, live MJPEG stream, download processed video, inspect detection JSON.
+
+## Adding a New Detector
+
+1. Create class in `models/detectors/` implementing `ObjectDetector` from `base.py`
+2. Register in `models/model_loader.py` `_REGISTRY`
+3. Add option to detector dropdown in `index.html`
+
+## Dual Remotes
+
+- `hf` → Hugging Face Space (deployment)
+- `github` → GitHub (version control)
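The registry + factory pattern with `@lru_cache` that the new CLAUDE.md describes can be sketched as follows. This is a minimal sketch, not the repo's actual module: only the `yolo11` key and the `load_detector(name)` name appear in the docs above, while the placeholder classes and `load_detector_on_device` (standing in for the `load_*_on_device` family) are illustrative.

```python
from functools import lru_cache
from typing import Callable, Dict


class ObjectDetector:
    """Stand-in for the interface in models/detectors/base.py."""

    def predict(self, frame, queries):
        raise NotImplementedError


class Yolo11Detector(ObjectDetector):
    def predict(self, frame, queries):
        return []  # placeholder detection logic


# Registry maps a string key to a zero-argument factory.
_REGISTRY: Dict[str, Callable[[], ObjectDetector]] = {
    "yolo11": Yolo11Detector,
}


@lru_cache(maxsize=None)
def load_detector(name: str) -> ObjectDetector:
    """Singleton loading: the factory runs once per key, then is cached."""
    if name not in _REGISTRY:
        available = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown detector '{name}'. Available: {available}")
    return _REGISTRY[name]()


def load_detector_on_device(name: str, device: str) -> ObjectDetector:
    """Uncached variant for multi-GPU: a fresh instance per device."""
    detector = _REGISTRY[name]()
    detector.device = device  # illustrative; a real model would move weights here
    return detector
```

The cached path gives every caller the same instance per key; the uncached per-device path is what lets each GPU worker hold its own copy of the model.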
coco_classes.py CHANGED
@@ -6,7 +6,6 @@ import logging
 import re
 from typing import Dict, Optional, Tuple
 
-import numpy as np
 
 logger = logging.getLogger(__name__)
 
@@ -157,69 +156,6 @@ _COCO_SYNONYMS: Dict[str, str] = {
 _ALIAS_LOOKUP: Dict[str, str] = {_normalize(alias): canonical for alias, canonical in _COCO_SYNONYMS.items()}
 
 
-# ---------------------------------------------------------------------------
-# Semantic similarity fallback (lazy-loaded)
-# ---------------------------------------------------------------------------
-
-_SEMANTIC_MODEL = None
-_COCO_EMBEDDINGS: Optional[np.ndarray] = None
-_SEMANTIC_THRESHOLD = 0.65  # Minimum cosine similarity to accept a match
-
-
-def _get_semantic_model():
-    """Lazy-load a lightweight sentence-transformer for semantic matching."""
-    global _SEMANTIC_MODEL, _COCO_EMBEDDINGS
-    if _SEMANTIC_MODEL is not None:
-        return _SEMANTIC_MODEL, _COCO_EMBEDDINGS
-
-    try:
-        from sentence_transformers import SentenceTransformer
-        _SEMANTIC_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
-        # Prefix with "a photo of a" to anchor embeddings in visual/object space
-        coco_phrases = [f"a photo of a {cls}" for cls in COCO_CLASSES]
-        _COCO_EMBEDDINGS = _SEMANTIC_MODEL.encode(
-            coco_phrases, normalize_embeddings=True
-        )
-        logger.info("Loaded semantic similarity model for COCO class mapping")
-    except Exception:
-        logger.warning("sentence-transformers unavailable; semantic COCO mapping disabled", exc_info=True)
-        _SEMANTIC_MODEL = False  # Sentinel: tried and failed
-        _COCO_EMBEDDINGS = None
-
-    return _SEMANTIC_MODEL, _COCO_EMBEDDINGS
-
-
-def _semantic_coco_match(value: str) -> Optional[str]:
-    """Find the closest COCO class by embedding cosine similarity.
-
-    Returns the COCO class name if similarity >= threshold, else None.
-    """
-    model, coco_embs = _get_semantic_model()
-    if model is False or coco_embs is None:
-        return None
-
-    query_emb = model.encode(
-        [f"a photo of a {value}"], normalize_embeddings=True
-    )
-    similarities = query_emb @ coco_embs.T  # (1, 80)
-    best_idx = int(np.argmax(similarities))
-    best_score = float(similarities[0, best_idx])
-
-    if best_score >= _SEMANTIC_THRESHOLD:
-        matched = COCO_CLASSES[best_idx]
-        logger.info(
-            "Semantic COCO match: '%s' -> '%s' (score=%.3f)",
-            value, matched, best_score,
-        )
-        return matched
-
-    logger.debug(
-        "Semantic COCO match failed: '%s' best='%s' (score=%.3f < %.2f)",
-        value, COCO_CLASSES[best_idx], best_score, _SEMANTIC_THRESHOLD,
-    )
-    return None
-
-
 @functools.lru_cache(maxsize=512)
 def canonicalize_coco_name(value: str | None) -> str | None:
     """Map an arbitrary string to the closest COCO class name if possible.
@@ -230,7 +166,6 @@ def canonicalize_coco_name(value: str | None) -> str | None:
     3. Substring match (alias then canonical)
     4. Token-level match
     5. Fuzzy string match (difflib)
-    6. Semantic embedding similarity (sentence-transformers)
     """
 
     if not value:
@@ -261,5 +196,4 @@ def canonicalize_coco_name(value: str | None) -> str | None:
     if close:
         return _CANONICAL_LOOKUP[close[0]]
 
-    # Last resort: semantic embedding similarity
-    return _semantic_coco_match(value)
+    return None
models/detectors/detr.py CHANGED
@@ -9,7 +9,7 @@ from models.detectors.base import DetectionResult, ObjectDetector
 
 
 class DetrDetector(ObjectDetector):
-    """Wrapper around facebook/detr-resnet-50 for mission-aligned detection."""
+    """Wrapper around facebook/detr-resnet-50 for object detection."""
 
     MODEL_NAME = "facebook/detr-resnet-50"
models/detectors/grounding_dino.py CHANGED
@@ -9,7 +9,7 @@ from models.detectors.base import DetectionResult, ObjectDetector
 
 
 class GroundingDinoDetector(ObjectDetector):
-    """IDEA-Research Grounding DINO-B detector for open-vocabulary missions."""
+    """IDEA-Research Grounding DINO-B detector for open-vocabulary detection."""
 
     MODEL_NAME = "IDEA-Research/grounding-dino-base"
models/segmenters/model_loader.py CHANGED
@@ -40,7 +40,7 @@ _REGISTRY: Dict[str, Callable[..., Segmenter]] = {
 
 
 def get_segmenter_detector(segmenter_name: str) -> str:
-    """Return the detector key associated with a segmenter (for mission parsing)."""
+    """Return the detector key associated with a segmenter."""
     spec = _SEGMENTER_SPECS.get(segmenter_name)
     if spec is None:
         available = ", ".join(sorted(_REGISTRY))
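A sketch of the spec-table lookup behind `get_segmenter_detector`. The `SegmenterSpec` shape and the table contents are assumptions based on the GSAM2/YSAM2 naming in this commit's CLAUDE.md (Grounded SAM2 backed by `grounding_dino`, YOLO-SAM2 backed by `yolo11`); the error message mirrors the hunk above, except that this sketch lists keys from the spec table rather than a separate registry.

```python
from typing import Dict, NamedTuple


class SegmenterSpec(NamedTuple):
    detector: str   # key of the detector that proposes boxes for SAM2
    sam2_size: str  # "small" / "base" / "large"


# Hypothetical spec table mirroring the documented segmenter keys
_SEGMENTER_SPECS: Dict[str, SegmenterSpec] = {
    "GSAM2-S": SegmenterSpec("grounding_dino", "small"),
    "YSAM2-S": SegmenterSpec("yolo11", "small"),
}


def get_segmenter_detector(segmenter_name: str) -> str:
    """Return the detector key associated with a segmenter."""
    spec = _SEGMENTER_SPECS.get(segmenter_name)
    if spec is None:
        available = ", ".join(sorted(_SEGMENTER_SPECS))
        raise ValueError(
            f"Unknown segmenter '{segmenter_name}'. Available: {available}"
        )
    return spec.detector
```

Keeping the detector key in the spec lets callers parse queries with the right vocabulary (open-vocabulary for GSAM2, COCO-only for YSAM2) before segmentation starts.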
utils/profiler.py CHANGED
@@ -390,7 +390,6 @@ def run_profiled_segmentation(
         queries,
         segmenter_name=segmenter_name,
         step=step,
-        enable_gpt=False,
         max_frames=max_frames,
         _perf_metrics=metrics,
         _perf_lock=lock,
utils/tracker.py CHANGED
@@ -3,7 +3,6 @@ import numpy as np
 from scipy.optimize import linear_sum_assignment
 import scipy.linalg
 
-from utils.schemas import AssessmentStatus
 
 
 class KalmanFilter:
@@ -198,24 +197,6 @@ class KalmanFilter:
         return ret
 
 
-# Default staleness threshold: GPT metadata older than this many frames is flagged STALE
-MAX_STALE_FRAMES = 300
-
-GPT_SYNC_KEYS = frozenset({
-    # Legacy / polyfilled fields (consumed by frontend cards)
-    "gpt_distance_m", "gpt_direction", "gpt_description", "gpt_raw",
-    "threat_level_score", "distance_m", "direction", "description",
-    # Universal schema fields
-    "object_type", "size", "visible_weapons", "weapon_readiness",
-    "motion_status", "range_estimate", "bearing",
-    "threat_level", "threat_classification", "tactical_intent",
-    "dynamic_features",
-    # Provenance and temporal validity
-    "assessment_frame_index", "assessment_status",
-    # Mission relevance
-    "mission_relevant", "relevance_reason",
-})
-
 
 class STrack:
     """
@@ -247,8 +228,8 @@ class STrack:
         self.mean = None
         self.covariance = None
 
-        # GPT attributes (persistent)
-        self.gpt_data = {}
+        # Per-track metadata (persistent across frames)
+        self.metadata = {}
 
     def _tlwh_from_xyxy(self, xyxy):
         """Convert xyxy to tlwh."""
@@ -566,29 +547,18 @@ class ByteTracker:
             d_out['bbox'] = [float(x) for x in tracked_bbox]
             d_out['track_id'] = f"T{str(track.track_id).zfill(2)}"
 
-            # Restore GPT data if track has it and current detection didn't
-            for k, v in track.gpt_data.items():
+            # Restore metadata if track has it and current detection didn't
+            for k, v in track.metadata.items():
                 if k not in d_out:
                     d_out[k] = v
 
-            # --- Temporal validity check (INV-5, INV-11) ---
-            assessment_frame = d_out.get('assessment_frame_index')
-            if assessment_frame is not None:
-                frames_since = self.frame_id - assessment_frame
-                if frames_since > MAX_STALE_FRAMES:
-                    d_out['assessment_status'] = AssessmentStatus.STALE
-                    d_out['assessment_age_frames'] = frames_since
-                elif d_out.get('assessment_status') != AssessmentStatus.ASSESSED:
-                    # INV-6: Unassessed objects get explicit UNASSESSED status
-                    d_out['assessment_status'] = AssessmentStatus.UNASSESSED
-
             # Update history
-            if 'history' not in track.gpt_data:
-                track.gpt_data['history'] = []
-            track.gpt_data['history'].append(d_out['bbox'])
-            if len(track.gpt_data['history']) > 30:
-                track.gpt_data['history'].pop(0)
-            d_out['history'] = track.gpt_data['history']
+            if 'history' not in track.metadata:
+                track.metadata['history'] = []
+            track.metadata['history'].append(d_out['bbox'])
+            if len(track.metadata['history']) > 30:
+                track.metadata['history'].pop(0)
+            d_out['history'] = track.metadata['history']
 
             results.append(d_out)
 
@@ -606,42 +576,11 @@ class ByteTracker:
         return results
 
     def _sync_data(self, track, det_source):
-        """Propagate attributes like GPT data between track and detection."""
-        # 1. From Source to Track (Update)
+        """Propagate metadata (e.g. depth) between detection and track."""
         source_data = det_source.original_data if hasattr(det_source, 'original_data') else {}
-        for k in GPT_SYNC_KEYS:
+        for k in ("depth_rel",):
             if k in source_data:
-                track.gpt_data[k] = source_data[k]
-
-        # 2. From Track to Source (Forward fill logic handled in output construction)
-
-    def inject_metadata(self, tracked_dets):
-        """Push metadata from post-processed detection dicts back into internal STrack objects.
-
-        Needed because GPT results are added to detection dicts *after* tracker.update()
-        returns, so the tracker's internal state doesn't have GPT data unless we
-        explicitly push it back in.
-
-        Records assessment_frame_index for temporal validity tracking (INV-5).
-        """
-        meta_by_tid = {}
-        for d in tracked_dets:
-            tid = d.get('track_id')
-            if not tid:
-                continue
-            meta = {k: d[k] for k in GPT_SYNC_KEYS if k in d}
-            if meta:
-                # Ensure assessment_frame_index is recorded
-                if "assessment_frame_index" not in meta and any(
-                    k in meta for k in ("threat_level_score", "gpt_raw", "object_type")
-                ):
-                    meta["assessment_frame_index"] = self.frame_id
-                    meta["assessment_status"] = AssessmentStatus.ASSESSED
-                meta_by_tid[tid] = meta
-        for track in self.tracked_stracks:
-            tid_str = f"T{str(track.track_id).zfill(2)}"
-            if tid_str in meta_by_tid:
-                track.gpt_data.update(meta_by_tid[tid_str])
+                track.metadata[k] = source_data[k]
 
 
 # --- Helper Functions ---
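The renamed `STrack.metadata` forward-fill that survives this cleanup can be reduced to a standalone sketch. The `fill_output` helper here is hypothetical (the real logic lives inline in `ByteTracker.update()`), but the behavior matches the diff: persisted keys backfill the per-frame dict, and the bbox history is capped at 30 entries.

```python
class STrack:
    """Minimal stand-in for the tracker's per-object state."""

    def __init__(self, track_id: int):
        self.track_id = track_id
        self.metadata = {}  # persists across frames (e.g. depth_rel, history)


def fill_output(track: STrack, d_out: dict) -> dict:
    """Forward-fill persistent track metadata into a per-frame detection dict."""
    # Restore metadata if the track has it and the current detection didn't
    for k, v in track.metadata.items():
        if k not in d_out:
            d_out[k] = v
    # Append the current bbox to a history capped at the last 30 boxes
    history = track.metadata.setdefault("history", [])
    history.append(d_out["bbox"])
    if len(history) > 30:
        history.pop(0)
    d_out["history"] = history
    return d_out
```

Because the history list lives on the track, a detection that reappears after a missed frame still carries its trail; keys like `depth_rel` written by `_sync_data` ride along the same way.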