Add model card and metadata for CosFly-Track

#86
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +36 -0
README.md CHANGED
@@ -1,7 +1,43 @@
  ---
  license: apache-2.0
+ library_name: transformers
+ pipeline_tag: robotics
  ---
 
+ # CosFly-Track
 
+ This repository contains model checkpoints associated with the paper [CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization](https://huggingface.co/papers/2605.17776).
 
+ ## Overview
 
+ CosFly-Track is a large-scale multi-modal dataset and scalable generation pipeline designed for UAV (Unmanned Aerial Vehicle) visual tracking in urban environments. While most aerial vision-language navigation (VLN) datasets focus on static goals, CosFly-Track addresses the problem of continuously following a moving target while maintaining visibility and avoiding collisions.
+
+ ### Dataset Details
+ - **Scale**: Approximately 12,000 expert and perturbed UAV trajectories.
+ - **Volume**: 2.4 million timesteps (approximately 334 hours).
+ - **Aligned Channels**: Seven aligned data channels including RGB, metric depth, semantic segmentation, six-degree-of-freedom (6-DoF) drone pose, target state with visibility flag, bilingual (Chinese-English) instructions, and trajectory-pair metadata.
+
+ ## Methodology
+
+ The project introduces **MuCO** (Multi-Constraint Optimizer), a planner that plans directly in continuous 3D space. It jointly enforces target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the artifacts of grid-based planners.
+
+ ## Model Description
+
+ The checkpoints in this repository include various vision-language models (VLMs) fine-tuned on the CosFly-Track dataset to act as dynamic target-following agents. Evaluated architectures include:
+ - Qwen (Qwen3-VL, Qwen3.5)
+ - InternVL
+ - GLM-4V
+ - Gemma 4
+
+ Fine-tuning on CosFly-Track significantly improves tracking performance (SR@1 meter) compared to zero-shot baselines, supporting the use of this dataset for training robust autonomous agents.
+
+ ## Citation
+
+ ```bibtex
+ @article{cosflytrack2026,
+ title={CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization},
+ author={Anonymous Authors},
+ journal={arXiv preprint arXiv:2605.17776},
+ year={2026}
+ }
+ ```
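
Review note: the three figures in the "Dataset Details" list of the proposed README (about 12,000 trajectories, 2.4 million timesteps, about 334 hours) can be cross-checked against each other. The sketch below treats the rounded figures as exact, which is an assumption, and derives the average sampling rate and trajectory length they imply. The README itself does not state a sampling rate, so the result is only an implied average, not a documented property of the dataset. The function and constant names are illustrative.

```python
# Figures quoted in the "Dataset Details" list of the proposed README.
# They are approximate in the source; treated as exact here for the check.
NUM_TRAJECTORIES = 12_000   # "approximately 12,000" trajectories
NUM_TIMESTEPS = 2_400_000   # "2.4 million timesteps"
TOTAL_HOURS = 334           # "approximately 334 hours"


def implied_rate_hz(timesteps: int, hours: float) -> float:
    """Average number of timesteps per second implied by the totals."""
    return timesteps / (hours * 3600)


def mean_trajectory_length(timesteps: int, trajectories: int) -> float:
    """Average number of timesteps per trajectory."""
    return timesteps / trajectories


rate_hz = implied_rate_hz(NUM_TIMESTEPS, TOTAL_HOURS)
steps_per_traj = mean_trajectory_length(NUM_TIMESTEPS, NUM_TRAJECTORIES)
seconds_per_traj = steps_per_traj / rate_hz

print(f"implied rate: {rate_hz:.2f} Hz")
print(f"mean length: {steps_per_traj:.0f} steps (~{seconds_per_traj:.0f} s)")
```

Under these assumptions the figures agree with one another: roughly 2 timesteps per second and about 200 timesteps (around 100 seconds) per trajectory on average. If the paper reports a different sampling rate, the "Volume" line would be worth rechecking before merge.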