---
license: apache-2.0
library_name: transformers
pipeline_tag: robotics
---

# CosFly-Track

This repository contains the model checkpoints associated with the paper [CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization](https://huggingface.co/papers/2605.17776).

## Overview

CosFly-Track is a large-scale multi-modal dataset and scalable generation pipeline for UAV (Unmanned Aerial Vehicle) visual tracking in urban environments. Whereas most aerial vision-language navigation (VLN) datasets focus on static goals, CosFly-Track addresses the problem of continuously following a moving target while maintaining visibility and avoiding collisions.

### Dataset Details
- **Scale**: Approximately 12,000 expert and perturbed UAV trajectories.
- **Volume**: 2.4 million timesteps (approximately 334 hours).
- **Aligned Channels**: Seven aligned data channels: RGB, metric depth, semantic segmentation, six-degree-of-freedom (6-DoF) drone pose, target state with visibility flag, bilingual (Chinese-English) instructions, and trajectory-pair metadata.
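
As a rough sanity check on the figures above, the stated totals imply an average sampling rate and an average trajectory length. The sketch below derives both using only the rounded numbers in this card; the variable names are illustrative, and the actual per-trajectory rates and lengths may differ.

```python
# Back-of-the-envelope figures derived from the rounded totals in this card.
NUM_TRAJECTORIES = 12_000   # "approximately 12,000" trajectories
NUM_TIMESTEPS = 2_400_000   # "2.4 million timesteps"
TOTAL_HOURS = 334           # "approximately 334 hours"

total_seconds = TOTAL_HOURS * 3600

# Average number of recorded timesteps per second of flight.
avg_rate_hz = NUM_TIMESTEPS / total_seconds

# Average length of one trajectory, in timesteps and in seconds.
avg_steps_per_trajectory = NUM_TIMESTEPS / NUM_TRAJECTORIES
avg_seconds_per_trajectory = total_seconds / NUM_TRAJECTORIES

print(f"average sampling rate: {avg_rate_hz:.2f} Hz")
print(f"average trajectory length: {avg_steps_per_trajectory:.0f} timesteps")
print(f"average trajectory duration: {avg_seconds_per_trajectory:.1f} s")
```

The totals work out to roughly 2 Hz and about 200 timesteps (around 100 seconds) per trajectory. These are averages implied by rounded numbers, not documented properties of individual trajectories.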

## Methodology

The project introduces **MuCO** (Multi-Constraint Optimizer), a planner that operates directly in continuous 3D space. It jointly enforces target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the artifacts of grid-based planners.

## Model Description

The checkpoints in this repository include various vision-language models (VLMs) fine-tuned on the CosFly-Track dataset to act as dynamic target-following agents. Evaluated architectures include:
- Qwen (Qwen3-VL, Qwen3.5)
- InternVL
- GLM-4V
- Gemma 4

Fine-tuning on CosFly-Track significantly improves tracking performance (SR@1 meter) compared to zero-shot baselines, supporting the use of this dataset for training robust autonomous agents.
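
This card does not define SR@1 meter. The sketch below illustrates one plausible reading, stated here as an assumption: the fraction of evaluation samples whose predicted UAV position lies within 1 meter of the reference position. The function name, argument layout, and sample values are hypothetical and are not taken from the paper or its evaluation code.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def success_rate_at_threshold(
    predicted: Sequence[Point3D],
    reference: Sequence[Point3D],
    threshold_m: float = 1.0,
) -> float:
    """Fraction of samples whose Euclidean position error is <= threshold_m.

    ASSUMPTION: this is only one possible interpretation of "SR@1 meter";
    the paper's exact definition may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one sample is required")

    successes = sum(
        1 for p, r in zip(predicted, reference) if math.dist(p, r) <= threshold_m
    )
    return successes / len(predicted)


# Made-up positions in meters, for illustration only.
predicted = [(0.0, 0.0, 10.0), (5.0, 5.0, 10.0), (2.0, 0.0, 8.0)]
reference = [(0.5, 0.0, 10.0), (5.0, 7.0, 10.0), (2.0, 0.2, 8.1)]

rate = success_rate_at_threshold(predicted, reference)
print(f"SR@1m (assumed definition): {rate:.3f}")
```

Under this reading, two of the three made-up samples fall within the 1-meter threshold. Comparing a fine-tuned checkpoint against a zero-shot baseline would mean computing the same quantity for both on an identical evaluation set.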

## Citation

```bibtex
@article{cosflytrack2026,
  title={CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2605.17776},
  year={2026}
}
```