Add model card and metadata for CosFly-Track
Hi, I'm Niels from the Hugging Face community science team. This PR improves the model card for the CosFly-Track project.
It adds:
- The `robotics` pipeline tag to the metadata.
- The `transformers` library name to enable code snippets.
- A link to the associated research paper: [CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization](https://huggingface.co/papers/2605.17776).
- A description of the dataset and the fine-tuned models used for UAV visual tracking.
README.md
CHANGED
@@ -1,7 +1,43 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: robotics
 ---

+# CosFly-Track

+This repository contains model checkpoints associated with the paper [CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization](https://huggingface.co/papers/2605.17776).

+## Overview

+CosFly-Track is a large-scale multi-modal dataset and scalable generation pipeline designed for UAV (Unmanned Aerial Vehicle) visual tracking in urban environments. While most aerial vision-language navigation (VLN) datasets focus on static goals, CosFly-Track addresses the problem of continuously following a moving target while maintaining visibility and avoiding collisions.
+
+### Dataset Details
+- **Scale**: Approximately 12,000 expert and perturbed UAV trajectories.
+- **Volume**: 2.4 million timesteps (approximately 334 hours).
+- **Aligned Channels**: Seven aligned data channels: RGB, metric depth, semantic segmentation, six-degree-of-freedom (6-DoF) drone pose, target state with visibility flag, bilingual (Chinese-English) instructions, and trajectory-pair metadata.
+
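
As a quick sanity check on the dataset figures above, the stated volume and duration imply a sampling rate of roughly 2 Hz and about 200 timesteps per trajectory on average. This is only arithmetic on the rounded numbers in the card, assuming evenly spaced timesteps; the actual recording rate and trajectory lengths are not stated here. A minimal Python sketch:

```python
# Approximate figures quoted in the model card.
NUM_TRAJECTORIES = 12_000
NUM_TIMESTEPS = 2_400_000
TOTAL_HOURS = 334

total_seconds = TOTAL_HOURS * 3600

# Implied sampling rate, assuming every timestep spans the same duration.
rate_hz = NUM_TIMESTEPS / total_seconds

# Implied average trajectory length in timesteps and in seconds.
steps_per_trajectory = NUM_TIMESTEPS / NUM_TRAJECTORIES
seconds_per_trajectory = total_seconds / NUM_TRAJECTORIES

print(f"implied sampling rate: {rate_hz:.2f} Hz")
print(f"average trajectory: {steps_per_trajectory:.0f} timesteps, "
      f"about {seconds_per_trajectory:.0f} s")
```

Because the inputs are rounded, the results should be read as orders of magnitude rather than exact dataset properties.
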
+## Methodology
+
+The project introduces **MuCO** (Multi-Constraint Optimizer), a planner that operates directly in continuous 3D space. It jointly enforces target visibility, viewpoint quality, collision avoidance, smoothness, and kinematic feasibility, avoiding the artifacts of grid-based planners.
+
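
To make "jointly enforces" more concrete, one common way to combine several constraints in continuous space is a weighted sum of soft penalty terms evaluated on a candidate trajectory. The sketch below illustrates that general idea only; it is not MuCO's formulation. The function name `trajectory_cost`, every constant, every weight, and the distance-based proxies for visibility and viewpoint quality are hypothetical assumptions, and no optimizer is included.

```python
import math

# Hypothetical constants, chosen only for illustration (not from the paper).
IDEAL_RANGE = 5.0    # preferred drone-to-target distance, meters
MAX_RANGE = 15.0     # beyond this the target is treated as "not visible"
SAFETY_MARGIN = 0.5  # extra clearance around spherical obstacles, meters
MAX_SPEED = 8.0      # speed limit, meters per second


def hinge(x):
    """Penalize only positive constraint violations."""
    return max(0.0, x)


def trajectory_cost(drone, target, obstacles, dt=0.5, weights=None):
    """Weighted sum of soft penalties over a candidate drone trajectory.

    drone, target: equal-length lists of (x, y, z) waypoints.
    obstacles: list of ((x, y, z), radius) spheres.
    Returns (total_cost, per_term_costs).
    """
    w = {"visibility": 10.0, "viewpoint": 1.0, "collision": 100.0,
         "smoothness": 1.0, "kinematic": 10.0}
    if weights:
        w.update(weights)
    terms = dict.fromkeys(w, 0.0)

    for p, q in zip(drone, target):
        d = math.dist(p, q)
        # Crude range-based proxies; real visibility needs occlusion checks.
        terms["visibility"] += hinge(d - MAX_RANGE) ** 2
        terms["viewpoint"] += (d - IDEAL_RANGE) ** 2
        for center, radius in obstacles:
            gap = radius + SAFETY_MARGIN - math.dist(p, center)
            terms["collision"] += hinge(gap) ** 2

    # Kinematic feasibility: penalize segments faster than MAX_SPEED.
    for a, b in zip(drone, drone[1:]):
        terms["kinematic"] += hinge(math.dist(a, b) / dt - MAX_SPEED) ** 2

    # Smoothness: squared finite-difference acceleration.
    for a, b, c in zip(drone, drone[1:], drone[2:]):
        accel = [(ai - 2 * bi + ci) / dt ** 2 for ai, bi, ci in zip(a, b, c)]
        terms["smoothness"] += sum(x * x for x in accel)

    total = sum(w[k] * terms[k] for k in w)
    return total, terms


# Target moves along the x-axis; the drone trails it 4 m behind, 3 m above.
target_path = [(float(t), 0.0, 0.0) for t in range(9)]
follow_path = [(t - 4.0, 0.0, 3.0) for t in range(9)]
far_obstacle = [((2.0, 10.0, 3.0), 1.0)]
blocking_obstacle = [((0.0, 0.0, 3.0), 1.0)]

clear_total, clear_terms = trajectory_cost(follow_path, target_path, far_obstacle)
blocked_total, blocked_terms = trajectory_cost(follow_path, target_path, blocking_obstacle)
print("clear path cost:", clear_total)
print("blocked path cost:", blocked_total)
```

In this toy setup the same straight-line path costs nothing when the obstacle is far away and is dominated by the collision term when the obstacle sits on the path. A continuous-space optimizer would adjust the waypoints to trade these terms off against each other.
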
+## Model Description
+
+The checkpoints in this repository include various vision-language models (VLMs) fine-tuned on the CosFly-Track dataset to act as dynamic target-following agents. Evaluated architectures include:
+- Qwen (Qwen3-VL, Qwen3.5)
+- InternVL
+- GLM-4V
+- Gemma 4
+
+Fine-tuning on CosFly-Track significantly improves tracking performance (SR@1 meter) compared to zero-shot baselines, supporting the use of this dataset for training robust autonomous agents.
+
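
The card does not define SR@1 meter. Assuming it denotes a success rate at a 1-meter threshold (the usual reading of "SR" in navigation benchmarks), it could be computed as the fraction of evaluation episodes whose error is at most 1 m. In the sketch below, the helper `success_rate`, the one-scalar-error-per-episode input, and the sample values are all hypothetical; the paper's exact criterion may differ.

```python
def success_rate(errors_m, threshold_m=1.0):
    """Fraction of episodes whose error (in meters) is within the threshold.

    How the per-episode error is measured (final distance, mean distance,
    and so on) is left to the caller; this is an assumption, not the
    paper's definition.
    """
    if not errors_m:
        raise ValueError("need at least one episode")
    successes = sum(1 for e in errors_m if e <= threshold_m)
    return successes / len(errors_m)


# Placeholder errors for four episodes; illustrative values, not real results.
example_errors = [0.4, 0.9, 1.6, 0.7]
print(success_rate(example_errors))
```

With these placeholder values, three of the four episodes fall within the threshold.
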
+## Citation
+
+```bibtex
+@article{cosflytrack2026,
+  title={CosFly-Track: A Large-Scale Multi-Modal Dataset for UAV Visual Tracking via Multi-Constraint Trajectory Optimization},
+  author={Anonymous Authors},
+  journal={arXiv preprint arXiv:2605.17776},
+  year={2026}
+}
+```