---
language:
- en
license: cc-by-4.0
pipeline_tag: robotics
tags:
- embodied-ai
- aerial-vision-language-navigation
- world-model
- model-weights
---

# WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation

This repository contains the model weights for WorldVLN, the first autoregressive world action model for aerial vision-language navigation (VLN).

[**Paper**](https://huggingface.co/papers/2605.15964) | [**Project Page**](https://embodiedcity.github.io/WorldVLN/) | [**Code**](https://github.com/EmbodiedCity/WorldVLN.code)

WorldVLN formulates aerial navigation as a prediction-driven world-action problem. It adapts a latent autoregressive video backbone to predict short-horizon world-state transitions and decodes them directly into executable waypoint actions. After each action segment is executed, newly received observations are encoded back into the autoregressive context, enabling closed-loop world-action prediction.
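The closed loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the released implementation: `ToyEnv`, `encode_observation`, `predict_transition`, `decode_waypoints`, and `run_closed_loop` are hypothetical names, observations are reduced to 3D positions instead of camera frames, and language-instruction conditioning is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float, float]  # hypothetical (x, y, z) waypoint
Latent = List[float]                   # hypothetical latent state


@dataclass
class ToyEnv:
    """Toy stand-in for a simulator: the observation is just the current position."""
    position: Waypoint = (0.0, 0.0, 0.0)

    def execute(self, waypoints: List[Waypoint]) -> Waypoint:
        # Fly through the segment; the last waypoint becomes the new position.
        self.position = waypoints[-1]
        return self.position


def encode_observation(obs: Waypoint) -> Latent:
    # Placeholder encoder: a real system would encode camera frames into latents.
    return list(obs)


def predict_transition(context: List[Latent]) -> Latent:
    # Placeholder world model: predicts the next latent from the latest context entry.
    x, y, z = context[-1]
    return [x + 1.0, y, z]


def decode_waypoints(current: Latent, predicted: Latent, steps: int = 2) -> List[Waypoint]:
    # Placeholder action decoder: interpolates linearly toward the predicted state.
    return [
        tuple(c + (p - c) * k / steps for c, p in zip(current, predicted))
        for k in range(1, steps + 1)
    ]


def run_closed_loop(env: ToyEnv, num_segments: int) -> List[Latent]:
    # The autoregressive context starts with the encoded initial observation.
    context = [encode_observation(env.position)]
    for _ in range(num_segments):
        predicted = predict_transition(context)               # short-horizon prediction
        waypoints = decode_waypoints(context[-1], predicted)  # executable action segment
        observation = env.execute(waypoints)                  # act in the environment
        context.append(encode_observation(observation))       # feed the observation back
    return context
```

The point of the sketch is the ordering inside the loop: predict, decode, execute, then re-encode the real observation, so each new prediction is conditioned on what was actually observed and not on the model's own earlier prediction.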

## Model Weights
This repository includes the weights for:
- The world model backbone.
- The action decoder.

## Usage
For installation, setup, and inference instructions, including the autoregressive I/O protocol, see the [official GitHub repository](https://github.com/EmbodiedCity/WorldVLN.code).
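As a starting point, the weights can probably be fetched with the `huggingface_hub` library's `snapshot_download` function. The sketch below is an assumption about a convenient workflow, not a documented step from the official repository: the `download_weights` helper and the default `local_dir` are hypothetical, and the repository id is left as a parameter because it is not stated here.

```python
from pathlib import Path


def download_weights(repo_id: str, local_dir: str = "checkpoints/worldvln") -> Path:
    """Download every file in a Hub model repository and return the local folder.

    `repo_id` is the "<namespace>/<name>" id of this model repository on the Hub.
    `local_dir` is a hypothetical default; the official code may expect another path.
    """
    # Imported lazily so this module still loads where huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)


# Example call (replace the placeholder with the actual repository id):
# weights_dir = download_weights("<namespace>/<name>")
```

How the downloaded files map to the world model backbone and the action decoder is defined by the official code, so check the GitHub repository for the expected checkpoint layout.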

## Citation

If this work is useful for your research, please cite:

```bibtex
@misc{zhao2026worldvln,
      title={WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation}, 
      author={Baining Zhao and Jiacheng Xu and Weicheng Feng and Xin Zhang and Zhaolu Wang and Haoyang Wang and Shilong Ji and Ziyou Wang and Jianjie Fang and Zhiheng Zheng and Weichen Zhang and Yu Shang and Wei Wu and Chen Gao and Xinlei Chen and Yong Li},
      year={2026},
      eprint={2605.15964},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.15964}, 
}
```