---
license: apache-2.0
datasets:
- Vincent2025hello/usim
language:
- en
base_model:
- nvidia/GR00T-N1.5-3B
tags:
- robotics
- vla
- underwater robot
---

# U0 Final — Underwater Robot VLA Model

- **Model ID:** `Vincent2025hello/u0_final`
- **Base Model:** [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B)
- **License:** Apache 2.0
- **Paper:** [USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots](https://arxiv.org/abs/2510.07869)

---

## Model Description

This model is a **Vision-Language-Action (VLA)** policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the **U0 underwater robot** (based on BlueROV2). It takes dual-camera images, multi-sensor state readings, and a natural-language task description as input, and outputs 16-step action trajectories for autonomous underwater tasks.

## Fine-Tuning Details

| Item | Value |
|------|-------|
| Base Model | GR00T-N1.5-3B |
| Fine-Tuning Method | Full fine-tuning (vision encoder also tuned) |
| Action Horizon | 16 steps |
| Denoising Steps | 4 (at inference) |
| Embodiment Tag | `new_embodiment` |
| Data Config | `u0_bot` |

## Input / Output

### Inputs

- **Video** (dual camera): ego-view and wrist-view images (224×224)
- **State** (29-dim):
  - `joint_pos` (5): joint positions
  - `pwm` (8): thruster PWM values
  - `joint_v` (5): joint velocities
  - `dvl_v` (3): DVL velocity
  - `imu_av` (3): IMU angular velocity
  - `imu_la` (3): IMU linear acceleration
  - `pressure` (1): pressure sensor reading (depth)
  - `dvl_h` (1): DVL altitude
- **Language**: natural-language task description

### Outputs

- **Action** (13-dim × 16 steps):
  - `joint_pos` (5): target joint positions
  - `pwm` (8): target thruster PWM values

## Download Model

```bash
# The `hf` CLI ships with recent versions of huggingface_hub
pip install -U huggingface_hub
hf download Vincent2025hello/u0_final --local-dir ./u0_final
```

## Training Code

The complete fine-tuning and evaluation framework is available at [https://github.com/VincentGu2000/u0](https://github.com/VincentGu2000/u0).

## Citation

```bibtex
@misc{gu2025usimu0visionlanguageactiondataset,
      title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
      author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
      year={2025},
      eprint={2510.07869},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2510.07869},
}
```

## Acknowledgments

This model is fine-tuned from [NVIDIA GR00T N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B). We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.
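
## State and Action Layout (Sketch)

The Input / Output section lists the state and action fields by name and size. The Python sketch below shows one possible way to pack raw sensor readings into the flat 29-dim state vector and to split a 16-step, 13-dim action chunk back into joint and thruster targets. It is not taken from the U0 codebase. The concatenation order, the helper names `pack_state` and `split_action_chunk`, and the assumption that `joint_pos` is 5-dimensional (the size at which the per-field counts add up to the stated 29-dim state and 13-dim action) are all illustrative assumptions. Check the `u0_bot` data config in the training repository for the actual layout.

```python
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: fields are concatenated in the order listed in Input / Output,
# and joint_pos is 5-dim so the totals match 29 (state) and 13 (action).
STATE_FIELDS: List[Tuple[str, int]] = [
    ("joint_pos", 5),
    ("pwm", 8),
    ("joint_v", 5),
    ("dvl_v", 3),
    ("imu_av", 3),
    ("imu_la", 3),
    ("pressure", 1),
    ("dvl_h", 1),
]
ACTION_FIELDS: List[Tuple[str, int]] = [("joint_pos", 5), ("pwm", 8)]
ACTION_HORIZON = 16

STATE_DIM = sum(size for _, size in STATE_FIELDS)    # 29
ACTION_DIM = sum(size for _, size in ACTION_FIELDS)  # 13


def pack_state(readings: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-sensor readings into one flat state vector (hypothetical helper)."""
    state: List[float] = []
    for name, size in STATE_FIELDS:
        values = list(readings[name])  # KeyError if a sensor field is missing
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        state.extend(float(v) for v in values)
    return state


def split_action_chunk(chunk: Sequence[Sequence[float]]) -> List[Dict[str, List[float]]]:
    """Split a (16, 13) action chunk into per-step joint and PWM targets (hypothetical helper)."""
    if len(chunk) != ACTION_HORIZON:
        raise ValueError(f"expected {ACTION_HORIZON} steps, got {len(chunk)}")
    steps: List[Dict[str, List[float]]] = []
    for i, row in enumerate(chunk):
        if len(row) != ACTION_DIM:
            raise ValueError(f"step {i}: expected {ACTION_DIM} values, got {len(row)}")
        step: Dict[str, List[float]] = {}
        offset = 0
        for name, size in ACTION_FIELDS:
            # Values are passed through unchanged; no unit or range checks here.
            step[name] = [float(v) for v in row[offset:offset + size]]
            offset += size
        steps.append(step)
    return steps


if __name__ == "__main__":
    zero_readings = {name: [0.0] * size for name, size in STATE_FIELDS}
    state = pack_state(zero_readings)
    dummy_chunk = [[float(t)] * ACTION_DIM for t in range(ACTION_HORIZON)]
    steps = split_action_chunk(dummy_chunk)
    print(len(state), len(steps), len(steps[0]["pwm"]))  # 29 16 8
```

Validating every field size up front turns a silent misalignment (for example, a missing thruster channel) into an immediate error, which is usually easier to debug than a policy that receives shifted inputs.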