---
license: apache-2.0
datasets:
- Vincent2025hello/usim
language:
- en
base_model:
- nvidia/GR00T-N1.5-3B
tags:
- robotics
- vla
- underwater robot
---
U0 Final — Underwater Robot VLA Model
Model ID: Vincent2025hello/u0_final
Base Model: nvidia/GR00T-N1.5-3B
License: Apache 2.0
Paper: USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Model Description
This model is a Vision-Language-Action (VLA) policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the U0 underwater robot, which is based on the BlueROV2. It takes dual-camera images, a multi-sensor state vector, and a natural-language task description as input, and outputs 16-step action trajectories for autonomous underwater tasks.
Fine-Tuning Details
| Item | Value |
|---|---|
| Base Model | GR00T-N1.5-3B |
| Fine-Tuning Method | Full Fine-Tuning (with visual tuning) |
| Action Horizon | 16 steps |
| Denoising Steps | 4 (inference) |
| Embodiment Tag | new_embodiment |
| Data Config | u0_bot |
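The 16-step action horizon in the table means each policy call returns a chunk of 16 future actions. This card does not say how many of those steps are executed before the policy is queried again, so the following is only a minimal sketch of one common pattern (execute part or all of the chunk, then re-query). The names `run_chunked`, `policy`, `get_observation`, `send_action`, and `execute_per_chunk` are hypothetical and are not part of the GR00T or U0 code.

```python
from typing import Callable, List, Sequence

ACTION_HORIZON = 16  # "Action Horizon" from the fine-tuning table

Action = Sequence[float]


def run_chunked(
    policy: Callable[[dict], List[Action]],
    get_observation: Callable[[], dict],
    send_action: Callable[[Action], None],
    total_steps: int,
    execute_per_chunk: int = ACTION_HORIZON,
) -> int:
    """Run a chunked control loop and return the number of policy calls.

    Assumption: the robot executes the first `execute_per_chunk` actions of
    each predicted chunk and then asks the policy for a fresh chunk.
    """
    if not 1 <= execute_per_chunk <= ACTION_HORIZON:
        raise ValueError("execute_per_chunk must be in [1, ACTION_HORIZON]")

    policy_calls = 0
    steps_done = 0
    while steps_done < total_steps:
        chunk = policy(get_observation())
        policy_calls += 1
        if len(chunk) != ACTION_HORIZON:
            raise ValueError(f"expected {ACTION_HORIZON} actions, got {len(chunk)}")

        # Execute only as many actions as are still needed.
        n_execute = min(execute_per_chunk, total_steps - steps_done)
        for action in chunk[:n_execute]:
            send_action(action)
            steps_done += 1
    return policy_calls
```

Setting `execute_per_chunk` below 16 re-plans more often at the cost of more policy calls; which setting suits the U0 robot is not stated here.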
Input / Output
Inputs
- Video (dual camera): ego-view + wrist-view images (224×224)
- State (29-dim):
  - joint_pos(6): joint positions
  - pwm(8): thruster PWM values
  - joint_v(5): joint velocities
  - dvl_v(3): DVL velocity
  - imu_av(3): IMU angular velocity
  - imu_la(3): IMU linear acceleration
  - pressure(1): depth pressure
  - dvl_h(1): DVL altitude
- Language: natural language task description
Outputs
- Action (13-dim × 16 steps):
  - joint_pos(6): target joint positions
  - pwm(8): target thruster PWM values
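The state and action fields above can be sketched as named slices of a flat vector. This is an illustration only: the concatenation order is assumed to follow the order in which the fields are listed, and `STATE_FIELDS`, `ACTION_FIELDS`, `total_dim`, and `split_vector` are hypothetical names, not identifiers from the U0 code. Note that the per-field sizes as listed sum to 30 for the state and 14 for the action, while the headings give 29 and 13. The sketch uses the per-field sizes unchanged and does not resolve the difference, so check the `u0_bot` data config in the training repository for the authoritative layout.

```python
from typing import Dict, List, Sequence, Tuple

Fields = List[Tuple[str, int]]

# Per-field sizes copied from the lists above (order is an assumption).
STATE_FIELDS: Fields = [
    ("joint_pos", 6),
    ("pwm", 8),
    ("joint_v", 5),
    ("dvl_v", 3),
    ("imu_av", 3),
    ("imu_la", 3),
    ("pressure", 1),
    ("dvl_h", 1),
]
ACTION_FIELDS: Fields = [
    ("joint_pos", 6),
    ("pwm", 8),
]


def total_dim(fields: Fields) -> int:
    """Sum of the listed field sizes."""
    return sum(size for _, size in fields)


def split_vector(vector: Sequence[float], fields: Fields) -> Dict[str, List[float]]:
    """Split a flat vector into named fields, in the listed order."""
    expected = total_dim(fields)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    out: Dict[str, List[float]] = {}
    offset = 0
    for name, size in fields:
        out[name] = list(vector[offset:offset + size])
        offset += size
    return out
```

Applying `split_vector(step, ACTION_FIELDS)` to each of the 16 steps in a predicted chunk would give per-step joint and thruster targets under these assumptions.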
Download Model
pip install huggingface_hub
hf download Vincent2025hello/u0_final --local-dir ./u0_final
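The same download can also be done from Python with `huggingface_hub.snapshot_download`, which may be convenient inside a setup script. This is a minimal sketch; the wrapper name `download_model` and the constants are hypothetical, and the call needs network access and the `huggingface_hub` package.

```python
REPO_ID = "Vincent2025hello/u0_final"
DEFAULT_LOCAL_DIR = "./u0_final"


def download_model(local_dir: str = DEFAULT_LOCAL_DIR) -> str:
    """Download all files of the model repo and return the local folder path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_model()` places the files in `./u0_final`, mirroring the `--local-dir` option of the command above.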
Training Code
The complete fine-tuning and evaluation framework is available at: https://github.com/VincentGu2000/u0
Citation
@misc{gu2025usimu0visionlanguageactiondataset,
title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
year={2025},
eprint={2510.07869},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2510.07869},
}
Acknowledgments
This model is fine-tuned from NVIDIA GR00T N1.5. We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.