	---
	license: apache-2.0
	datasets:
	- Vincent2025hello/usim
	language:
	- en
	base_model:
	- nvidia/GR00T-N1.5-3B
	tags:
	- robotics
	- vla
	- underwater robot
	---
	# U0 Final — Underwater Robot VLA Model

	- Model ID: `Vincent2025hello/u0_final`
	- Base Model: [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B)
	- License: Apache 2.0
	- Paper: [USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots](https://arxiv.org/abs/2510.07869)

	---

	## Model Description

	This model is a Vision-Language-Action (VLA) policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the U0 underwater robot (based on BlueROV2). It takes dual-camera visual observations and multi-sensor state inputs, and outputs 16-step action trajectories for autonomous underwater tasks.

	## Fine-Tuning Details

	| Item | Value |
	|------|-------|
	| Base Model | GR00T-N1.5-3B |
	| Fine-Tuning Method | Full Fine-Tuning (with visual tuning) |
	| Action Horizon | 16 steps |
	| Denoising Steps | 4 (inference) |
	| Embodiment Tag | `new_embodiment` |
	| Data Config | `u0_bot` |
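
To make the "Denoising Steps" entry concrete, here is a minimal sketch of fixed-step Euler integration, assuming the action head follows the flow-matching formulation used by the GR00T N1 family. The names `denoise` and `toy_velocity` are hypothetical, and the toy velocity field stands in for the learned network, which in the real model is conditioned on images, state, and language.

```python
def denoise(initial, velocity_fn, num_steps=4):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(initial)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t takes the values 0, 0.25, 0.5, 0.75 for num_steps=4
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Hypothetical stand-in for the learned velocity network.

    Points straight from the current sample to a fixed target, scaled so
    the sample would arrive at the target at t=1.
    """
    def fn(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return fn


noise = [0.3, -1.2, 0.8]    # stands in for a sampled noise vector
target = [0.5, 0.0, -0.5]   # stands in for a clean action
action = denoise(noise, toy_velocity(target), num_steps=4)
```

Fewer steps mean fewer forward passes of the action head per inference call, which is the usual reason to keep this number small on an embedded robot computer.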

	## Input / Output

	### Inputs

	- Video (dual camera): ego-view + wrist-view images (224×224)
	- State (29-dim):
	  - `joint_pos` (6): joint positions
	  - `pwm` (8): thruster PWM values
	  - `joint_v` (5): joint velocities
	  - `dvl_v` (3): DVL velocity
	  - `imu_av` (3): IMU angular velocity
	  - `imu_la` (3): IMU linear acceleration
	  - `pressure` (1): depth pressure
	  - `dvl_h` (1): DVL altitude
	- Language: natural language task description
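
The state fields listed above can be checked and flattened with a small helper before they are handed to a policy. This is a minimal sketch, not the repository's data pipeline: `STATE_FIELDS`, `pack_state`, and `unpack_state` are hypothetical names, and the field order within the flat vector is an assumption. The per-field widths are copied as listed; note that they sum to 30 rather than the stated 29, so confirm the exact layout against the `u0_bot` data config.

```python
# Field names and widths copied from the list above.
# The ordering inside the flat vector is an assumption.
STATE_FIELDS = [
    ("joint_pos", 6),
    ("pwm", 8),
    ("joint_v", 5),
    ("dvl_v", 3),
    ("imu_av", 3),
    ("imu_la", 3),
    ("pressure", 1),
    ("dvl_h", 1),
]


def pack_state(readings):
    """Validate each field's width and concatenate into one flat list."""
    flat = []
    for name, width in STATE_FIELDS:
        values = [float(v) for v in readings[name]]
        if len(values) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(values)}")
        flat.extend(values)
    return flat


def unpack_state(flat):
    """Split a flat state list back into named fields."""
    fields, offset = {}, 0
    for name, width in STATE_FIELDS:
        fields[name] = flat[offset:offset + width]
        offset += width
    if offset != len(flat):
        raise ValueError(f"expected {offset} values, got {len(flat)}")
    return fields
```

Validating widths at the boundary catches a miswired sensor topic early, before a malformed vector reaches the model.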

	### Outputs

	- Action (13-dim × 16 steps):
	  - `joint_pos` (6): target joint positions
	  - `pwm` (8): target thruster PWM values
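
A predicted action chunk can be split into per-step commands along the lines below. This is a sketch under stated assumptions: `ACTION_FIELDS`, `HORIZON`, and `split_action_chunk` are hypothetical names, the chunk is assumed to arrive as 16 rows with `joint_pos` followed by `pwm` in each row, and the widths are taken as listed (they sum to 14 per step rather than the stated 13, so check the actual layout before relying on it).

```python
# Widths copied from the list above; row ordering is an assumption.
ACTION_FIELDS = [("joint_pos", 6), ("pwm", 8)]
HORIZON = 16


def split_action_chunk(chunk):
    """Turn a HORIZON-row action chunk into a list of per-step dicts."""
    if len(chunk) != HORIZON:
        raise ValueError(f"expected {HORIZON} steps, got {len(chunk)}")
    steps = []
    for row in chunk:
        step, offset = {}, 0
        for name, width in ACTION_FIELDS:
            step[name] = list(row[offset:offset + width])
            offset += width
        if offset != len(row):
            raise ValueError(f"expected {offset} values per step, got {len(row)}")
        steps.append(step)
    return steps
```

A controller would typically send `steps[0]` first and either play out the remaining steps or re-query the policy after a few of them, depending on how quickly the scene changes.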

	## Download Model

	```bash
	# The `hf` CLI ships with recent huggingface_hub releases, so upgrade if needed
	pip install -U huggingface_hub
	hf download Vincent2025hello/u0_final --local-dir ./u0_final
	```
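
The same download can be done from Python with `huggingface_hub.snapshot_download`. The wrapper `download_u0` below is a hypothetical convenience function, not part of the released code; it imports the library lazily so the module can be loaded without it.

```python
from pathlib import Path

REPO_ID = "Vincent2025hello/u0_final"


def download_u0(local_dir="./u0_final"):
    """Fetch all files of the model repo into local_dir and return the path."""
    # Imported here so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_u0()` requires network access and downloads the full checkpoint, so run it once and point later code at the returned directory.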

	## Training Code

	The complete fine-tuning and evaluation framework is available at: [https://github.com/VincentGu2000/u0](https://github.com/VincentGu2000/u0)

	## Citation

	```bibtex
	@misc{gu2025usimu0visionlanguageactiondataset,
	  title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
	  author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
	  year={2025},
	  eprint={2510.07869},
	  archivePrefix={arXiv},
	  primaryClass={cs.RO},
	  url={https://arxiv.org/abs/2510.07869},
	}
	```

	## Acknowledgments

	This model is fine-tuned from [NVIDIA GR00T N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B). We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.