Vincent2025hello/u0_final

underwater robot

Vincent2025hello committed 7 days ago

Commit 7ea87e2 (verified) · 1 parent: 8fbac60

Upload README.md

Files changed (1)

README.md +74 -3

@@ -1,3 +1,74 @@
----
-license: apache-2.0
----

+# U0 Final — Underwater Robot VLA Model
+**Model ID:** `Vincent2025hello/u0_final`
+**Base Model:** [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B)
+**License:** Apache 2.0
+**Paper:** [USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots](https://arxiv.org/abs/2510.07869)
+---
+## Model Description
+This model is a **Vision-Language-Action (VLA)** policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the **U0 underwater robot** (based on BlueROV2). It takes dual-camera visual observations and multi-sensor state inputs, and outputs 16-step action trajectories for autonomous underwater tasks.
+## Fine-Tuning Details
+| Item | Value |
+|------|-------|
+| Base Model | GR00T-N1.5-3B |
+| Fine-Tuning Method | Full Fine-Tuning (with visual tuning) |
+| Action Horizon | 16 steps |
+| Denoising Steps | 4 (inference) |
+| Embodiment Tag | `new_embodiment` |
+| Data Config | `u0_bot` |
+## Input / Output
+### Inputs
+- **Video** (dual camera): ego-view + wrist-view images (224×224)
+- **State** (29-dim):
+  - `joint_pos` (5): joint positions
+  - `pwm` (8): thruster PWM values
+  - `joint_v` (5): joint velocities
+  - `dvl_v` (3): DVL velocity
+  - `imu_av` (3): IMU angular velocity
+  - `imu_la` (3): IMU linear acceleration
+  - `pressure` (1): depth pressure
+  - `dvl_h` (1): DVL altitude
+- **Language**: natural language task description
+### Outputs
+- **Action** (13-dim × 16 steps):
+  - `joint_pos` (5): target joint positions
+  - `pwm` (8): target thruster PWM values
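The state and action layouts above can be sketched as named slices of a flat vector. This is a minimal illustration, not the repository's data config: the helper names (`field_slices`, `split_vector`) are hypothetical, the concatenation order is assumed to follow the order of the lists, and `joint_pos` is set to 5 here, an assumption under which the fields sum to the stated 29 and 13 dimensions.

```python
from itertools import accumulate

# Field sizes follow the Inputs/Outputs lists. joint_pos = 5 is an assumption
# chosen so the totals match the stated 29-dim state and 13-dim action.
STATE_LAYOUT = {
    "joint_pos": 5,
    "pwm": 8,
    "joint_v": 5,
    "dvl_v": 3,
    "imu_av": 3,
    "imu_la": 3,
    "pressure": 1,
    "dvl_h": 1,
}
ACTION_LAYOUT = {"joint_pos": 5, "pwm": 8}
ACTION_HORIZON = 16  # steps per predicted trajectory


def field_slices(layout):
    """Map each field name to the slice it occupies in a flat vector."""
    ends = list(accumulate(layout.values()))
    starts = [0] + ends[:-1]
    return {name: slice(s, e) for name, s, e in zip(layout, starts, ends)}


def split_vector(vector, layout):
    """Split a flat vector into a dict of per-field lists."""
    expected = sum(layout.values())
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    return {name: list(vector[sl]) for name, sl in field_slices(layout).items()}


# A dummy all-zero action chunk: 16 steps, each a flat 13-dim action.
dummy_chunk = [[0.0] * sum(ACTION_LAYOUT.values()) for _ in range(ACTION_HORIZON)]
first_step = split_vector(dummy_chunk[0], ACTION_LAYOUT)
print(sum(STATE_LAYOUT.values()), sum(ACTION_LAYOUT.values()), len(first_step["pwm"]))
```

Keeping the layout in one ordered mapping means the offsets are derived rather than hard-coded, so a change to any field size shows up in the totals immediately.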
+## Download Model
+```bash
+# The `hf` CLI ships with recent huggingface_hub releases, so upgrade first
+pip install -U huggingface_hub
+hf download Vincent2025hello/u0_final --local-dir ./u0_final
+```
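The same download can also be done from Python. The sketch below uses `snapshot_download` from `huggingface_hub`; the wrapper name `download_u0` and its default directory are illustrative choices, not part of the model repository.

```python
REPO_ID = "Vincent2025hello/u0_final"


def download_u0(local_dir="./u0_final"):
    """Download the full model repository and return the local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Usage (requires network access):
# path = download_u0()
# print(path)
```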
+## Training Code
+The complete fine-tuning and evaluation framework is available at: [https://github.com/VincentGu2000/u0](https://github.com/VincentGu2000/u0)
+## Citation
+```bibtex
+@misc{gu2025usimu0visionlanguageactiondataset,
+      title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
+      author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
+      year={2025},
+      eprint={2510.07869},
+      archivePrefix={arXiv},
+      primaryClass={cs.RO},
+      url={https://arxiv.org/abs/2510.07869},
+}
+```
+## Acknowledgments
+This model is fine-tuned from [NVIDIA GR00T N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B). We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.