Vincent2025hello committed on
Commit 7ea87e2 · verified · 1 Parent(s): 8fbac60

Upload README.md

Files changed (1)
  1. README.md +74 -3
README.md CHANGED
@@ -1,3 +1,74 @@
- ---
- license: apache-2.0
- ---
+ # U0 Final — Underwater Robot VLA Model
+
+ **Model ID:** `Vincent2025hello/u0_final`
+ **Base Model:** [nvidia/GR00T-N1.5-3B](https://huggingface.co/nvidia/GR00T-N1.5-3B)
+ **License:** Apache 2.0
+ **Paper:** [USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots](https://arxiv.org/abs/2510.07869)
+
+ ---
+
+ ## Model Description
+
+ This model is a **Vision-Language-Action (VLA)** policy fine-tuned from NVIDIA GR00T N1.5 (3B parameters) for the **U0 underwater robot** (based on BlueROV2). It takes dual-camera visual observations, multi-sensor state inputs, and a natural-language task description, and outputs 16-step action trajectories for autonomous underwater tasks.
+
+ ## Fine-Tuning Details
+
+ | Item | Value |
+ |------|-------|
+ | Base Model | GR00T-N1.5-3B |
+ | Fine-Tuning Method | Full Fine-Tuning (with visual tuning) |
+ | Action Horizon | 16 steps |
+ | Denoising Steps | 4 (inference) |
+ | Embodiment Tag | `new_embodiment` |
+ | Data Config | `u0_bot` |
+
+ ## Input / Output
+
+ ### Inputs
+
+ - **Video** (dual camera): ego-view + wrist-view images (224×224)
+ - **State** (29-dim):
+   - `joint_pos` (6): joint positions
+   - `pwm` (8): thruster PWM values
+   - `joint_v` (5): joint velocities
+   - `dvl_v` (3): DVL velocity
+   - `imu_av` (3): IMU angular velocity
+   - `imu_la` (3): IMU linear acceleration
+   - `pressure` (1): depth pressure
+   - `dvl_h` (1): DVL altitude
+ - **Language**: natural language task description
+
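As a rough illustration of how the state components listed above might be flattened into a single vector, here is a minimal Python sketch. The concatenation order, the `pack_state` and `unpack_state` helpers, and the use of plain lists are assumptions made for illustration; they are not taken from the U0 codebase. The widths as listed sum to 30 while the heading gives 29, so the sketch derives the total from the list rather than hard-coding it.

```python
# Component widths copied from the state list above, in listed order.
# The ordering inside the real model input is an assumption.
STATE_LAYOUT = {
    "joint_pos": 6,
    "pwm": 8,
    "joint_v": 5,
    "dvl_v": 3,
    "imu_av": 3,
    "imu_la": 3,
    "pressure": 1,
    "dvl_h": 1,
}


def pack_state(readings):
    """Flatten named sensor readings into one list of floats (hypothetical helper)."""
    vector = []
    for name, width in STATE_LAYOUT.items():
        values = [float(v) for v in readings[name]]
        if len(values) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(values)}")
        vector.extend(values)
    return vector


def unpack_state(vector):
    """Split a flat state vector back into its named components."""
    expected = sum(STATE_LAYOUT.values())
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    out, start = {}, 0
    for name, width in STATE_LAYOUT.items():
        out[name] = list(vector[start:start + width])
        start += width
    return out
```

Checking each component's width before concatenation catches a missing or truncated sensor reading early, instead of letting it silently shift every later field.
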
+ ### Outputs
+
+ - **Action** (13-dim × 16 steps):
+   - `joint_pos` (6): target joint positions
+   - `pwm` (8): target thruster PWM values
+
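To make the output shape concrete, the following Python sketch splits one predicted action chunk into per-step joint and thruster targets. The `[joint_pos, pwm]` column order and the `split_action_chunk` helper are assumptions for illustration only. The listed widths (6 + 8) give 14 values per step while the heading gives 13, so the per-step width is derived from the list here.

```python
ACTION_HORIZON = 16  # steps per predicted chunk, per the model card
ACTION_LAYOUT = {"joint_pos": 6, "pwm": 8}  # widths as listed above; order is assumed


def split_action_chunk(chunk):
    """Turn a horizon x width chunk into a list of per-step dicts (hypothetical helper)."""
    if len(chunk) != ACTION_HORIZON:
        raise ValueError(f"expected {ACTION_HORIZON} steps, got {len(chunk)}")
    width = sum(ACTION_LAYOUT.values())
    steps = []
    for row in chunk:
        if len(row) != width:
            raise ValueError(f"expected {width} values per step, got {len(row)}")
        step, start = {}, 0
        for name, size in ACTION_LAYOUT.items():
            step[name] = list(row[start:start + size])
            start += size
        steps.append(step)
    return steps
```

A controller could send `steps[0]`, `steps[1]`, and so on to the robot in order and request a new chunk once the current one is used up. Whether U0 executes the full chunk or re-plans earlier is not stated here.
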
+ ## Download Model
+
+ ```bash
+ # `hf` ships with recent huggingface_hub releases; older ones use `huggingface-cli`
+ pip install -U huggingface_hub
+ hf download Vincent2025hello/u0_final --local-dir ./u0_final
+ ```
+
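The same download can be scripted from Python. This is a minimal sketch using `snapshot_download` from `huggingface_hub`; the `download_u0` wrapper name and the default target directory are illustrative choices, not part of the U0 tooling.

```python
REPO_ID = "Vincent2025hello/u0_final"


def download_u0(local_dir="./u0_final"):
    """Download the model repository and return the local path (illustrative wrapper)."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_u0()` fetches the repository files into `./u0_final` and returns that path, which can then be passed to whatever loading code the training framework provides.
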
+ ## Training Code
+
+ The complete fine-tuning and evaluation framework is available at: [https://github.com/VincentGu2000/u0](https://github.com/VincentGu2000/u0)
+
+ ## Citation
+
+ ```bibtex
+ @misc{gu2025usimu0visionlanguageactiondataset,
+       title={USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots},
+       author={Junwen Gu and Zhiheng Wu and Pengxuan Si and Shuang Qiu and Yukai Feng and Luoyang Sun and Laien Luo and Lianyi Yu and Jian Wang and Zhengxing Wu},
+       year={2025},
+       eprint={2510.07869},
+       archivePrefix={arXiv},
+       primaryClass={cs.RO},
+       url={https://arxiv.org/abs/2510.07869},
+ }
+ ```
+
+ ## Acknowledgments
+
+ This model is fine-tuned from [NVIDIA GR00T N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B). We thank the NVIDIA GEAR team for open-sourcing the GR00T model and framework.