---
license: other
license_name: nvidia-open-model-license
license_link: https://developer.nvidia.com/open-model-license
language:
- en
library_name: transformers
tags:
- robotics
- vision-language-action
- manipulation
- gr00t
- nvidia
- physical-ai
- humanoid
- reachy2
- lerobot
datasets:
- ganatrask/NOVA
base_model:
- nvidia/GR00T-N1.6-3B
pipeline_tag: robotics
---

# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2

<p align="center">
  <img src="https://img.shields.io/badge/NVIDIA-GR00T%20N1.6-76B900?style=for-the-badge&logo=nvidia" alt="GR00T N1.6"/>
  <img src="https://img.shields.io/badge/Robot-Reachy%202-0066CC?style=for-the-badge" alt="Reachy 2"/>
  <img src="https://img.shields.io/badge/Task-Pick%20%26%20Place-green?style=for-the-badge" alt="Pick & Place"/>
</p>

**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.

## Model Description

This model is part of an end-to-end Physical AI pipeline that combines:
- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation

### Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
| **Parameters** | ~3B |
| **Embodiment** | Reachy 2 (custom embodiment tag) |
| **Action Space** | 8-DOF (7 arm joints + gripper) |
| **Training Steps** | 30,000 |
| **Final Loss** | ~0.008-0.01 |

### Action Space

```python
action = [
    shoulder_pitch,  # -180° to 90°
    shoulder_roll,   # -180° to 10°
    elbow_yaw,       # -90° to 90°
    elbow_pitch,     # -125° to 0°
    wrist_roll,      # -100° to 100°
    wrist_pitch,     # -45° to 45°
    wrist_yaw,       # -30° to 30°
    gripper,         # 0 (closed) to 1 (open)
]
```
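
Raw policy outputs should be kept within these joint limits before being sent to the robot. A minimal sketch, assuming the limit table and `clamp_action` helper below (both are illustrative, not part of the released code):

```python
import numpy as np

# Joint limits in degrees, matching the action layout above (gripper is 0-1).
JOINT_LIMITS = np.array([
    [-180.0,  90.0],   # shoulder_pitch
    [-180.0,  10.0],   # shoulder_roll
    [ -90.0,  90.0],   # elbow_yaw
    [-125.0,   0.0],   # elbow_pitch
    [-100.0, 100.0],   # wrist_roll
    [ -45.0,  45.0],   # wrist_pitch
    [ -30.0,  30.0],   # wrist_yaw
    [   0.0,   1.0],   # gripper
])

def clamp_action(action):
    """Clip an 8-DOF action vector to the joint limits above."""
    action = np.asarray(action, dtype=np.float64)
    return np.clip(action, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])

# Example: an out-of-range shoulder_pitch command gets clipped to 90°,
# and a gripper value above 1 is clipped to fully open.
safe = clamp_action([120.0, 0.0, 0.0, -60.0, 0.0, 0.0, 0.0, 1.5])
```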

## Intended Use

This model is designed for:
- **Pick-and-place manipulation** tasks on the Reachy 2 robot
- **Language-conditioned control** ("Pick up the red cube")
- **Research** in vision-language-action models and robotic manipulation

### Supported Tasks

- Pick up objects (cube, cylinder, capsule, rectangular box)
- Place objects in target locations
- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
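
The 4 objects × 8 colors above give the 32 task variations seen during training. They can be enumerated as follows (the instruction phrasing is an assumption for illustration):

```python
from itertools import product

OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]

# One language instruction per (object, color) pair; the template is illustrative.
tasks = [f"Pick up the {color} {obj}" for obj, color in product(OBJECTS, COLORS)]

print(len(tasks))   # 32 variations
print(tasks[0])     # "Pick up the red cube"
```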

## Training

### Training Data

Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):
- **100 episodes** of expert demonstrations
- **32 task variations** (4 objects × 8 colors)
- Domain randomization (position, lighting, camera jitter)
- LeRobot v2.1 format

### Training Configuration

| Parameter | Value |
|-----------|-------|
| GPU | NVIDIA A100-SXM4-80GB |
| GPUs | 2 |
| Batch Size | 64 |
| Max Steps | 30,000 |
| Save Steps | 3,000 |
| Video Backend | decord |

### Training Command

```bash
python -m gr00t.train \
    --dataset_repo_id ganatrask/NOVA \
    --embodiment_tag reachy2 \
    --video_backend decord \
    --num_gpus 2 \
    --batch_size 64 \
    --max_steps 30000 \
    --save_steps 3000 \
    --output_dir ./checkpoints/groot-reachy2
```

## Usage

### Prerequisites

You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:

```bash
cd Isaac-GR00T
patch -p1 < ../patches/add_reachy2_embodiment.patch
```

### Inference

```python
from gr00t.data.embodiment_tags import EmbodimentTag
from gr00t.policy.gr00t_policy import Gr00tPolicy
import importlib.util

# Load the Reachy 2 modality config first
spec = importlib.util.spec_from_file_location(
    "modality_config",
    "configs/reachy2_modality_config.py",
)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Load the policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.REACHY2,
    model_path="ganatrask/NOVA",  # or a local checkpoint path
    device="cuda",
    strict=True,
)

# Run inference
obs = {
    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
}
action, _ = policy.get_action(obs)
```
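
The `image` and `joints` inputs above must already have the batched shapes shown in the comments. A hedged sketch of the preprocessing for a 224×224 front-camera frame and 7 arm joints (the `prepare_obs` helper and the dependency-free nearest-neighbor resize are illustrative assumptions, not the released pipeline):

```python
import numpy as np

def prepare_obs(frame, joint_positions, instruction):
    """Build the observation dict expected by the policy.

    frame: HxWx3 uint8 camera image; joint_positions: 7 floats.
    """
    # The model expects a 224x224 front-camera image; a nearest-neighbor
    # resize via integer indexing keeps this sketch dependency-free.
    h, w = frame.shape[:2]
    ys = np.arange(224) * h // 224
    xs = np.arange(224) * w // 224
    resized = frame[ys][:, xs]

    joints = np.asarray(joint_positions, dtype=np.float32)
    return {
        "video": {"front_cam": resized[None, None]},  # (1, 1, 224, 224, 3)
        "state": {"arm_joints": joints[None, None]},  # (1, 1, 7)
        "language": {"annotation.human.task_description": [[instruction]]},
    }

obs = prepare_obs(np.zeros((480, 640, 3), np.uint8), np.zeros(7), "Pick up the red cube")
```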

## Performance

| Metric | Value |
|--------|-------|
| Inference Speed | ~40ms/step (A100) |
| VRAM Usage | ~44GB / 80GB |
| Training Time | ~6 hours (30K steps) |

## Limitations

- **Simulation-trained**: Primarily trained on MuJoCo simulation data
- **Single-arm**: Currently supports right-arm manipulation only
- **Fixed camera setup**: Expects front camera input at 224×224 resolution
- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks

## Ethical Considerations

- This model is intended for research purposes
- Human supervision is recommended for real-robot deployment
- Not intended for safety-critical applications without extensive testing

## Citation

If you use this model, please cite:

```bibtex
@misc{nova2025,
  title={NOVA: Neural Open Vision Actions},
  author={ganatrask},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/ganatrask/NOVA}
}
```

## Acknowledgments

- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
- **[HuggingFace](https://huggingface.co/)** - LeRobot framework
- **[VESSL AI](https://vessl.ai/)** - GPU compute for training

## License

This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.

## Links

- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)