+---
+license: other
+license_name: nvidia-open-model-license
+license_link: https://developer.nvidia.com/open-model-license
+language:
+  - en
+library_name: transformers
+tags:
+  - robotics
+  - vision-language-action
+  - manipulation
+  - gr00t
+  - nvidia
+  - physical-ai
+  - humanoid
+  - reachy2
+  - lerobot
+datasets:
+  - ganatrask/NOVA
+base_model:
+  - nvidia/GR00T-N1.6-3B
+pipeline_tag: robotics
+---
+# NOVA Model - GR00T N1.6 Fine-tuned for Reachy 2
+<p align="center">
+  <img src="https://img.shields.io/badge/NVIDIA-GR00T%20N1.6-76B900?style=for-the-badge&logo=nvidia" alt="GR00T N1.6"/>
+  <img src="https://img.shields.io/badge/Robot-Reachy%202-0066CC?style=for-the-badge" alt="Reachy 2"/>
+  <img src="https://img.shields.io/badge/Task-Pick%20%26%20Place-green?style=for-the-badge" alt="Pick & Place"/>
+</p>
+**NOVA** (Neural Open Vision Actions) is a fine-tuned version of NVIDIA's GR00T N1.6 vision-language-action model, trained specifically for [Pollen Robotics' Reachy 2](https://www.pollen-robotics.com/reachy/) humanoid robot.
+## Model Description
+This model is part of an end-to-end Physical AI pipeline that combines:
+- **Voice Input**: Parakeet CTC 0.6B for speech-to-text
+- **Scene Reasoning**: Cosmos Reason 2 for object detection and spatial understanding
+- **Action Policy**: This fine-tuned GR00T N1.6 model for manipulation
+### Model Details
+| Property | Value |
+|----------|-------|
+| **Base Model** | [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B) |
+| **Parameters** | ~3B |
+| **Embodiment** | Reachy 2 (custom embodiment tag) |
+| **Action Space** | 8-DOF (7 arm joints + gripper) |
+| **Training Steps** | 30,000 |
+| **Final Loss** | ~0.008-0.01 |
+### Action Space
+```python
+action = [
+    shoulder_pitch,  # -180° to 90°
+    shoulder_roll,   # -180° to 10°
+    elbow_yaw,       # -90° to 90°
+    elbow_pitch,     # -125° to 0°
+    wrist_roll,      # -100° to 100°
+    wrist_pitch,     # -45° to 45°
+    wrist_yaw,       # -30° to 30°
+    gripper,         # 0 (closed) to 1 (open)
+]
+```
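The ranges above can double as a sanity check on policy outputs before they reach the robot. The sketch below is only an illustration: it assumes actions arrive as a flat sequence of eight values in the listed order, with joint angles in degrees as in the comments above. `JOINT_LIMITS` and `clamp_action` are hypothetical names, not part of GR00T or the Reachy 2 SDK.

```python
# Limits copied from the action-space listing (degrees, except the gripper).
JOINT_LIMITS = [
    ("shoulder_pitch", -180.0, 90.0),
    ("shoulder_roll", -180.0, 10.0),
    ("elbow_yaw", -90.0, 90.0),
    ("elbow_pitch", -125.0, 0.0),
    ("wrist_roll", -100.0, 100.0),
    ("wrist_pitch", -45.0, 45.0),
    ("wrist_yaw", -30.0, 30.0),
    ("gripper", 0.0, 1.0),
]


def clamp_action(action):
    """Clamp an 8-element action to the documented limits (hypothetical helper)."""
    if len(action) != len(JOINT_LIMITS):
        raise ValueError(f"expected {len(JOINT_LIMITS)} values, got {len(action)}")
    return [
        min(max(float(value), low), high)
        for value, (_, low, high) in zip(action, JOINT_LIMITS)
    ]


# Example: shoulder_pitch, elbow_pitch, wrist_pitch and gripper are out of range.
raw = [100.0, -50.0, 0.0, 10.0, 20.0, -60.0, 5.0, 1.5]
safe = clamp_action(raw)
print(safe)
```

Clamping is the simplest possible guard; a real deployment would likely also want velocity limits and collision checks, which are outside the scope of this card.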
+## Intended Use
+This model is designed for:
+- **Pick-and-place manipulation** tasks on the Reachy 2 robot
+- **Language-conditioned control** ("Pick up the red cube")
+- **Research** in vision-language-action models and robotic manipulation
+### Supported Tasks
+- Pick up objects (cube, cylinder, capsule, rectangular box)
+- Place objects in target locations
+- Handle 8 color variations (red, green, blue, yellow, cyan, magenta, orange, purple)
+## Training
+### Training Data
+Trained on the [ganatrask/NOVA dataset](https://huggingface.co/datasets/ganatrask/NOVA):
+- **100 episodes** of expert demonstrations
+- **32 task variations** (4 objects × 8 colors)
+- Domain randomization (position, lighting, camera jitter)
+- LeRobot v2.1 format
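The 32 task variations are the Cartesian product of the four object shapes and eight colors listed under Supported Tasks. As a rough sketch, they can be enumerated as below. The prompt wording follows the "Pick up the red cube" example from this card; whether the dataset annotations use exactly this template is an assumption, and `task_prompts` is a hypothetical helper.

```python
from itertools import product

OBJECTS = ["cube", "cylinder", "capsule", "rectangular box"]
COLORS = ["red", "green", "blue", "yellow", "cyan", "magenta", "orange", "purple"]


def task_prompts():
    """Build one language instruction per (object, color) pair."""
    return [f"Pick up the {color} {obj}" for obj, color in product(OBJECTS, COLORS)]


prompts = task_prompts()
print(len(prompts))  # 4 objects x 8 colors
print(prompts[0])
```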
+### Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| GPU | NVIDIA A100-SXM4-80GB |
+| GPUs | 2 |
+| Batch Size | 64 |
+| Max Steps | 30,000 |
+| Save Steps | 3,000 |
+| Video Backend | decord |
+### Training Command
+```bash
+python -m gr00t.train \
+    --dataset_repo_id ganatrask/NOVA \
+    --embodiment_tag reachy2 \
+    --video_backend decord \
+    --num_gpus 2 \
+    --batch_size 64 \
+    --max_steps 30000 \
+    --save_steps 3000 \
+    --output_dir ./checkpoints/groot-reachy2
+```
+## Usage
+### Prerequisites
+You need to apply a patch to Isaac-GR00T to add the Reachy 2 embodiment tag:
+```bash
+cd Isaac-GR00T
+patch -p1 < ../patches/add_reachy2_embodiment.patch
+```
+### Inference
+```python
+import importlib.util
+import numpy as np
+from gr00t.data.embodiment_tags import EmbodimentTag
+from gr00t.policy.gr00t_policy import Gr00tPolicy
+# Load modality config first. The module is executed but not referenced again,
+# presumably so that it can register the Reachy 2 modality config.
+spec = importlib.util.spec_from_file_location(
+    "modality_config",
+    "configs/reachy2_modality_config.py"
+)
+module = importlib.util.module_from_spec(spec)
+spec.loader.exec_module(module)
+# Load policy
+policy = Gr00tPolicy(
+    embodiment_tag=EmbodimentTag.REACHY2,
+    model_path="ganatrask/NOVA",  # or local checkpoint path
+    device="cuda",
+    strict=True,
+)
+# Placeholder inputs (dtypes are assumptions); replace with a real camera
+# frame and the current joint readings.
+image = np.zeros((224, 224, 3), dtype=np.uint8)  # front camera frame, (H, W, 3)
+joints = np.zeros(7, dtype=np.float32)           # 7 arm joint positions
+# Run inference
+obs = {
+    "video": {"front_cam": image[None, None, :, :, :]},  # (1, 1, H, W, 3)
+    "state": {"arm_joints": joints[None, None, :]},      # (1, 1, 7)
+    "language": {"annotation.human.task_description": [["Pick up the red cube"]]},
+}
+action, _ = policy.get_action(obs)
+```
+## Performance
+| Metric | Value |
+|--------|-------|
+| Inference Speed | ~40 ms/step (A100) |
+| VRAM Usage | ~44 GB / 80 GB |
+| Training Time | ~6 hours (30K steps) |
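A few derived quantities follow from the approximate figures in this card. The arithmetic below is a back-of-the-envelope sketch, not a measurement: it assumes one checkpoint per `save_steps` interval and one policy call per control step, and the variable names are illustrative only.

```python
# Figures taken from the training configuration and performance tables.
MAX_STEPS = 30_000
SAVE_STEPS = 3_000
TRAINING_HOURS = 6.0          # approximate
INFERENCE_MS_PER_STEP = 40.0  # approximate, measured on an A100

# Number of checkpoints written if one is saved every SAVE_STEPS steps.
checkpoints = MAX_STEPS // SAVE_STEPS

# Average training throughput in optimizer steps per second.
steps_per_second = MAX_STEPS / (TRAINING_HOURS * 3600)

# Upper bound on the control rate if every control step waits for one policy call.
max_control_rate_hz = 1000.0 / INFERENCE_MS_PER_STEP

print(checkpoints, round(steps_per_second, 2), max_control_rate_hz)
```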
+## Limitations
+- **Simulation-trained**: Primarily trained on MuJoCo simulation data
+- **Single-arm**: Currently supports right arm manipulation only
+- **Fixed camera setup**: Expects front camera input at 224×224 resolution
+- **Task scope**: Optimized for pick-and-place; may not generalize to other manipulation tasks
+## Ethical Considerations
+- This model is intended for research use
+- Human supervision is recommended for real robot deployment
+- Not intended for safety-critical applications without extensive testing
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{nova2025,
+  title={NOVA: Neural Open Vision Actions},
+  author={ganatrask},
+  year={2025},
+  publisher={HuggingFace},
+  url={https://huggingface.co/ganatrask/NOVA}
+}
+```
+## Acknowledgments
+- **[NVIDIA](https://developer.nvidia.com/)** - GR00T N1.6 base model
+- **[Pollen Robotics](https://www.pollen-robotics.com/)** - Reachy 2 robot
+- **[Hugging Face](https://huggingface.co/)** - LeRobot framework
+- **[VESSL AI](https://vessl.ai/)** - GPU compute for training
+## License
+This model inherits the [NVIDIA Open Model License](https://developer.nvidia.com/open-model-license) from the base GR00T N1.6 model.
+## Links
+- **GitHub**: [ganatrask/NOVA](https://github.com/ganatrask/NOVA)
+- **Dataset**: [ganatrask/NOVA](https://huggingface.co/datasets/ganatrask/NOVA)
+- **Base Model**: [nvidia/GR00T-N1.6-3B](https://huggingface.co/nvidia/GR00T-N1.6-3B)