GR00T N1.6-3B SimplerEnv & BEHAVIOR-1K (ONNX)
ONNX-converted models of NVIDIA GR00T N1.6-3B for three embodiment configurations: Fractal (Google Robot), Bridge (WidowX), and BEHAVIOR-1K (R1 Pro). These models enable PyTorch-free inference via ONNX Runtime with the CUDA execution provider.
Base PyTorch Models
| Variant | HuggingFace Source | Embodiment Tag | Embodiment ID |
|---|---|---|---|
| Fractal | nvidia/GR00T-N1.6-fractal | OXE_GOOGLE | 0 |
| Bridge | nvidia/GR00T-N1.6-bridge | OXE_WIDOWX | 1 |
| BEHAVIOR-1K | nvidia/GR00T-N1.6-BEHAVIOR1k | BEHAVIOR_R1_PRO | 24 |
Model Architecture
The GR00T N1.6 model is split into two ONNX components:
- Eagle Backbone (`eagle_backbone.onnx`): vision encoder (EagleX) that processes images into visual tokens
- Action Head (`action_head.onnx`): DiT-based diffusion transformer that generates action sequences
Key Dimensions
| Parameter | Value |
|---|---|
| SEQ_LEN | 108 |
| BACKBONE_DIM | 2048 |
| ACTION_DIM | 128 |
| ACTION_HORIZON | 50 |
| STATE_DIM | 128 |
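The two-stage data flow can be sketched with the dimensions above. This is a shape-level illustration only: the stand-in functions below return zeros instead of running the real `eagle_backbone.onnx` and `action_head.onnx` through `onnxruntime.InferenceSession`, and the 224×224 input resolution is an assumption for illustration.

```python
import numpy as np

# Key dimensions from the table above.
SEQ_LEN, BACKBONE_DIM = 108, 2048
STATE_DIM, ACTION_DIM, ACTION_HORIZON = 128, 128, 50

def eagle_backbone(images: np.ndarray) -> np.ndarray:
    # Stage 1 (stand-in): vision encoder maps images to visual tokens.
    return np.zeros((1, SEQ_LEN, BACKBONE_DIM), dtype=np.float32)

def action_head(tokens: np.ndarray, state: np.ndarray) -> np.ndarray:
    # Stage 2 (stand-in): DiT diffusion head maps tokens + proprioceptive
    # state to a chunk of future actions.
    return np.zeros((1, ACTION_HORIZON, ACTION_DIM), dtype=np.float32)

# 224x224 input resolution is assumed for illustration.
tokens = eagle_backbone(np.zeros((1, 3, 224, 224), dtype=np.float32))
actions = action_head(tokens, np.zeros((1, STATE_DIM), dtype=np.float32))
assert actions.shape == (1, ACTION_HORIZON, ACTION_DIM)
```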
Directory Structure
onnx/
fractal/
eagle_backbone.onnx # Vision backbone
action_head.onnx # Action head (DiT)
<external_data_files> # ~1200 weight tensor files
bridge/
action_head.onnx # Bridge-specific action head
<external_data_files> # ~1000 weight tensor files
# NOTE: Bridge uses the SAME eagle_backbone as Fractal.
# Download eagle_backbone.onnx + its external data from onnx/fractal/
behavior1k/
eagle_backbone.onnx # Behavior1k backbone
action_head.onnx # Behavior1k action head
<external_data_files> # ~1100 weight tensor files
scripts/
convert_to_onnx.py # Conversion script (fractal/bridge)
convert_behavior1k.py # Conversion script (behavior1k)
run_gr00t_onnx_server.py # ZMQ ONNX inference server
validate_onnx.py # Cosine similarity validation
run_simplerenv_onnx_eval.sh # SimplerEnv evaluation (fractal + bridge)
run_behavior1k_eval.sh # BEHAVIOR-1K evaluation client
run_behavior1k_thirdperson.py # Third-person view rendering (Python)
run_behavior1k_thirdperson.sh # Third-person view rendering (launcher)
Bridge backbone note: The Bridge variant shares the same Eagle backbone as Fractal. When using Bridge, point to `onnx/fractal/eagle_backbone.onnx` or copy the Fractal backbone files into the bridge directory.
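The fallback logic mirrors the server's `--backbone-onnx-dir` flag: if a separate backbone directory is given, load the backbone from there instead of the variant's own directory. The helper below is a hypothetical sketch of that resolution, not the server's actual code.

```python
from pathlib import Path

def resolve_backbone_path(onnx_dir, backbone_onnx_dir=None):
    # Hypothetical helper: Bridge ships no backbone of its own, so an
    # explicit backbone directory (e.g. onnx/fractal) takes precedence.
    base = Path(backbone_onnx_dir if backbone_onnx_dir else onnx_dir)
    return base / "eagle_backbone.onnx"

# Bridge borrows the Fractal backbone; Fractal uses its own.
assert resolve_backbone_path("onnx/bridge", "onnx/fractal").as_posix() == "onnx/fractal/eagle_backbone.onnx"
assert resolve_backbone_path("onnx/fractal").as_posix() == "onnx/fractal/eagle_backbone.onnx"
```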
ONNX Conversion Details
- Opset version: 17
- Precision: float32
- Batch size: static `batch_size=1` (no dynamic axes)
- Key modification: `ScatterND` eliminated via `FixedEmbodimentLinear`, which replaces the dynamic embodiment selection (`nn.Embedding` + `ScatterND`) with a fixed linear layer for the target embodiment, enabling static graph export
- External data: weights stored as external data files (ONNX `size_threshold=0`) because the model size exceeds the 2 GB protobuf limit
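The `FixedEmbodimentLinear` idea can be illustrated in NumPy: instead of indexing a per-embodiment weight table at runtime (the `nn.Embedding` + `ScatterND` pattern that blocks static export), the weights for one chosen embodiment are baked into a plain linear layer at conversion time. The table size and the use of a weight table of shape `(num_embodiments, out, in)` here are illustrative assumptions, not the model's actual parameterization.

```python
import numpy as np

NUM_EMBODIMENTS, STATE_DIM, BACKBONE_DIM = 32, 128, 2048  # illustrative sizes
rng = np.random.default_rng(0)

# Per-embodiment projection weights, as the dynamic model would hold them.
weight_table = rng.standard_normal(
    (NUM_EMBODIMENTS, BACKBONE_DIM, STATE_DIM)).astype(np.float32)

def dynamic_projection(state, embodiment_id):
    # Dynamic path: look up this embodiment's weights, then project.
    # The runtime lookup is what produces ScatterND in the exported graph.
    return state @ weight_table[embodiment_id].T

def make_fixed_linear(embodiment_id):
    # Export-time path: freeze one embodiment's weights so they become a
    # constant in the ONNX graph (a plain MatMul, no ScatterND).
    w = weight_table[embodiment_id]
    return lambda state: state @ w.T

state = rng.standard_normal((1, STATE_DIM)).astype(np.float32)
fixed = make_fixed_linear(0)  # e.g. OXE_GOOGLE -> embodiment ID 0
assert np.allclose(dynamic_projection(state, 0), fixed(state))
```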
Conversion Command (Fractal/Bridge)
PYTHONPATH="" python scripts/convert_to_onnx.py \
--model-path nvidia/GR00T-N1.6-fractal \
--embodiment-tag OXE_GOOGLE \
--output-dir ./onnx/fractal
Conversion Command (BEHAVIOR-1K)
PYTHONPATH="" python scripts/convert_behavior1k.py \
--model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
--embodiment-tag BEHAVIOR_R1_PRO \
--output-dir ./onnx/behavior1k
Inference Usage
ONNX Inference Server (ZMQ)
The inference server loads ONNX models and serves action predictions over ZMQ (port 5555):
# Fractal (Google Robot)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
--model-path nvidia/GR00T-N1.6-fractal \
--embodiment-tag OXE_GOOGLE \
--onnx-dir ./onnx/fractal \
--port 5555
# Bridge (WidowX): uses the Fractal backbone
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
--model-path nvidia/GR00T-N1.6-bridge \
--embodiment-tag OXE_WIDOWX \
--onnx-dir ./onnx/bridge \
--backbone-onnx-dir ./onnx/fractal \
--port 5555
# BEHAVIOR-1K
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
--model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
--embodiment-tag BEHAVIOR_R1_PRO \
--onnx-dir ./onnx/behavior1k \
--port 5555
SimplerEnv Evaluation Client
Connect a SimplerEnv evaluation script to the running ONNX server:
python simpler_env_eval.py \
--task google_robot_pick_coke_can \
--policy gr00t_zmq \
--zmq-host localhost \
--zmq-port 5555
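The client/server exchange is a plain ZMQ REQ/REP round trip. The sketch below demonstrates that pattern end to end with a stub server in a background thread; the observation/action payload schema and pickle framing are assumptions for illustration (the real schema is defined by `run_gr00t_onnx_server.py`), and port 5599 is chosen arbitrarily to avoid clashing with a running server.

```python
import pickle
import threading

import zmq

ctx = zmq.Context.instance()

def stub_server():
    # Stand-in for the inference server: answer one request and exit.
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://127.0.0.1:5599")
    obs = pickle.loads(sock.recv())  # hypothetical observation payload
    sock.send(pickle.dumps({"action": [0.0] * 7, "echo": obs["task"]}))
    sock.close()

t = threading.Thread(target=stub_server)
t.start()

client = ctx.socket(zmq.REQ)
client.connect("tcp://127.0.0.1:5599")
client.send(pickle.dumps({"task": "google_robot_pick_coke_can",
                          "state": [0.0] * 8}))
reply = pickle.loads(client.recv())
client.close()
t.join()

assert reply["echo"] == "google_robot_pick_coke_can"
```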
Simulation Evaluation
SimplerEnv Evaluation (PyTorch + ONNX)
SimplerEnv evaluation runs as a two-process pipeline: a server (loads the model) and a client (runs the simulation).
Step 1: Start the inference server
PyTorch server (original GR00T pipeline):
# Fractal (Google Robot) on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
--model_path nvidia/GR00T-N1.6-fractal \
--embodiment_tag OXE_GOOGLE \
--port 5556
# Bridge (WidowX) on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
--model_path nvidia/GR00T-N1.6-bridge \
--embodiment_tag OXE_WIDOWX \
--port 5557
ONNX server (PyTorch-free, uses `scripts/run_gr00t_onnx_server.py`):
# Fractal on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
--model-path nvidia/GR00T-N1.6-fractal \
--embodiment-tag OXE_GOOGLE \
--onnx-dir ./onnx/fractal \
--port 5556
# Bridge on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
--model-path nvidia/GR00T-N1.6-bridge \
--embodiment-tag OXE_WIDOWX \
--onnx-dir ./onnx/bridge \
--backbone-onnx-dir ./onnx/fractal \
--port 5557
Step 2: Run SimplerEnv evaluation
Once the server is ready, run the evaluation client:
bash scripts/run_simplerenv_onnx_eval.sh
This script runs all Fractal (Google Robot) tasks against port 5556 and collects results + videos. See the script for customizing tasks, episode counts, and output paths.
PYTHONPATH note: The SimplerEnv eval script explicitly sets `PYTHONPATH=""` and `PYTHONNOUSERSITE=1` to avoid conflicts between the SimplerEnv virtualenv and the Isaac-GR00T environment. The SimplerEnv Python binary is invoked directly from its own `.venv`.
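The same isolation can be reproduced programmatically: clear `PYTHONPATH`, disable user site-packages, and invoke a specific interpreter directly. In this sketch `sys.executable` stands in for the SimplerEnv `.venv` interpreter, whose real path is environment-specific.

```python
import os
import subprocess
import sys

# Clean child environment, mirroring what the eval script exports.
env = dict(os.environ, PYTHONPATH="", PYTHONNOUSERSITE="1")

# Call the target interpreter directly (here: the current one, as a
# stand-in for the SimplerEnv .venv python) and confirm PYTHONPATH is empty.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"],
    env=env, capture_output=True, text=True,
)
assert out.returncode == 0
assert out.stdout.strip() == ""
```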
BEHAVIOR-1K Evaluation
BEHAVIOR-1K evaluation requires OmniGibson and the BEHAVIOR-1K environment installed separately (not bundled in this repo). The server and client run on different GPUs to avoid VRAM conflicts.
Prerequisites
- Isaac-GR00T installed with its virtualenv
- BEHAVIOR-1K / OmniGibson installed with its own virtualenv
Step 1: Start GR00T inference server (GPU 0)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
--model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
--embodiment_tag BEHAVIOR_R1_PRO \
--port 5555 \
--use_sim_policy_wrapper
Step 2: Run BEHAVIOR-1K eval client (GPU 1)
bash scripts/run_behavior1k_eval.sh
The script runs 5 tasks × 5 episodes each against the server on port 5555.
Required environment variables (already set in the script):
export PYTHONPATH=/path/to/Isaac-GR00T
export PYTHONNOUSERSITE=1
export OMNI_KIT_ACCEPT_EULA=YES
export OMNIGIBSON_HEADLESS=1 # Run without display (headless)
Tasks evaluated: `turning_on_radio`, `picking_up_trash`, `picking_up_toys`, `putting_shoes_on_rack`, `clean_up_your_desk`
Note: OmniGibson may segfault on cleanup; this is expected and does not affect results. The script does not use `set -e` for this reason.
BEHAVIOR-1K Third-Person View Rendering
For visualization, a third-person camera can be added that tracks the robot and produces a high-resolution 1280×720 video using Path Tracing with the OptiX denoiser.
Quick start (automated launcher)
bash scripts/run_behavior1k_thirdperson.sh
This script automatically:
- Starts the GR00T inference server on GPU 0 (port 5555)
- Waits for server readiness
- Runs one episode of `picking_up_trash` with third-person rendering on GPU 1
- Outputs videos to `/path/to/gr00t_behavior1k_thirdperson_videos/`
Manual usage
# Terminal 1: Start GR00T inference server (GPU 0)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
--model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
--embodiment_tag BEHAVIOR_R1_PRO \
--port 5555 \
--use_sim_policy_wrapper
# Terminal 2: Run third-person rendering client (GPU 1)
PYTHONPATH=/path/to/Isaac-GR00T \
PYTHONNOUSERSITE=1 \
OMNIGIBSON_HEADLESS=1 \
OMNI_KIT_ACCEPT_EULA=YES \
CUDA_VISIBLE_DEVICES=1 \
python scripts/run_behavior1k_thirdperson.py \
--task picking_up_trash \
--n_episodes 1 \
--policy_client_host 127.0.0.1 \
--policy_client_port 5555 \
--output_dir ./thirdperson_videos \
--max_episode_steps 720 \
--n_action_steps 8 \
--n_render_iterations 20 \
--spp 1
Key renderer settings (noise-free rendering on H100)
The `run_behavior1k_thirdperson.py` script configures the RTX renderer to eliminate noise artifacts:
| Setting | Value | Why |
|---|---|---|
| Render mode | Path Tracing (`mode=2`) | Higher quality than RTX Real-Time |
| `optixDenoiser/enabled` | `True` | Enable OptiX denoiser |
| `optixDenoiser/blendFactor` | `0.0` | Critical: default is 1.0 (= no denoising!) |
| `optixDenoiser/temporalMode/enabled` | `True` | Prevents flickering between video frames |
| `pathtracing/clampSpp` | `0` | Critical: default clamps SPP to 32! |
| `pathtracing/spp` | `1` (fast) or `128` (quality) | Samples per pixel per render call |
| `n_render_iterations` | `20` (fast) or `5` (with high SPP) | Extra render passes per frame |
| `adaptiveSampling/enabled` | `True` | Concentrates samples on noisy regions |
| `fireflyFilter/enabled` | `True` | Clamps bright noise speckles |
Recommended settings for H100: `--spp 1 --n_render_iterations 20` (fast, low memory) or `--spp 128 --n_render_iterations 5` (higher quality, slower). Because the default `blendFactor=1.0` bug silently disables denoising, the factor must be explicitly set to `0.0`.
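The `blendFactor` semantics explain why 1.0 disables denoising: the final image is a linear blend of the raw path-traced frame and the denoised frame, with the factor weighting the raw input. A minimal numeric sketch of that blend, assuming standard OptiX denoiser behavior:

```python
import numpy as np

def blend(noisy, denoised, blend_factor):
    # OptiX-style blend: blend_factor weights the raw (noisy) frame,
    # so 1.0 keeps only the noisy input (no denoising) and 0.0 keeps
    # only the denoiser output.
    return blend_factor * noisy + (1.0 - blend_factor) * denoised

noisy = np.array([0.9, 0.1, 0.8])
denoised = np.array([0.5, 0.5, 0.5])
assert np.allclose(blend(noisy, denoised, 1.0), noisy)     # default: denoiser bypassed
assert np.allclose(blend(noisy, denoised, 0.0), denoised)  # full denoising
```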
Output
Two video streams are recorded per episode:
- Third-person view (`thirdperson/thirdperson_<uuid>_s{0,1}.mp4`): 1280×720, follows the robot with a configurable offset
- Robot view (`robotview/`): standard wrist/head camera view from the observation dict

The filename suffix `_s1.mp4` indicates a successful episode; `_s0.mp4` indicates failure.
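The success flag can be recovered from the filename suffix when post-processing results. A small helper sketch (the function name is hypothetical; only the `_s{0,1}.mp4` convention comes from the scripts):

```python
import re

def episode_succeeded(filename: str) -> bool:
    # Filenames end in _s1.mp4 (success) or _s0.mp4 (failure).
    m = re.search(r"_s([01])\.mp4$", filename)
    if m is None:
        raise ValueError(f"unexpected video filename: {filename}")
    return m.group(1) == "1"

assert episode_succeeded("thirdperson_ab12cd_s1.mp4")
assert not episode_succeeded("thirdperson_ab12cd_s0.mp4")
```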
Validation Results
Cosine similarity between PyTorch and ONNX outputs (higher is better, 1.0 = identical):
| Model | Backbone Cosine Sim | Action Head Cosine Sim |
|---|---|---|
| Fractal | 1.000000 | 1.000000 |
| Bridge | 1.000000 | 1.000000 |
| BEHAVIOR-1K | 1.000000 | 1.000000 |
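The metric reported above can be computed by flattening both output tensors and taking their cosine similarity. This is a sketch of the computation, assuming `validate_onnx.py` compares flattened PyTorch and ONNX outputs this way:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten both tensors and compare direction; 1.0 means the outputs
    # point the same way (identical up to a positive scale).
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a reference tensor against an exact copy.
x = np.random.default_rng(0).standard_normal((1, 108, 2048)).astype(np.float32)
assert np.isclose(cosine_similarity(x, x.copy()), 1.0)
```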
SimplerEnv Evaluation Results
Fractal (Google Robot)
| Task | PyTorch | ONNX |
|---|---|---|
| pick_coke_can | 100% | 100% |
| pick_object | 100% | 100% |
| move_near | 100% | 100% |
| open_drawer | partial | 0% |
| close_drawer | 50% | 20% |
| place_in_closed_drawer | 0% | 0% |
Bridge (WidowX)
| Task | PyTorch | ONNX |
|---|---|---|
| spoon_on_towel | 100% | 0% |
| carrot_on_plate | 100% | 0% |
| put_eggplant_in_basket | 100% | 0% |
| stack_cube | 0% | 0% |
| put_eggplant_in_sink | 72.7% | 0% |
| close_drawer | 100% | 20% |
| open_drawer | 100% | 0% |
BEHAVIOR-1K
| Task | PyTorch |
|---|---|
| turning_on_radio | 0% |
| picking_up_trash | 0% |
| picking_up_toys | 0% |
| putting_shoes_on_rack | 0% |
| clean_up_your_desk | 0% |
Note: BEHAVIOR-1K tasks show 0% for both PyTorch and ONNX, indicating these tasks are inherently challenging for the base model in the OmniGibson simulation environment.
Requirements
- `onnxruntime-gpu>=1.17.0`
- `numpy`
- `pyzmq` (for the inference server)
- `Pillow` (for image processing)
- CUDA-capable GPU with sufficient VRAM (~8 GB)
License
This repository contains ONNX-converted models derived from NVIDIA's GR00T N1.6 under the Apache 2.0 License.