GR00T N1.6-3B SimplerEnv & BEHAVIOR-1K (ONNX)

ONNX-converted models of NVIDIA GR00T N1.6-3B for three embodiment configurations: Fractal (Google Robot), Bridge (WidowX), and BEHAVIOR-1K (R1 Pro). These models enable PyTorch-free inference via ONNX Runtime with the CUDA execution provider.

Base PyTorch Models

| Variant | HuggingFace Source | Embodiment Tag | Embodiment ID |
|---|---|---|---|
| Fractal | nvidia/GR00T-N1.6-fractal | OXE_GOOGLE | 0 |
| Bridge | nvidia/GR00T-N1.6-bridge | OXE_WIDOWX | 1 |
| BEHAVIOR-1K | nvidia/GR00T-N1.6-BEHAVIOR1k | BEHAVIOR_R1_PRO | 24 |
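For scripting against these variants, the mapping above can be captured in a small lookup table. This is a convenience sketch: the `EMBODIMENTS` name and helper function are ours; the values come from the table.

```python
# Embodiment metadata for the three converted variants.
# Values mirror the table above; the dict itself is just a convenience.
EMBODIMENTS = {
    "fractal":    {"model": "nvidia/GR00T-N1.6-fractal",    "tag": "OXE_GOOGLE",      "id": 0},
    "bridge":     {"model": "nvidia/GR00T-N1.6-bridge",     "tag": "OXE_WIDOWX",      "id": 1},
    "behavior1k": {"model": "nvidia/GR00T-N1.6-BEHAVIOR1k", "tag": "BEHAVIOR_R1_PRO", "id": 24},
}

def embodiment_tag(variant: str) -> str:
    """Return the embodiment tag for a variant name, e.g. 'fractal' -> 'OXE_GOOGLE'."""
    return EMBODIMENTS[variant]["tag"]
```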

Model Architecture

The GR00T N1.6 model is split into two ONNX components:

  1. Eagle Backbone (eagle_backbone.onnx): Vision encoder (EagleX) that processes images into visual tokens
  2. Action Head (action_head.onnx): DiT-based diffusion transformer that generates action sequences

Key Dimensions

| Parameter | Value |
|---|---|
| SEQ_LEN | 108 |
| BACKBONE_DIM | 2048 |
| ACTION_DIM | 128 |
| ACTION_HORIZON | 50 |
| STATE_DIM | 128 |
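As a rough illustration of how these dimensions fit together, the sketch below mocks the two ONNX components with zero-filled numpy tensors. The input image resolution and the exact call sequence are assumptions; real inference runs the exported graphs through onnxruntime sessions.

```python
import numpy as np

SEQ_LEN, BACKBONE_DIM = 108, 2048
ACTION_DIM, ACTION_HORIZON, STATE_DIM = 128, 50, 128
BATCH = 1  # exports use a static batch size of 1

def mock_backbone(images: np.ndarray) -> np.ndarray:
    """Stand-in for eagle_backbone.onnx: images -> visual token embeddings."""
    return np.zeros((BATCH, SEQ_LEN, BACKBONE_DIM), dtype=np.float32)

def mock_action_head(tokens: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Stand-in for action_head.onnx: visual tokens + robot state -> action chunk."""
    assert tokens.shape == (BATCH, SEQ_LEN, BACKBONE_DIM)
    assert state.shape == (BATCH, STATE_DIM)
    return np.zeros((BATCH, ACTION_HORIZON, ACTION_DIM), dtype=np.float32)

# 224x224 is a placeholder resolution, not taken from the export.
tokens = mock_backbone(np.zeros((BATCH, 3, 224, 224), dtype=np.float32))
actions = mock_action_head(tokens, np.zeros((BATCH, STATE_DIM), dtype=np.float32))
print(actions.shape)  # (1, 50, 128)
```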

Directory Structure

onnx/
  fractal/
    eagle_backbone.onnx          # Vision backbone
    action_head.onnx             # Action head (DiT)
    <external_data_files>        # ~1200 weight tensor files
  bridge/
    action_head.onnx             # Bridge-specific action head
    <external_data_files>        # ~1000 weight tensor files
    # NOTE: Bridge uses the SAME eagle_backbone as Fractal.
    #       Download eagle_backbone.onnx + its external data from onnx/fractal/
  behavior1k/
    eagle_backbone.onnx          # Behavior1k backbone
    action_head.onnx             # Behavior1k action head
    <external_data_files>        # ~1100 weight tensor files
scripts/
  convert_to_onnx.py             # Conversion script (fractal/bridge)
  convert_behavior1k.py          # Conversion script (behavior1k)
  run_gr00t_onnx_server.py       # ZMQ ONNX inference server
  validate_onnx.py               # Cosine similarity validation
  run_simplerenv_onnx_eval.sh    # SimplerEnv evaluation (fractal + bridge)
  run_behavior1k_eval.sh         # BEHAVIOR-1K evaluation client
  run_behavior1k_thirdperson.py  # Third-person view rendering (Python)
  run_behavior1k_thirdperson.sh  # Third-person view rendering (launcher)

Bridge backbone note: The Bridge variant shares the same Eagle backbone as Fractal. When using Bridge, point to onnx/fractal/eagle_backbone.onnx or copy the fractal backbone files into the bridge directory.

ONNX Conversion Details

  • Opset version: 17
  • Precision: float32
  • Batch size: Static batch_size=1 (no dynamic axes)
  • Key modification: ScatterND eliminated via FixedEmbodimentLinear, which replaces the dynamic embodiment selection (nn.Embedding + ScatterND) with a fixed linear layer for the target embodiment, enabling static graph export
  • External data: Weights stored as external data files (ONNX size_threshold=0) due to model size exceeding 2GB protobuf limit
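The idea behind the FixedEmbodimentLinear rewrite can be shown in plain numpy: baking in one embodiment's weights up front produces the same output as the dynamic per-embodiment lookup, but with a graph ONNX can export statically. This is a conceptual sketch, not the actual conversion code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embodiments, in_dim, out_dim = 32, 16, 8

# Dynamic form: a bank of per-embodiment weights, indexed at runtime.
# In the exported graph this runtime indexing lowers to ScatterND/Gather ops.
weight_bank = rng.standard_normal((n_embodiments, out_dim, in_dim)).astype(np.float32)

def dynamic_forward(x: np.ndarray, embodiment_id: int) -> np.ndarray:
    return x @ weight_bank[embodiment_id].T

# Fixed form: select the target embodiment's weights once, at export time
# (e.g. id 24 for BEHAVIOR_R1_PRO), leaving an ordinary linear layer.
TARGET_ID = 24
fixed_weight = weight_bank[TARGET_ID]

def fixed_forward(x: np.ndarray) -> np.ndarray:
    return x @ fixed_weight.T

x = rng.standard_normal((1, in_dim)).astype(np.float32)
assert np.allclose(dynamic_forward(x, TARGET_ID), fixed_forward(x))
```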

Conversion Command (Fractal/Bridge)

PYTHONPATH="" python scripts/convert_to_onnx.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --output-dir ./onnx/fractal

Conversion Command (BEHAVIOR-1K)

PYTHONPATH="" python scripts/convert_behavior1k.py \
    --model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment-tag BEHAVIOR_R1_PRO \
    --output-dir ./onnx/behavior1k

Inference Usage

ONNX Inference Server (ZMQ)

The inference server loads ONNX models and serves action predictions over ZMQ (port 5555):

# Fractal (Google Robot)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --onnx-dir ./onnx/fractal \
    --port 5555

# Bridge (WidowX): uses the fractal backbone
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-bridge \
    --embodiment-tag OXE_WIDOWX \
    --onnx-dir ./onnx/bridge \
    --backbone-onnx-dir ./onnx/fractal \
    --port 5555

# BEHAVIOR-1K
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment-tag BEHAVIOR_R1_PRO \
    --onnx-dir ./onnx/behavior1k \
    --port 5555

SimplerEnv Evaluation Client

Connect a SimplerEnv evaluation script to the running ONNX server:

python simpler_env_eval.py \
    --task google_robot_pick_coke_can \
    --policy gr00t_zmq \
    --zmq-host localhost \
    --zmq-port 5555
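The request/response shape of a ZMQ policy client can be sketched as below. The real server's wire format is not documented here, so this sketch assumes a simple pickle-based REQ/REP exchange and stands up a dummy REP endpoint in-process to stay self-contained; field names like "image", "state", and "action" are assumptions.

```python
import pickle
import threading
import numpy as np
import zmq

def dummy_server(port: int) -> None:
    """Minimal stand-in for the ONNX inference server: one REQ/REP round trip."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://127.0.0.1:{port}")
    _obs = pickle.loads(sock.recv())  # observation dict from the client
    action = np.zeros((50, 128), dtype=np.float32)  # ACTION_HORIZON x ACTION_DIM
    sock.send(pickle.dumps({"action": action}))
    sock.close()

PORT = 5599  # avoid clashing with a real server on 5555
threading.Thread(target=dummy_server, args=(PORT,), daemon=True).start()

# Client side: send one observation, receive one action chunk.
ctx = zmq.Context.instance()
sock = ctx.socket(zmq.REQ)
sock.connect(f"tcp://127.0.0.1:{PORT}")
sock.send(pickle.dumps({"image": np.zeros((224, 224, 3), np.uint8),
                        "state": np.zeros(128, np.float32)}))
reply = pickle.loads(sock.recv())
sock.close()
print(reply["action"].shape)  # (50, 128)
```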

Simulation Evaluation

SimplerEnv Evaluation (PyTorch + ONNX)

SimplerEnv evaluation runs as a two-process pipeline: a server (loads the model) and a client (runs the simulation).

Step 1: Start the inference server

PyTorch server (original GR00T pipeline):

# Fractal (Google Robot) on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-fractal \
    --embodiment_tag OXE_GOOGLE \
    --port 5556

# Bridge (WidowX) on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-bridge \
    --embodiment_tag OXE_WIDOWX \
    --port 5557

ONNX server (PyTorch-free, uses scripts/run_gr00t_onnx_server.py):

# Fractal on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --onnx-dir ./onnx/fractal \
    --port 5556

# Bridge on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-bridge \
    --embodiment-tag OXE_WIDOWX \
    --onnx-dir ./onnx/bridge \
    --backbone-onnx-dir ./onnx/fractal \
    --port 5557

Step 2: Run SimplerEnv evaluation

Once the server is ready, run the evaluation client:

bash scripts/run_simplerenv_onnx_eval.sh

This script runs all Fractal (Google Robot) tasks against port 5556 and collects results + videos. See the script for customizing tasks, episode counts, and output paths.

PYTHONPATH note: The SimplerEnv eval script explicitly sets PYTHONPATH="" and PYTHONNOUSERSITE=1 to avoid conflicts between the SimplerEnv virtualenv and the Isaac-GR00T environment. The SimplerEnv Python binary is invoked directly from its own .venv.


BEHAVIOR-1K Evaluation

BEHAVIOR-1K evaluation runs the server and client on different GPUs to avoid VRAM conflicts.

Prerequisites

OmniGibson and the BEHAVIOR-1K environment must be installed separately; they are not bundled in this repo.

Step 1: Start GR00T inference server (GPU 0)

PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment_tag BEHAVIOR_R1_PRO \
    --port 5555 \
    --use_sim_policy_wrapper

Step 2: Run BEHAVIOR-1K eval client (GPU 1)

bash scripts/run_behavior1k_eval.sh

The script runs 5 tasks × 5 episodes each against the server on port 5555.

Required environment variables (already set in the script):

export PYTHONPATH=/path/to/Isaac-GR00T
export PYTHONNOUSERSITE=1
export OMNI_KIT_ACCEPT_EULA=YES
export OMNIGIBSON_HEADLESS=1    # Run without display (headless)

Tasks evaluated:

  • turning_on_radio
  • picking_up_trash
  • picking_up_toys
  • putting_shoes_on_rack
  • clean_up_your_desk

Note: OmniGibson may segfault on cleanup; this is expected and does not affect results. For this reason the script does not use set -e.
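The tolerant exit-code handling can be sketched as a small shell wrapper. This is illustrative only: the wrapper name and the exit-139 check (128 + SIGSEGV) are our assumptions about how to treat a segfault during cleanup.

```shell
# Run an eval command, treating a segfault on cleanup (exit 139 = 128+SIGSEGV)
# as non-fatal, since results are already written by that point.
run_tolerant() {
    "$@"
    status=$?
    if [ "$status" -eq 139 ]; then
        echo "note: segfault during cleanup (exit 139), continuing" >&2
        return 0
    fi
    return "$status"
}
```

Usage (hypothetical client invocation): `run_tolerant python eval_client.py --task turning_on_radio`.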


BEHAVIOR-1K Third-Person View Rendering

For visualization, a third-person camera can be added that tracks the robot and produces a high-resolution 1280×720 video using Path Tracing with the OptiX denoiser.

Quick start (automated launcher)

bash scripts/run_behavior1k_thirdperson.sh

This script automatically:

  1. Starts the GR00T inference server on GPU 0 (port 5555)
  2. Waits for server readiness
  3. Runs one episode of picking_up_trash with third-person rendering on GPU 1
  4. Outputs videos to /path/to/gr00t_behavior1k_thirdperson_videos/

Manual usage

# Terminal 1: start GR00T inference server (GPU 0)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment_tag BEHAVIOR_R1_PRO \
    --port 5555 \
    --use_sim_policy_wrapper

# Terminal 2: run third-person rendering client (GPU 1)
PYTHONPATH=/path/to/Isaac-GR00T \
PYTHONNOUSERSITE=1 \
OMNIGIBSON_HEADLESS=1 \
OMNI_KIT_ACCEPT_EULA=YES \
CUDA_VISIBLE_DEVICES=1 \
python scripts/run_behavior1k_thirdperson.py \
    --task picking_up_trash \
    --n_episodes 1 \
    --policy_client_host 127.0.0.1 \
    --policy_client_port 5555 \
    --output_dir ./thirdperson_videos \
    --max_episode_steps 720 \
    --n_action_steps 8 \
    --n_render_iterations 20 \
    --spp 1

Key renderer settings (noise-free rendering on H100)

The run_behavior1k_thirdperson.py script configures the RTX renderer to eliminate noise artifacts:

| Setting | Value | Why |
|---|---|---|
| Render mode | Path Tracing (mode=2) | Higher quality than RTX Real-Time |
| optixDenoiser/enabled | True | Enable OptiX denoiser |
| optixDenoiser/blendFactor | 0.0 | Critical: default is 1.0 (= no denoising!) |
| optixDenoiser/temporalMode/enabled | True | Prevents flickering between video frames |
| pathtracing/clampSpp | 0 | Critical: default clamps SPP to 32! |
| pathtracing/spp | 1 (fast) or 128 (quality) | Samples per pixel per render call |
| n_render_iterations | 20 (fast) or 5 (with high SPP) | Extra render passes per frame |
| adaptiveSampling/enabled | True | Concentrates samples on noisy regions |
| fireflyFilter/enabled | True | Clamps bright noise speckles |

Recommended settings for H100: --spp 1 --n_render_iterations 20 (fast, low memory) or --spp 128 --n_render_iterations 5 (higher quality, slower). Because blendFactor defaults to 1.0, denoising is silently disabled unless it is explicitly set to 0.0.
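Collected as data, the settings above look like the following sketch. The keys mirror the table's names rather than exact Omniverse carb setting paths (which we do not reproduce here); run_behavior1k_thirdperson.py applies the real values through the renderer's settings interface.

```python
# Third-person renderer configuration mirroring the table above.
# Keys follow the table's naming, not verified carb paths; values are the
# "fast" H100 preset (use spp=128 with 5 render iterations for quality).
RENDER_SETTINGS = {
    "render_mode": 2,                            # Path Tracing
    "optixDenoiser/enabled": True,
    "optixDenoiser/blendFactor": 0.0,            # default 1.0 silently disables denoising
    "optixDenoiser/temporalMode/enabled": True,  # prevents frame-to-frame flicker
    "pathtracing/clampSpp": 0,                   # 0 = no clamp (default clamps SPP to 32)
    "pathtracing/spp": 1,                        # samples per pixel per render call
    "adaptiveSampling/enabled": True,
    "fireflyFilter/enabled": True,
}
N_RENDER_ITERATIONS = 20  # extra render passes per frame (fast preset)
```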

Output

Two video streams are recorded per episode:

  • Third-person view (thirdperson/thirdperson_<uuid>_s{0,1}.mp4): 1280×720, follows robot with configurable offset
  • Robot view (robotview/): Standard wrist/head camera view from observation dict

The filename suffix _s1.mp4 indicates a successful episode; _s0.mp4 indicates failure.
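The suffix convention is easy to handle programmatically; a tiny helper (the function name is ours):

```python
from pathlib import Path

def episode_succeeded(video_path: str) -> bool:
    """True for ..._s1.mp4 (successful episode), False for ..._s0.mp4 (failure)."""
    stem = Path(video_path).stem
    if stem.endswith("_s1"):
        return True
    if stem.endswith("_s0"):
        return False
    raise ValueError(f"no success suffix in {video_path!r}")

print(episode_succeeded("thirdperson/thirdperson_ab12_s1.mp4"))  # True
```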

Validation Results

Cosine similarity between PyTorch and ONNX outputs (higher is better, 1.0 = identical):

| Model | Backbone Cosine Sim | Action Head Cosine Sim |
|---|---|---|
| Fractal | 1.000000 | 1.000000 |
| Bridge | 1.000000 | 1.000000 |
| BEHAVIOR-1K | 1.000000 | 1.000000 |
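The validation metric is ordinary cosine similarity over flattened outputs; a minimal sketch (validate_onnx.py's exact implementation may differ):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened tensors; 1.0 means identical direction."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

torch_out = np.random.default_rng(0).standard_normal((1, 50, 128))
onnx_out = torch_out.copy()  # a perfect match scores 1.0
print(round(cosine_similarity(torch_out, onnx_out), 6))
```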

SimplerEnv Evaluation Results

Fractal (Google Robot)

| Task | PyTorch | ONNX |
|---|---|---|
| pick_coke_can | 100% | 100% |
| pick_object | 100% | 100% |
| move_near | 100% | 100% |
| open_drawer | partial | 0% |
| close_drawer | 50% | 20% |
| place_in_closed_drawer | 0% | 0% |

Bridge (WidowX)

| Task | PyTorch | ONNX |
|---|---|---|
| spoon_on_towel | 100% | 0% |
| carrot_on_plate | 100% | 0% |
| put_eggplant_in_basket | 100% | 0% |
| stack_cube | 0% | 0% |
| put_eggplant_in_sink | 72.7% | 0% |
| close_drawer | 100% | 20% |
| open_drawer | 100% | 0% |

BEHAVIOR-1K

| Task | PyTorch |
|---|---|
| turning_on_radio | 0% |
| picking_up_trash | 0% |
| picking_up_toys | 0% |
| putting_shoes_on_rack | 0% |
| clean_up_your_desk | 0% |

Note: BEHAVIOR-1K tasks show 0% for both PyTorch and ONNX, indicating these tasks are inherently challenging for the base model in the OmniGibson simulation environment.

Requirements

  • onnxruntime-gpu>=1.17.0
  • numpy
  • pyzmq (for inference server)
  • Pillow (for image processing)
  • CUDA-capable GPU with sufficient VRAM (~8GB)

License

This repository contains ONNX-converted models derived from NVIDIA's GR00T N1.6 under the Apache 2.0 License.
