GR00T N1.6-3B SimplerEnv & BEHAVIOR-1K (ONNX)

ONNX-converted models of NVIDIA GR00T N1.6-3B for three embodiment configurations: Fractal (Google Robot), Bridge (WidowX), and BEHAVIOR-1K (R1 Pro). These models enable PyTorch-free inference via ONNX Runtime with the CUDA execution provider.

Base PyTorch Models

| Variant | HuggingFace Source | Embodiment Tag | Embodiment ID |
|---|---|---|---|
| Fractal | nvidia/GR00T-N1.6-fractal | OXE_GOOGLE | 0 |
| Bridge | nvidia/GR00T-N1.6-bridge | OXE_WIDOWX | 1 |
| BEHAVIOR-1K | nvidia/GR00T-N1.6-BEHAVIOR1k | BEHAVIOR_R1_PRO | 24 |
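For scripting against these variants, the mapping above can be captured in a small lookup table. This is a convenience sketch: the `EMBODIMENTS` name and helper function are ours; the values come from the table.

```python
# Embodiment metadata for the three converted variants.
# Values mirror the table above; the dict itself is just a convenience.
EMBODIMENTS = {
    "fractal":    {"model": "nvidia/GR00T-N1.6-fractal",    "tag": "OXE_GOOGLE",      "id": 0},
    "bridge":     {"model": "nvidia/GR00T-N1.6-bridge",     "tag": "OXE_WIDOWX",      "id": 1},
    "behavior1k": {"model": "nvidia/GR00T-N1.6-BEHAVIOR1k", "tag": "BEHAVIOR_R1_PRO", "id": 24},
}

def embodiment_tag(variant: str) -> str:
    """Return the embodiment tag for a variant name, e.g. 'fractal' -> 'OXE_GOOGLE'."""
    return EMBODIMENTS[variant]["tag"]
```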

Model Architecture

The GR00T N1.6 model is split into two ONNX components:

  1. Eagle Backbone (eagle_backbone.onnx): Vision encoder (EagleX) that processes images into visual tokens
  2. Action Head (action_head.onnx): DiT-based diffusion transformer that generates action sequences

Key Dimensions

| Parameter | Value |
|---|---|
| SEQ_LEN | 108 |
| BACKBONE_DIM | 2048 |
| ACTION_DIM | 128 |
| ACTION_HORIZON | 50 |
| STATE_DIM | 128 |
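As a rough illustration of how these dimensions fit together, the sketch below mocks the two ONNX components with zero-filled numpy tensors. The input image resolution and the exact call sequence are assumptions; real inference runs the exported graphs through onnxruntime sessions.

```python
import numpy as np

SEQ_LEN, BACKBONE_DIM = 108, 2048
ACTION_DIM, ACTION_HORIZON, STATE_DIM = 128, 50, 128
BATCH = 1  # exports use a static batch size of 1

def mock_backbone(images: np.ndarray) -> np.ndarray:
    """Stand-in for eagle_backbone.onnx: images -> visual token embeddings."""
    return np.zeros((BATCH, SEQ_LEN, BACKBONE_DIM), dtype=np.float32)

def mock_action_head(tokens: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Stand-in for action_head.onnx: visual tokens + robot state -> action chunk."""
    assert tokens.shape == (BATCH, SEQ_LEN, BACKBONE_DIM)
    assert state.shape == (BATCH, STATE_DIM)
    return np.zeros((BATCH, ACTION_HORIZON, ACTION_DIM), dtype=np.float32)

# 224x224 is a placeholder resolution, not taken from the export.
tokens = mock_backbone(np.zeros((BATCH, 3, 224, 224), dtype=np.float32))
actions = mock_action_head(tokens, np.zeros((BATCH, STATE_DIM), dtype=np.float32))
print(actions.shape)  # (1, 50, 128)
```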

Directory Structure

onnx/
  fractal/
    eagle_backbone.onnx          # Vision backbone
    action_head.onnx             # Action head (DiT)
    <external_data_files>        # ~1200 weight tensor files
  bridge/
    action_head.onnx             # Bridge-specific action head
    <external_data_files>        # ~1000 weight tensor files
    # NOTE: Bridge uses the SAME eagle_backbone as Fractal.
    #       Download eagle_backbone.onnx + its external data from onnx/fractal/
  behavior1k/
    eagle_backbone.onnx          # Behavior1k backbone
    action_head.onnx             # Behavior1k action head
    <external_data_files>        # ~1100 weight tensor files
scripts/
  convert_to_onnx.py             # Conversion script (fractal/bridge)
  convert_behavior1k.py          # Conversion script (behavior1k)
  run_gr00t_onnx_server.py       # ZMQ ONNX inference server
  validate_onnx.py               # Cosine similarity validation
  run_simplerenv_onnx_eval.sh    # SimplerEnv evaluation (fractal + bridge)
  run_behavior1k_eval.sh         # BEHAVIOR-1K evaluation client
  run_behavior1k_thirdperson.py  # Third-person view rendering (Python)
  run_behavior1k_thirdperson.sh  # Third-person view rendering (launcher)

Bridge backbone note: The Bridge variant shares the same Eagle backbone as Fractal. When using Bridge, point to onnx/fractal/eagle_backbone.onnx or copy the fractal backbone files into the bridge directory.

ONNX Conversion Details

  • Opset version: 17
  • Precision: float32
  • Batch size: Static batch_size=1 (no dynamic axes)
  • Key modification: ScatterND eliminated via FixedEmbodimentLinear, which replaces the dynamic embodiment selection (nn.Embedding + ScatterND) with a fixed linear layer for the target embodiment, enabling static graph export
  • External data: Weights stored as external data files (ONNX size_threshold=0) due to model size exceeding 2GB protobuf limit
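The idea behind the FixedEmbodimentLinear rewrite can be shown in plain numpy: baking in one embodiment's weights up front produces the same output as the dynamic per-embodiment lookup, but with a graph ONNX can export statically. This is a conceptual sketch, not the actual conversion code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embodiments, in_dim, out_dim = 32, 16, 8

# Dynamic form: a bank of per-embodiment weights, indexed at runtime.
# In the exported graph this runtime indexing lowers to ScatterND/Gather ops.
weight_bank = rng.standard_normal((n_embodiments, out_dim, in_dim)).astype(np.float32)

def dynamic_forward(x: np.ndarray, embodiment_id: int) -> np.ndarray:
    return x @ weight_bank[embodiment_id].T

# Fixed form: select the target embodiment's weights once, at export time
# (e.g. id 24 for BEHAVIOR_R1_PRO), leaving an ordinary linear layer.
TARGET_ID = 24
fixed_weight = weight_bank[TARGET_ID]

def fixed_forward(x: np.ndarray) -> np.ndarray:
    return x @ fixed_weight.T

x = rng.standard_normal((1, in_dim)).astype(np.float32)
assert np.allclose(dynamic_forward(x, TARGET_ID), fixed_forward(x))
```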

Conversion Command (Fractal/Bridge)

PYTHONPATH="" python scripts/convert_to_onnx.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --output-dir ./onnx/fractal

Conversion Command (BEHAVIOR-1K)

PYTHONPATH="" python scripts/convert_behavior1k.py \
    --model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment-tag BEHAVIOR_R1_PRO \
    --output-dir ./onnx/behavior1k

Inference Usage

ONNX Inference Server (ZMQ)

The inference server loads ONNX models and serves action predictions over ZMQ (port 5555):

# Fractal (Google Robot)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --onnx-dir ./onnx/fractal \
    --port 5555

# Bridge (WidowX): uses the fractal backbone
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-bridge \
    --embodiment-tag OXE_WIDOWX \
    --onnx-dir ./onnx/bridge \
    --backbone-onnx-dir ./onnx/fractal \
    --port 5555

# BEHAVIOR-1K
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment-tag BEHAVIOR_R1_PRO \
    --onnx-dir ./onnx/behavior1k \
    --port 5555

SimplerEnv Evaluation Client

Connect a SimplerEnv evaluation script to the running ONNX server:

python simpler_env_eval.py \
    --task google_robot_pick_coke_can \
    --policy gr00t_zmq \
    --zmq-host localhost \
    --zmq-port 5555
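The request/response shape of a ZMQ policy client can be sketched as below. The real server's wire format is not documented here, so this sketch assumes a simple pickle-based REQ/REP exchange and stands up a dummy REP endpoint in-process to stay self-contained; field names like "image", "state", and "action" are assumptions.

```python
import pickle
import threading
import numpy as np
import zmq

def dummy_server(port: int) -> None:
    """Minimal stand-in for the ONNX inference server: one REQ/REP round trip."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://127.0.0.1:{port}")
    _obs = pickle.loads(sock.recv())  # observation dict from the client
    action = np.zeros((50, 128), dtype=np.float32)  # ACTION_HORIZON x ACTION_DIM
    sock.send(pickle.dumps({"action": action}))
    sock.close()

PORT = 5599  # avoid clashing with a real server on 5555
threading.Thread(target=dummy_server, args=(PORT,), daemon=True).start()

# Client side: send one observation, receive one action chunk.
ctx = zmq.Context.instance()
sock = ctx.socket(zmq.REQ)
sock.connect(f"tcp://127.0.0.1:{PORT}")
sock.send(pickle.dumps({"image": np.zeros((224, 224, 3), np.uint8),
                        "state": np.zeros(128, np.float32)}))
reply = pickle.loads(sock.recv())
sock.close()
print(reply["action"].shape)  # (50, 128)
```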

Simulation Evaluation

SimplerEnv Evaluation (PyTorch + ONNX)

SimplerEnv evaluation runs as a two-process pipeline: a server (loads the model) and a client (runs the simulation).

Step 1: Start the inference server

PyTorch server (original GR00T pipeline):

# Fractal (Google Robot) on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-fractal \
    --embodiment_tag OXE_GOOGLE \
    --port 5556

# Bridge (WidowX) on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-bridge \
    --embodiment_tag OXE_WIDOWX \
    --port 5557

ONNX server (PyTorch-free, uses scripts/run_gr00t_onnx_server.py):

# Fractal on GPU 0, port 5556
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-fractal \
    --embodiment-tag OXE_GOOGLE \
    --onnx-dir ./onnx/fractal \
    --port 5556

# Bridge on GPU 0, port 5557
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python scripts/run_gr00t_onnx_server.py \
    --model-path nvidia/GR00T-N1.6-bridge \
    --embodiment-tag OXE_WIDOWX \
    --onnx-dir ./onnx/bridge \
    --backbone-onnx-dir ./onnx/fractal \
    --port 5557

Step 2: Run SimplerEnv evaluation

Once the server is ready, run the evaluation client:

bash scripts/run_simplerenv_onnx_eval.sh

This script runs all Fractal (Google Robot) tasks against port 5556 and collects results + videos. See the script for customizing tasks, episode counts, and output paths.

PYTHONPATH note: The SimplerEnv eval script explicitly sets PYTHONPATH="" and PYTHONNOUSERSITE=1 to avoid conflicts between the SimplerEnv virtualenv and the Isaac-GR00T environment. The SimplerEnv Python binary is invoked directly from its own .venv.


BEHAVIOR-1K Evaluation

BEHAVIOR-1K evaluation runs the server and client on different GPUs to avoid VRAM conflicts.

Prerequisites

OmniGibson and the BEHAVIOR-1K environment must be installed separately; they are not bundled in this repo.

Step 1: Start GR00T inference server (GPU 0)

PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment_tag BEHAVIOR_R1_PRO \
    --port 5555 \
    --use_sim_policy_wrapper

Step 2: Run BEHAVIOR-1K eval client (GPU 1)

bash scripts/run_behavior1k_eval.sh

The script runs 5 tasks × 5 episodes each against the server on port 5555.

Required environment variables (already set in the script):

export PYTHONPATH=/path/to/Isaac-GR00T
export PYTHONNOUSERSITE=1
export OMNI_KIT_ACCEPT_EULA=YES
export OMNIGIBSON_HEADLESS=1    # Run without display (headless)

Tasks evaluated:

  • turning_on_radio
  • picking_up_trash
  • picking_up_toys
  • putting_shoes_on_rack
  • clean_up_your_desk

Note: OmniGibson may segfault on cleanup; this is expected and does not affect results. For this reason the script does not use set -e.
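The tolerant exit-code handling can be sketched as a small shell wrapper. This is illustrative only: the wrapper name and the exit-139 check (128 + SIGSEGV) are our assumptions about how to treat a segfault during cleanup.

```shell
# Run an eval command, treating a segfault on cleanup (exit 139 = 128+SIGSEGV)
# as non-fatal, since results are already written by that point.
run_tolerant() {
    "$@"
    status=$?
    if [ "$status" -eq 139 ]; then
        echo "note: segfault during cleanup (exit 139), continuing" >&2
        return 0
    fi
    return "$status"
}
```

Usage (hypothetical client invocation): `run_tolerant python eval_client.py --task turning_on_radio`.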


BEHAVIOR-1K Third-Person View Rendering

For visualization, a third-person camera can be added that tracks the robot and produces a high-resolution 1280×720 video using Path Tracing with the OptiX denoiser.

Quick start (automated launcher)

bash scripts/run_behavior1k_thirdperson.sh

This script automatically:

  1. Starts the GR00T inference server on GPU 0 (port 5555)
  2. Waits for server readiness
  3. Runs one episode of picking_up_trash with third-person rendering on GPU 1
  4. Outputs videos to /path/to/gr00t_behavior1k_thirdperson_videos/

Manual usage

# Terminal 1: start GR00T inference server (GPU 0)
PYTHONPATH="" CUDA_VISIBLE_DEVICES=0 python gr00t/eval/run_gr00t_server.py \
    --model_path nvidia/GR00T-N1.6-BEHAVIOR1k \
    --embodiment_tag BEHAVIOR_R1_PRO \
    --port 5555 \
    --use_sim_policy_wrapper

# Terminal 2: run third-person rendering client (GPU 1)
PYTHONPATH=/path/to/Isaac-GR00T \
PYTHONNOUSERSITE=1 \
OMNIGIBSON_HEADLESS=1 \
OMNI_KIT_ACCEPT_EULA=YES \
CUDA_VISIBLE_DEVICES=1 \
python scripts/run_behavior1k_thirdperson.py \
    --task picking_up_trash \
    --n_episodes 1 \
    --policy_client_host 127.0.0.1 \
    --policy_client_port 5555 \
    --output_dir ./thirdperson_videos \
    --max_episode_steps 720 \
    --n_action_steps 8 \
    --n_render_iterations 20 \
    --spp 1

Key renderer settings (noise-free rendering on H100)

The run_behavior1k_thirdperson.py script configures the RTX renderer to eliminate noise artifacts:

| Setting | Value | Why |
|---|---|---|
| Render mode | Path Tracing (mode=2) | Higher quality than RTX Real-Time |
| optixDenoiser/enabled | True | Enable OptiX denoiser |
| optixDenoiser/blendFactor | 0.0 | Critical: default is 1.0 (= no denoising!) |
| optixDenoiser/temporalMode/enabled | True | Prevents flickering between video frames |
| pathtracing/clampSpp | 0 | Critical: default clamps SPP to 32! |
| pathtracing/spp | 1 (fast) or 128 (quality) | Samples per pixel per render call |
| n_render_iterations | 20 (fast) or 5 (with high SPP) | Extra render passes per frame |
| adaptiveSampling/enabled | True | Concentrates samples on noisy regions |
| fireflyFilter/enabled | True | Clamps bright noise speckles |

Recommended settings for H100: --spp 1 --n_render_iterations 20 (fast, low memory) or --spp 128 --n_render_iterations 5 (higher quality, slower). Because blendFactor defaults to 1.0, denoising is silently disabled unless it is explicitly set to 0.0.
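Collected as data, the settings above look like the following sketch. The keys mirror the table's names rather than exact Omniverse carb setting paths (which we do not reproduce here); run_behavior1k_thirdperson.py applies the real values through the renderer's settings interface.

```python
# Third-person renderer configuration mirroring the table above.
# Keys follow the table's naming, not verified carb paths; values are the
# "fast" H100 preset (use spp=128 with 5 render iterations for quality).
RENDER_SETTINGS = {
    "render_mode": 2,                            # Path Tracing
    "optixDenoiser/enabled": True,
    "optixDenoiser/blendFactor": 0.0,            # default 1.0 silently disables denoising
    "optixDenoiser/temporalMode/enabled": True,  # prevents frame-to-frame flicker
    "pathtracing/clampSpp": 0,                   # 0 = no clamp (default clamps SPP to 32)
    "pathtracing/spp": 1,                        # samples per pixel per render call
    "adaptiveSampling/enabled": True,
    "fireflyFilter/enabled": True,
}
N_RENDER_ITERATIONS = 20  # extra render passes per frame (fast preset)
```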

Output

Two video streams are recorded per episode:

  • Third-person view (thirdperson/thirdperson_<uuid>_s{0,1}.mp4): 1280×720, follows robot with configurable offset
  • Robot view (robotview/): Standard wrist/head camera view from observation dict

The filename suffix _s1.mp4 indicates a successful episode; _s0.mp4 indicates failure.
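The suffix convention is easy to handle programmatically; a tiny helper (the function name is ours):

```python
from pathlib import Path

def episode_succeeded(video_path: str) -> bool:
    """True for ..._s1.mp4 (successful episode), False for ..._s0.mp4 (failure)."""
    stem = Path(video_path).stem
    if stem.endswith("_s1"):
        return True
    if stem.endswith("_s0"):
        return False
    raise ValueError(f"no success suffix in {video_path!r}")

print(episode_succeeded("thirdperson/thirdperson_ab12_s1.mp4"))  # True
```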

Validation Results

Cosine similarity between PyTorch and ONNX outputs (higher is better, 1.0 = identical):

| Model | Backbone Cosine Sim | Action Head Cosine Sim |
|---|---|---|
| Fractal | 1.000000 | 1.000000 |
| Bridge | 1.000000 | 1.000000 |
| BEHAVIOR-1K | 1.000000 | 1.000000 |
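The validation metric is ordinary cosine similarity over flattened outputs; a minimal sketch (validate_onnx.py's exact implementation may differ):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened tensors; 1.0 means identical direction."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

torch_out = np.random.default_rng(0).standard_normal((1, 50, 128))
onnx_out = torch_out.copy()  # a perfect match scores 1.0
print(round(cosine_similarity(torch_out, onnx_out), 6))
```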

SimplerEnv Evaluation Results

Fractal (Google Robot)

| Task | PyTorch | ONNX |
|---|---|---|
| pick_coke_can | 100% | 100% |
| pick_object | 100% | 100% |
| move_near | 100% | 100% |
| open_drawer | partial | 0% |
| close_drawer | 50% | 20% |
| place_in_closed_drawer | 0% | 0% |

Bridge (WidowX)

| Task | PyTorch | ONNX |
|---|---|---|
| spoon_on_towel | 100% | 0% |
| carrot_on_plate | 100% | 0% |
| put_eggplant_in_basket | 100% | 0% |
| stack_cube | 0% | 0% |
| put_eggplant_in_sink | 72.7% | 0% |
| close_drawer | 100% | 20% |
| open_drawer | 100% | 0% |

BEHAVIOR-1K

| Task | PyTorch |
|---|---|
| turning_on_radio | 0% |
| picking_up_trash | 0% |
| picking_up_toys | 0% |
| putting_shoes_on_rack | 0% |
| clean_up_your_desk | 0% |

Note: BEHAVIOR-1K tasks show 0% for both PyTorch and ONNX, indicating these tasks are inherently challenging for the base model in the OmniGibson simulation environment.

Requirements

  • onnxruntime-gpu>=1.17.0
  • numpy
  • pyzmq (for inference server)
  • Pillow (for image processing)
  • CUDA-capable GPU with sufficient VRAM (~8GB)

License

This repository contains ONNX-converted models derived from NVIDIA's GR00T N1.6 under the Apache 2.0 License.
