RLDX-1-FT-SIMPLER-WIDOWX
Paper · Project page · Code · Models
RLDX-1 is a general-purpose Robot Foundation Model designed for dexterous manipulation. Powered by a Multi-Stream Action Transformer (MSAT), it seamlessly unifies multimodal perception (visual + tactile), high-DoF actuation, and memory-aware decision-making in a single architecture.
This repository hosts RLDX-1-FT-SIMPLER-WIDOWX — RLDX-1 finetuned for
the SimplerEnv WidowX benchmark (BridgeData-style WidowX 250 tasks).
It achieves 71.9% average success.
Highlights
- Multi-Stream Action Transformer (MSAT). Cognition, physics, and action each get a dedicated stream coupled by joint self-attention — an extension of MM-DiT to action modeling.
- Motion awareness. Multi-frame observations + a motion module capture temporal dynamics; intermediate VLM layers compress video tokens to keep the policy efficient.
- Long-term memory. A memory module fuses past cognition features with the current ones for history-grounded decisions beyond a short multi-frame window.
- Physical sensing. Tactile and torque enter as a dedicated physics stream; the decoder is jointly trained to predict future physical signals.
- Three-stage training. Pre-training (generalization) → mid-training (functionality) → post-training (task adaptation), with synthetic data augmenting rare manipulation scenarios.
- Real-time inference. Static graph capture + custom fused kernels bring the all-modality model to 43.7 ms / step on RTX 5090 (1.63× speedup, >22 Hz).
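To make the "dedicated streams coupled by joint self-attention" idea from the MSAT highlight concrete, here is a minimal pure-Python sketch in the spirit of MM-DiT. It is an illustration under stated assumptions, not the RLDX-1 implementation: the function name `joint_self_attention`, the stream names, and the use of identity query/key/value projections are all hypothetical simplifications (MM-DiT-style models typically give each stream its own learned projections).

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def joint_self_attention(streams):
    """streams: dict mapping stream name -> list of token vectors (equal dim).

    Hypothetical helper: identity Q/K/V projections are used for brevity.
    """
    # 1. Concatenate all streams into one sequence, remembering each span.
    tokens, spans = [], {}
    for name, toks in streams.items():
        spans[name] = (len(tokens), len(tokens) + len(toks))
        tokens.extend(toks)
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    # 2. One attention pass over the joint sequence, so every token can
    #    attend to tokens from every stream.
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])

    # 3. Split the result back into per-stream token lists.
    return {name: out[start:end] for name, (start, end) in spans.items()}

streams = {
    "cognition": [[1.0, 0.0], [0.0, 1.0]],
    "physics": [[1.0, 1.0]],
    "action": [[0.5, 0.5], [0.0, 0.0], [1.0, 0.0]],
}
mixed = joint_self_attention(streams)
```

Each stream keeps its own token count, but its outputs now depend on the other streams: changing the single `physics` token changes the `action` outputs as well.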
Performance
| Benchmark | Success Rate |
|---|---|
| SIMPLER WidowX | 71.9% |
Quick start
Installation
git clone https://github.com/RLWRLD/RLDX-1.git
cd RLDX-1
uv sync --python 3.10
uv pip install -e .
Inference
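from rldx.policy.rldx_policy import RLDXPolicy
from rldx.data.embodiment_tags import EmbodimentTag

# Load the finetuned checkpoint with the embodiment tag listed under Model details.
policy = RLDXPolicy(
    model_path="RLWRLD/RLDX-1-FT-SIMPLER-WIDOWX",
    embodiment_tag=EmbodimentTag.OXE_BRIDGE_ORIG,
    device="cuda:0",
)

# `observation` must be supplied by the caller: the RGB frames, proprioceptive
# state, and language instruction for the current step.
action = policy.get_action(observation)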
Real-time serving (ZeroMQ)
uv run python rldx/eval/run_rldx_server.py \
    --model-path RLWRLD/RLDX-1-FT-SIMPLER-WIDOWX \
    --embodiment-tag OXE_BRIDGE_ORIG \
    --host 0.0.0.0 --port 20000
To reproduce the benchmark numbers end-to-end, see
run_scripts/eval/simpler/README.md.
Model details
- Architecture: Multi-Stream Action Transformer (MSAT) policy on a Qwen3-VL backbone with cognition-token perceptual summary. Trained with flow matching.
- Inputs: RGB video (default 4 frames), state proprioception, language instruction.
- Outputs: Action chunks of length 16.
- Embodiment tag: OXE_BRIDGE_ORIG.
- Base model: RLWRLD/RLDX-1-PT.
- Backbone: Qwen/Qwen3-VL-8B-Instruct.
- Finetune data: SimplerEnv WidowX training set (BridgeData subset of OXE).
- Params: 6.9B.
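The input and output shapes above (a 4-frame window in, a 16-action chunk out) imply a control loop that buffers recent frames and consumes actions from a chunk. The sketch below shows one way such a loop could look. It is an assumption-laden illustration: `stub_policy`, `run_episode`, the left-padding of the frame window, and the `replan_every` parameter are hypothetical, and the model card does not state whether RLDX-1 executes a full chunk before re-planning.

```python
from collections import deque

NUM_FRAMES = 4   # default multi-frame window from the model card
CHUNK_LEN = 16   # action chunk length from the model card

def stub_policy(frames, step):
    """Hypothetical stand-in for a policy call: returns CHUNK_LEN dummy actions."""
    assert len(frames) == NUM_FRAMES
    return [f"a{step + i}" for i in range(CHUNK_LEN)]

def run_episode(num_steps, replan_every=CHUNK_LEN):
    frames = deque(maxlen=NUM_FRAMES)  # keeps only the most recent frames
    pending = []                       # actions left in the current chunk
    executed, policy_calls = [], 0
    for step in range(num_steps):
        frames.append(f"frame{step}")  # placeholder for an RGB image
        if not pending:
            # Assumption: pad the window by repeating the oldest frame
            # until NUM_FRAMES observations are available.
            window = [frames[0]] * (NUM_FRAMES - len(frames)) + list(frames)
            pending = stub_policy(window, step)[:replan_every]
            policy_calls += 1
        executed.append(pending.pop(0))
    return executed, policy_calls

executed, policy_calls = run_episode(40)
```

With the default `replan_every=CHUNK_LEN` the policy is queried once every 16 steps; a smaller value trades more policy calls for fresher observations.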
For the full architectural walkthrough see
docs/architecture.md.
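Model details notes that the policy is trained with flow matching. As a rough illustration of that technique, the toy below uses the common straight-line formulation: a noisy sample and a target action are linearly interpolated, the regression target is the constant velocity between them, and sampling integrates a velocity field with Euler steps. Everything here is an assumption for illustration: the function names are hypothetical, the time convention (t=0 noise, t=1 action) may differ from RLDX-1, and a closed-form `oracle_velocity` stands in for the trained network.

```python
import random

def interpolate(noise, action, t):
    # Point on the straight path between noise (t=0) and action (t=1).
    return [(1 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    # Time derivative of the straight path; the regression target in training.
    return [a - n for n, a in zip(noise, action)]

def oracle_velocity(x, t, action):
    """Ideal field when the data is a single action: point straight at it.

    Stand-in for a learned network, which would also be conditioned on
    observations and language.
    """
    return [(a - xi) / (1 - t) for xi, a in zip(x, action)]

def sample(action, num_steps=10, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in action]  # start from Gaussian noise
    dt = 1 / num_steps
    for k in range(num_steps):
        t = k * dt                          # t stays strictly below 1
        v = oracle_velocity(x, t, action)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # Euler step
    return x

target_action = [0.3, -1.2, 2.0]
generated = sample(target_action)
```

Because the oracle field keeps the sample on the straight path, the Euler integration ends at the target action up to floating-point error; with a learned field the result would only approximate the data distribution.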
RLDX-1 model family
| Checkpoint | Description |
|---|---|
| RLDX-1-PT | Multi-source pretrained foundation |
| RLDX-1-VLM | Qwen3-VL-8B vision-language backbone |
| RLDX-1-FT-ROBOCASA | RoboCasa Kitchen 24-task finetune |
| RLDX-1-FT-RC365 | RoboCasa-365 cross-task finetune |
| RLDX-1-FT-LIBERO | LIBERO 4-task suite (goal, object, spatial, long) finetune |
| RLDX-1-FT-SIMPLER-GOOGLE | SIMPLER Google VM/VA finetune |
| RLDX-1-FT-SIMPLER-WIDOWX | SIMPLER WidowX finetune (this repo) |
| RLDX-1-FT-GR1 | GR-1 Tabletop finetune |
| RLDX-1-MT-DROID | DROID mid-train |
| RLDX-1-MT-ALLEX | All add-ons (memory + motion + physics + video) |
Intended use & limitations
Intended use. Research on robotic manipulation, simulation benchmarking on SimplerEnv WidowX, and non-commercial real-robot deployment under the conditions of the RLWRLD Model License v1.0.
Out of scope. Commercial deployment, military or weapons applications,
non-consensual surveillance, and any use that violates applicable laws or
regulations. See LICENSE.md §3.5 for the full list.
Limitations. Conditioned on the WidowX 250 BridgeData embodiment. For
Google-Robot evaluation use RLDX-1-FT-SIMPLER-GOOGLE; for other
embodiments, finetune from RLDX-1-PT instead.
Citation
@article{rldx2026,
title={RLDX-1 Technical Report},
author={Kim, Dongyoung and Jang, Huiwon and Koo, Myungkyu and Jang, Suhyeok and Kim, Taeyoung and others},
year={2026},
note={RLWRLD},
eprint={2605.03269},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2605.03269}
}
License
Released under the RLWRLD Model License v1.0 — a non-commercial license
with attribution and share-alike requirements. See LICENSE.md for
the full text. By using this model you agree to those terms, including the
use restrictions in §3.5.