X-VLA v21 — BEHAVIOR-1K Task 0 (Turn on Radio), curated merged dataset

Fine-tune of X-VLA on BEHAVIOR-1K task 0 ("turn on the radio receiver"), trained for 60,000 iterations. Intermediate checkpoints are uploaded as they are produced, each in its own ckpt-<step>/ subfolder.

Weights only — the optimizer state is not included, so these checkpoints are for eval/inference, not for resuming training.
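
Because each checkpoint lives in its own subfolder, a single step can be fetched without pulling the whole repository. The sketch below is one possible way to do that with huggingface_hub's snapshot_download and its allow_patterns filter. The helper names and the repo ID are hypothetical, and the folder name is assumed to use the unpadded step number (for example ckpt-30000); check the repository's file listing before relying on that.

```python
from pathlib import Path


def checkpoint_pattern(step: int) -> str:
    """Glob pattern matching every file in one ckpt-<step>/ subfolder.

    Assumes the step is written without zero padding.
    """
    return f"ckpt-{step}/*"


def download_checkpoint(repo_id: str, step: int) -> Path:
    """Download a single checkpoint subfolder and return its local path."""
    # Imported lazily so checkpoint_pattern works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=[checkpoint_pattern(step)],
    )
    return Path(local_root) / f"ckpt-{step}"


# Example call (the repo ID is a placeholder, not the real repository):
# ckpt_dir = download_checkpoint("your-org/your-xvla-repo", step=30000)
```

The returned directory holds model weights only, so it is a starting point for evaluation code rather than for resuming the optimizer.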

Training data — curated merged task-0 set

Two "turn on radio" sources were merged and curated into 1521 episodes:

Source	Episodes (after curation)	Skills kept
Official 2025-challenge demos	191	move-to, pick-up, press
MP-collected	1330	pick-up, press
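
The "Skills kept" column can be read as a per-source allow-list applied frame by frame. The following is a minimal sketch of that idea, not the actual curation script: the source keys, the (frame_index, skill_label) layout, and the use of None for unannotated frames are all assumptions made for illustration.

```python
# Skill allow-list per source, taken from the "Skills kept" column.
# The keys "official" and "mp" are illustrative names.
KEPT_SKILLS = {
    "official": {"move-to", "pick-up", "press"},
    "mp": {"pick-up", "press"},
}


def filter_frames(frames, source):
    """Keep only frames whose skill label is allowed for this source.

    frames: list of (frame_index, skill_label) pairs. skill_label is None
    for frames outside any annotated skill segment (assumed convention),
    so those frames are dropped along with disallowed skills.
    """
    allowed = KEPT_SKILLS[source]
    return [(i, skill) for i, skill in frames if skill in allowed]


# A made-up episode: one unannotated frame, then four annotated ones.
episode = [(0, None), (1, "move-to"), (2, "pick-up"), (3, "press"), (4, "place-on")]
official_kept = filter_frames(episode, "official")
mp_kept = filter_frames(episode, "mp")
```

With this toy episode, the official source keeps the move-to, pick-up, and press frames, while the MP source keeps only pick-up and press.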

Per-episode skill filtering: place-on / "put-down" segments and frames outside any annotated skill segment are dropped from both sources; navigation (move-to) is kept only from the official demos. The model learns: navigate → pick up → press.
Outlier filtering: 33 episodes were removed via per-episode PCA distance (d_combined > 4.4) plus a known-bad MP episode block.
Skill-class weights for the weighted-CE skill classifier were recomputed (sqrt-inverse-frequency) over the curated 3-skill distribution.
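
As a rough illustration of sqrt-inverse-frequency weighting, each class weight is proportional to 1 / sqrt(class frequency), which boosts rare skills less aggressively than plain inverse frequency. The sketch below uses made-up frame counts, and the rescaling to a mean weight of 1 is an assumed convention; the card does not state how the real weights were normalized.

```python
import math


def sqrt_inverse_frequency_weights(counts):
    """Return per-class weights proportional to 1 / sqrt(frequency).

    counts: dict mapping class name -> number of labelled frames.
    Weights are rescaled so that their mean is 1 (an assumed convention).
    """
    total = sum(counts.values())
    raw = {name: 1.0 / math.sqrt(n / total) for name, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {name: w / mean_raw for name, w in raw.items()}


# Hypothetical frame counts, NOT the real dataset statistics.
example_counts = {"move-to": 10_000, "pick-up": 40_000, "press": 90_000}
weights = sqrt_inverse_frequency_weights(example_counts)
```

In PyTorch, a weight vector like this is commonly passed as the weight argument of torch.nn.CrossEntropyLoss; whether this run wires it up that way is not stated here.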

Training config

Setting	Value
Base	X-VLA-Pt
GPUs	8 × H200, bf16
Per-GPU batch	32 (effective 256)
LR	1e-5, cosine decay to 1e-6, 2000-step warmup
Iterations	60,000 (this checkpoint: 30,000)
Action space	23-D R1Pro (base + trunk + dual arm + grippers)
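
The learning-rate row can be sketched as a step-to-LR function. This is one plausible reading of the settings above, not the training code: the warmup is assumed to be linear from zero, and the cosine decay is assumed to span the iterations remaining after warmup.

```python
import math

PEAK_LR = 1e-5
FINAL_LR = 1e-6
WARMUP_STEPS = 2_000
TOTAL_STEPS = 60_000


def learning_rate(step):
    """Learning rate at a given iteration under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from PEAK_LR to FINAL_LR over the remaining steps.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


# Effective batch size: number of GPUs times per-GPU batch.
effective_batch = 8 * 32
```

Under these assumptions the rate peaks at step 2,000 and reaches 1e-6 at step 60,000, so the 30,000-step checkpoint sits a little under halfway through the decay.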

At ~30k steps the joints loss is ~0.03–0.05 and the skill-classifier loss has converged near zero.
