X-VLA v21: BEHAVIOR-1K Task 0 (Turn on Radio), curated merged dataset

Fine-tune of X-VLA on BEHAVIOR-1K task 0 ("turn on the radio receiver") from a 60,000-iteration run. Intermediate checkpoints are uploaded as they are produced, each in its own ckpt-<step>/ subfolder.

Weights only: the optimizer state is not included, so these checkpoints are for eval/inference, not for resuming training.
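Because each checkpoint lives in its own subfolder, one can be fetched without pulling the others. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the `repo_id` argument is a placeholder for this repository's ID, and the assumption that `<step>` is written as a plain integer (for example `ckpt-30000`) should be checked against the actual folder names.

```python
def checkpoint_patterns(step: int) -> list[str]:
    """Glob patterns matching every file under one ckpt-<step>/ subfolder.

    Assumption: the step is written as a plain, unpadded integer.
    """
    return [f"ckpt-{step}/*"]


def download_checkpoint(repo_id: str, step: int) -> str:
    """Download a single checkpoint subfolder and return its local path.

    repo_id is a placeholder; pass this repository's actual ID.
    """
    # Imported lazily so checkpoint_patterns works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(step),
    )
    return f"{local_root}/ckpt-{step}"
```

Since only weights are shipped, the downloaded folder is meant to be loaded for evaluation or inference; there is no optimizer state to restore.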

Training data: curated merged task-0 set

Two "turn on radio" sources were merged and curated into 1521 episodes:

| Source | Episodes (after curation) | Skills kept |
| --- | --- | --- |
| Official 2025-challenge demos | 191 | move-to, pick-up, press |
| MP-collected | 1330 | pick-up, press |
  • Per-episode skill filtering: place-on / "put-down" and frames outside any annotated skill segment are dropped everywhere; navigation is kept only from the official demos. The model learns: navigate → pick up → press.
  • Outlier filtering: 33 episodes removed via per-episode PCA distance (d_combined > 4.4) plus a known-bad MP episode block.
  • Skill-class weights for the weighted-CE skill classifier were recomputed (sqrt-inverse-frequency) over the curated 3-skill distribution.
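The skill filtering and class-weight rules above can be sketched as follows. This is an illustration, not the actual curation code: the function names, the `source` labels (`"official"`, `"mp"`), and the normalization of weights to a mean of 1 are all assumptions, and any frame counts passed in are hypothetical.

```python
import math

# The three skills kept after curation (from the table above).
KEPT_SKILLS = {"move-to", "pick-up", "press"}


def keep_frame(skill, source):
    """Return True if a frame survives per-episode skill filtering.

    skill is None for frames outside any annotated skill segment.
    source is "official" or "mp" (hypothetical labels).
    """
    if skill not in KEPT_SKILLS:
        # Drops place-on / "put-down" and unannotated frames everywhere.
        return False
    if skill == "move-to" and source != "official":
        # Navigation is kept only from the official demos.
        return False
    return True


def sqrt_inverse_frequency_weights(counts):
    """Weight each class by 1 / sqrt(count).

    Assumption: weights are rescaled so that their mean is 1; the
    original normalization is not stated.
    """
    raw = {skill: 1.0 / math.sqrt(n) for skill, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {skill: w / mean for skill, w in raw.items()}
```

Compared with plain inverse frequency, the square root softens the correction: a class that is four times rarer receives twice the weight rather than four times.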

Training config

| Setting | Value |
| --- | --- |
| Base | X-VLA-Pt |
| GPUs | 8 × H200, bf16 |
| Per-GPU batch | 32 (effective 256) |
| LR | 1e-5, cosine decay to 1e-6, 2000-step warmup |
| Iterations | 60,000 (this checkpoint: 30,000) |
| Action space | 23-D R1Pro (base + trunk + dual arm + grippers) |
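The learning-rate row can be read as the schedule sketched below. The shape is an assumption: warmup is taken to be linear from 0, and the cosine decay is taken to run over the remaining 58,000 steps; the actual training code may differ in either respect.

```python
import math

PEAK_LR = 1e-5        # from the config table
FINAL_LR = 1e-6       # cosine decay floor
WARMUP_STEPS = 2000
TOTAL_STEPS = 60_000

NUM_GPUS = 8
PER_GPU_BATCH = 32
EFFECTIVE_BATCH = NUM_GPUS * PER_GPU_BATCH  # 256, matching the table


def lr_at(step: int) -> float:
    """Learning rate at a given step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine
```

Under these assumptions, the 30,000-step checkpoint falls just before the midpoint of the decay phase (step 31,000), so its learning rate is slightly above the halfway value of 5.5e-6.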

At ~30k steps the joints loss is ~0.03–0.05 and the skill-classifier loss has converged near zero.
