X-VLA v21: BEHAVIOR-1K Task 0 (Turn on Radio), curated merged dataset
Fine-tune of X-VLA on BEHAVIOR-1K task 0 ("turn on the radio receiver"),
a 60,000-iteration run. Intermediate checkpoints are uploaded as they
are produced, each in its own ckpt-<step>/ subfolder.
Weights only: the optimizer state is not included, so these checkpoints are for eval/inference, not for resuming training.
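As a minimal sketch of fetching a single checkpoint subfolder, the helper below assumes the checkpoints are hosted on the Hugging Face Hub and that `huggingface_hub` is installed; neither is stated above. The repository id is a caller-supplied placeholder, and `checkpoint_patterns` and `download_checkpoint` are hypothetical names introduced here for illustration.

```python
from pathlib import Path


def checkpoint_patterns(step: int) -> list[str]:
    """Glob patterns that match only the files under ckpt-<step>/."""
    return [f"ckpt-{step}/*"]


def download_checkpoint(repo_id: str, step: int, local_dir: str = "checkpoints") -> Path:
    """Download one ckpt-<step>/ subfolder and return its local path.

    Assumes a Hugging Face Hub repository; repo_id must be supplied by the caller.
    """
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(step),  # skip every other checkpoint
        local_dir=local_dir,
    )
    return Path(root) / f"ckpt-{step}"
```

Because only weights are stored, the downloaded folder is suitable for loading into an eval or inference pipeline, not for restoring an optimizer.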
Training data: curated merged task-0 set
Two "turn on radio" sources were merged and curated into 1521 episodes:
| Source | Episodes (after curation) | Skills kept |
|---|---|---|
| Official 2025-challenge demos | 191 | move-to, pick-up, press |
| MP-collected | 1330 | pick-up, press |
- Per-episode skill filtering: place-on / "put-down" and frames outside any annotated skill segment are dropped everywhere; navigation is kept only from the official demos. The model learns: navigate → pick up → press.
- Outlier filtering: 33 episodes removed via per-episode PCA distance (d_combined > 4.4) plus a known-bad MP episode block.
- Skill-class weights for the weighted-CE skill classifier were recomputed (sqrt-inverse-frequency) over the curated 3-skill distribution.
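The sqrt-inverse-frequency weighting mentioned above can be sketched as follows. This is an illustration, not the training code: the per-skill frame counts are made-up placeholders, and rescaling the weights to a mean of 1 is an assumption rather than something stated in this card.

```python
import math


def sqrt_inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by sqrt(total / count), then rescale to mean 1.

    The mean-1 rescaling is an assumption made for this sketch.
    """
    total = sum(counts.values())
    raw = {name: math.sqrt(total / n) for name, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {name: w / mean for name, w in raw.items()}


# Hypothetical frame counts for the three kept skills (not the real statistics).
example_counts = {"move-to": 100, "pick-up": 400, "press": 1600}
weights = sqrt_inverse_frequency_weights(example_counts)
```

Taking the square root softens plain inverse-frequency weighting: a class that is 16 times rarer receives 4 times the weight rather than 16 times, which limits how strongly rare skills dominate the cross-entropy loss.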
Training config
| Setting | Value |
|---|---|
| Base | X-VLA-Pt |
| GPUs | 8 × H200, bf16 |
| Per-GPU batch | 32 (effective 256) |
| LR | 1e-5, cosine decay to 1e-6, 2000-step warmup |
| Iterations | 60,000 (this checkpoint: 30,000) |
| Action space | 23-D R1Pro (base + trunk + dual arm + grippers) |
At ~30k steps the joints loss is ~0.03 to 0.05 and the skill-classifier loss has converged near zero.
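The learning-rate schedule and effective batch size from the config table can be reproduced with a short sketch. It assumes a linear warmup from zero and a cosine decay spanning the remaining steps up to iteration 60,000; the exact warmup shape is not specified above, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 1e-5        # "LR" row
FINAL_LR = 1e-6       # cosine decay floor
WARMUP_STEPS = 2_000
TOTAL_STEPS = 60_000
NUM_GPUS = 8
PER_GPU_BATCH = 32


def lr_at(step: int) -> float:
    """Learning rate at a given step (linear warmup assumed, then cosine decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


# 8 GPUs x 32 samples each gives the effective batch of 256 from the table.
effective_batch = NUM_GPUS * PER_GPU_BATCH
```

Under these assumptions the 30,000-step checkpoint sits a little before the midpoint of the cosine phase, so its learning rate is still above the average of the peak and final values.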