# X-VLA v5 Checkpoints – BEHAVIOR-1K
## Changes from v4
### 1. Attention-Pooled Classifiers

- Replaced mean-pool + single-layer `classifier_proj` with per-classifier `AttentionPool` + 2-layer MLP projections (skill, task, object)
- 1% of the gradient flows back to the VLM (instead of a full `.detach()`), so the backbone learns discriminative features
- Deeper projection: `Linear(D→2D) → GELU → LayerNorm → Linear(2D→D) → GELU`
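The head described above can be sketched in PyTorch. This is a minimal illustration, not the actual X-VLA code: the class names, the choice of 8 attention heads, and the scaled-residual way of leaking 1% of the gradient are all assumptions.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool a token sequence with a learned query (replaces mean-pooling)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        q = self.query.expand(x.size(0), -1, -1)          # (B, 1, D)
        pooled, _ = self.attn(q, x, x)                    # attend over tokens
        return pooled.squeeze(1)                          # (B, D)

class ClassifierHead(nn.Module):
    """AttentionPool + Linear(D->2D) -> GELU -> LayerNorm -> Linear(2D->D) -> GELU."""
    def __init__(self, dim: int):
        super().__init__()
        self.pool = AttentionPool(dim)
        self.proj = nn.Sequential(
            nn.Linear(dim, 2 * dim), nn.GELU(),
            nn.LayerNorm(2 * dim),
            nn.Linear(2 * dim, dim), nn.GELU(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Leak 1% of the gradient into the backbone instead of a full detach:
        # the value is unchanged, but d(out)/d(feats) is scaled by 0.01.
        feats = 0.01 * feats + 0.99 * feats.detach()
        return self.proj(self.pool(feats))

head = ClassifierHead(64)
out = head(torch.randn(2, 10, 64))   # 2 sequences of 10 tokens, D=64
print(out.shape)                     # (2, 64)
```

One such head would be instantiated per classifier (skill, task, object), each with its own pooling query and MLP.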
### 2. Concatenated Soft Prompts

- Replaced element-wise addition (`task + skill + object`) with concatenation
- Token budget split: task=8, skill=16, object=8 (total=32, same as before)
- Prevents destructive interference between prompt sources
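The token-budget split above amounts to the following, sketched here with random tensors and an assumed embedding width `D` (the real prompts are learned parameters):

```python
import torch

D = 64  # assumed prompt embedding width, for illustration only
task_prompt  = torch.randn(8,  D)   # task budget  = 8 tokens
skill_prompt = torch.randn(16, D)   # skill budget = 16 tokens
obj_prompt   = torch.randn(8,  D)   # object budget = 8 tokens

# v4 summed the three sources into a single shared set of slots; v5 instead
# concatenates them so each source keeps its own slots (8 + 16 + 8 = 32).
soft_prompt = torch.cat([task_prompt, skill_prompt, obj_prompt], dim=0)
print(soft_prompt.shape)  # (32, 64)
```

Concatenation keeps the total token budget at 32 while guaranteeing that no prompt source can overwrite another's representation.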
## Training Config
- Base model: `X-VLA-Pt`
- Learning rate: 2e-5 (cosine decay, 0.1 min ratio)
- Soft prompt LR: 2e-6 (0.1x multiplier)
- Batch size: 16 × 4 GPUs
- Freeze steps: 1000, warmup: 2000
- Action mode: auto (23D)
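The learning-rate numbers above imply a schedule roughly like the following sketch. The `lr_at` helper, the linear warmup shape, and the 30k total-step assumption are hypothetical; only the base LR, warmup length, and 0.1 min ratio come from the config.

```python
import math

def lr_at(step: int, base_lr: float = 2e-5, warmup: int = 2000,
          total: int = 30000, min_ratio: float = 0.1) -> float:
    """Linear warmup, then cosine decay down to min_ratio * base_lr."""
    if step < warmup:
        return base_lr * step / warmup
    t = (step - warmup) / max(1, total - warmup)   # progress in [0, 1]
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))   # 1 -> 0
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

peak = lr_at(2000)        # end of warmup: 2e-5
floor = lr_at(30000)      # end of training: 0.1 * 2e-5 = 2e-6
prompt_lr = 0.1 * peak    # soft prompts run at a 0.1x multiplier: 2e-6
```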
## Checkpoints
| Folder | Task | Steps | Notes |
|---|---|---|---|
| `task0-20k` | 0 (turning_on_radio) | 20,000 | Single-task |
| `task40-30k` | 40 (make_microwave_popcorn) | 30,000 | Single-task |