X-VLA v5 Checkpoints: BEHAVIOR-1K

Changes from v4

1. Attention-Pooled Classifiers

  • Replaced mean-pool + single-layer classifier_proj with per-classifier AttentionPool + 2-layer MLP projections (skill, task, object)
  • Classifier gradients flow back to the VLM scaled to 1% (instead of a full .detach()), so the backbone learns discriminative features
  • Deeper projection: Linear(D→2D) → GELU → LayerNorm → Linear(2D→D) → GELU
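
The classifier head described above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the checkpoint's actual code: the names `scale_grad`, `AttentionPool`, and `ClassifierHead`, the single learned-query pooling, and the final `out` layer are all assumptions. Only the projection stack and the 1% gradient scale come from the list above.

```python
import torch
import torch.nn as nn


def scale_grad(x: torch.Tensor, scale: float) -> torch.Tensor:
    """Identity in the forward pass; multiplies the gradient by `scale` in backward."""
    return x * scale + x.detach() * (1.0 - scale)


class AttentionPool(nn.Module):
    """Pools tokens (B, T, D) into one vector (B, D) using a learned query (assumed design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim) * dim ** -0.5)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        scores = tokens @ self.query / tokens.shape[-1] ** 0.5  # (B, T)
        weights = scores.softmax(dim=-1)                         # attention over tokens
        return (weights.unsqueeze(-1) * tokens).sum(dim=1)       # (B, D)


class ClassifierHead(nn.Module):
    """One head per classifier (skill, task, object), each with its own pool and MLP."""

    def __init__(self, dim: int, num_classes: int, grad_scale: float = 0.01):
        super().__init__()
        self.grad_scale = grad_scale
        self.pool = AttentionPool(dim)
        # Linear(D->2D) -> GELU -> LayerNorm -> Linear(2D->D) -> GELU
        self.proj = nn.Sequential(
            nn.Linear(dim, 2 * dim),
            nn.GELU(),
            nn.LayerNorm(2 * dim),
            nn.Linear(2 * dim, dim),
            nn.GELU(),
        )
        self.out = nn.Linear(dim, num_classes)  # hypothetical logit layer

    def forward(self, vlm_tokens: torch.Tensor) -> torch.Tensor:
        tokens = scale_grad(vlm_tokens, self.grad_scale)  # 1% of the gradient reaches the VLM
        return self.out(self.proj(self.pool(tokens)))
```

With `grad_scale=0.0` the helper behaves like a full `.detach()`, and with `1.0` it passes gradients through unchanged, so the 1% setting sits between the v4 behavior and full end-to-end training.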

2. Concatenated Soft Prompts

  • Replaced element-wise addition (task + skill + object) with concatenation
  • Token budget split: task=8, skill=16, object=8 (total=32, same as before)
  • Prevents destructive interference between prompt sources
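
As a rough sketch of the difference, concatenation along the token axis keeps each source in its own slots, whereas addition mixes all three into every slot. The function name `build_soft_prompt` and the tensor layout `(batch, tokens, dim)` are assumptions for illustration. The token budget is the one listed above.

```python
import torch

# Token budget per prompt source (task=8, skill=16, object=8, total=32).
PROMPT_TOKENS = {"task": 8, "skill": 16, "object": 8}


def build_soft_prompt(prompts: dict) -> torch.Tensor:
    """Concatenates per-source prompts of shape (B, n_i, D) into (B, 32, D)."""
    parts = []
    for name, n_tokens in PROMPT_TOKENS.items():
        part = prompts[name]
        if part.shape[1] != n_tokens:
            raise ValueError(f"{name} prompt must have {n_tokens} tokens, got {part.shape[1]}")
        parts.append(part)
    return torch.cat(parts, dim=1)  # order: task, skill, object


# Usage with random stand-in prompts (batch of 2, hidden size 64).
example = {name: torch.randn(2, n, 64) for name, n in PROMPT_TOKENS.items()}
soft_prompt = build_soft_prompt(example)
```

Because each source owns a fixed slice of the 32 tokens, a change in the skill prompt cannot overwrite the task or object tokens, which is the interference the additive scheme was prone to.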

Training Config

  • Base model: X-VLA-Pt
  • Learning rate: 2e-5 (cosine decay, 0.1 min ratio)
  • Soft prompt LR: 2e-6 (0.1x multiplier)
  • Batch size: 16 × 4 GPUs
  • Freeze steps: 1000, warmup: 2000
  • Action mode: auto (23D)
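
The learning-rate settings above could be combined as in the sketch below. It assumes a linear warmup, a cosine decay that bottoms out at 0.1× the peak rate, and a `total_steps` argument supplied by the caller. None of these details are confirmed by this card beyond the listed numbers, and the freeze phase is not modeled because the card does not say which modules are frozen.

```python
import math

BASE_LR = 2e-5          # peak learning rate
MIN_RATIO = 0.1         # floor of the cosine decay, as a fraction of the peak
WARMUP_STEPS = 2000
SOFT_PROMPT_MULT = 0.1  # soft prompts train at 0.1x the base rate


def lr_at(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Linear warmup (assumed), then cosine decay from base_lr to MIN_RATIO * base_lr."""
    if step < WARMUP_STEPS:
        return base_lr * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (MIN_RATIO + (1.0 - MIN_RATIO) * cosine)


def soft_prompt_lr_at(step: int, total_steps: int) -> float:
    """Soft prompt parameters follow the same schedule, scaled by the multiplier."""
    return SOFT_PROMPT_MULT * lr_at(step, total_steps)
```

In an optimizer this would typically map to two parameter groups sharing one schedule, with the soft prompt group's rate multiplied by 0.1.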

Checkpoints

| Folder | Task | Steps | Notes |
|---|---|---|---|
| task0-20k | 0 (turning_on_radio) | 20,000 | Single-task |
| task40-30k | 40 (make_microwave_popcorn) | 30,000 | Single-task |