# X-VLA BEHAVIOR-1K Checkpoints
Fine-tuned X-VLA checkpoints on the BEHAVIOR-1K 2025 challenge demos dataset.
## Architecture
These checkpoints extend the pretrained X-VLA model with:
- Additive task + skill soft prompts: replaces the single domain prompt with `task_prompt[task_id] + skill_prompt[skill_id]` (both zero-initialized; 50 tasks x 34 skills)
- Skill classifier: linear head on VLM features that predicts the current skill from vision + language (trained with an auxiliary cross-entropy loss, lambda = 0.1)
- Enriched language instructions: Training instructions augmented with skill and object info (e.g. "Turn on the radio. Current: pick up radio from coffee table.")
Pretrained base: X-VLA-Pt | Action dim: 23 | Action mode: auto
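The additive prompting scheme and auxiliary skill loss described above can be sketched as follows. This is an illustrative reconstruction, not the actual X-VLA code: the module names, the prompt dimension, and the `total_loss` helper are assumptions; only the zero initialization, the 50 x 34 task/skill counts, and lambda = 0.1 come from this README.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TASKS, NUM_SKILLS, PROMPT_DIM = 50, 34, 512  # PROMPT_DIM is an assumption


class AdditiveSoftPrompt(nn.Module):
    """Replaces the single domain prompt with task_prompt[task_id] + skill_prompt[skill_id]."""

    def __init__(self, num_tasks=NUM_TASKS, num_skills=NUM_SKILLS, dim=PROMPT_DIM):
        super().__init__()
        # Both prompt tables are zero-initialized, per the checkpoint description,
        # so training starts from the pretrained model's behavior.
        self.task_prompt = nn.Embedding(num_tasks, dim)
        self.skill_prompt = nn.Embedding(num_skills, dim)
        nn.init.zeros_(self.task_prompt.weight)
        nn.init.zeros_(self.skill_prompt.weight)

    def forward(self, task_id, skill_id):
        return self.task_prompt(task_id) + self.skill_prompt(skill_id)


def total_loss(action_loss, vlm_features, skill_head, skill_id, lam=0.1):
    """Action loss plus the auxiliary skill-classification CE loss (lambda = 0.1)."""
    skill_logits = skill_head(vlm_features)          # (B, NUM_SKILLS)
    return action_loss + lam * F.cross_entropy(skill_logits, skill_id)
```

At inference the skill classifier replaces the ground-truth `skill_id`, which is why `generate_actions` in the Usage section only needs `task_id`.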
## Checkpoints
| Directory | LR | GPUs | Steps | Status | Notes |
|---|---|---|---|---|---|
| `additive_prompts_lr2e4/` | 2e-4 | 4 | 200k | Complete | All 50 tasks, cosine decay |
| `additive_prompts_lr2e5/` | 2e-5 | 4 | 200k | Complete | All 50 tasks, cosine decay |
| `additive_prompts_1e4/` | 1e-4 | 4 | 100k | Partial (NCCL crashes) | All 50 tasks, cosine decay |
| `additive_prompts_lr5e4/` | 5e-4 | 4 | 50k | Diverged | Abandoned |
| `single_task_1/` | 2e-5 | 4 | 200k | Complete | Task 1 only (~200 episodes) |
| `single_task_1_2_5_16_18/` | 2e-5 | 4 | 200k | Complete | Tasks 1, 2, 5, 16, 18 (~1000 episodes) |
## Training Details
- Dataset: `/shared_work/DATASETS/behavior-1k-2025-challenge-demos/` (50 tasks, ~200 episodes each, 10k total)
- Batch size: 16 per GPU
- Schedule: Linear warmup (2k steps) + cosine decay, freeze-then-unfreeze (1k steps)
- LR coefficients: VLM and soft prompts at 0.1x base LR
- Optimizer: AdamW, betas=(0.9, 0.95)
- Precision: bf16 mixed precision
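The optimizer and schedule settings above could be wired up roughly like this. A minimal sketch, assuming the parameter-naming convention (substrings `vlm` and `prompt` selecting the 0.1x groups) and the warmup/decay composition; the actual training script may differ.

```python
import math
import torch

BASE_LR, WARMUP_STEPS, TOTAL_STEPS = 2e-4, 2_000, 200_000


def build_optimizer(model, base_lr=BASE_LR):
    # VLM backbone and soft prompts train at 0.1x the base LR; everything else at 1x.
    slow, fast = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (slow if ("vlm" in name or "prompt" in name) else fast).append(p)
    return torch.optim.AdamW(
        [{"params": fast, "lr": base_lr},
         {"params": slow, "lr": 0.1 * base_lr}],
        betas=(0.9, 0.95),
    )


def lr_scale(step, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    # Linear warmup for `warmup` steps, then cosine decay to zero at `total`.
    if step < warmup:
        return step / warmup
    progress = (step - warmup) / (total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

`lr_scale` would typically be applied via `torch.optim.lr_scheduler.LambdaLR`, which multiplies each group's base LR, so the 0.1x ratio between groups is preserved throughout the schedule.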
## Usage
```python
from models.modeling_xvla import XVLA

model = XVLA.from_pretrained("Hoshipu/xvla-behavior1k-checkpoints/additive_prompts_lr2e4")

# Inference: skill_id is auto-predicted by the classifier
actions = model.generate_actions(
    images=images,
    input_ids=input_ids,
    task_id=task_id_tensor,
)
```
## File Structure
Each checkpoint directory contains:
- `model.safetensors`: model weights
- `config.json`: model configuration (includes `num_tasks`, `num_skills`)
- `optimizer.pt`: optimizer state (for resuming training)
- `state.json`: global step counter
- Tokenizer files (`merges.txt`, `vocab.json`, etc.)