X-VLA BEHAVIOR-1K Checkpoints

X-VLA checkpoints fine-tuned on the BEHAVIOR-1K 2025 challenge demos dataset.

Architecture

These checkpoints extend the pretrained X-VLA model with:

  • Additive task + skill soft prompts: Replaces the single domain prompt with task_prompt[task_id] + skill_prompt[skill_id] (both zero-initialized, 50 tasks x 34 skills)
  • Skill classifier: Linear head on VLM features that predicts the current skill from vision+language (trained with auxiliary CE loss, lambda=0.1)
  • Enriched language instructions: Training instructions augmented with skill and object info (e.g. "Turn on the radio. Current: pick up radio from coffee table.")

Pretrained base: X-VLA-Pt | Action dim: 23 | Action mode: auto
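The additive prompt and auxiliary-loss design above can be sketched as follows. This is an illustrative sketch only: the class name `SoftPromptBank`, the prompt length, and the hidden size are assumptions, not the actual X-VLA module layout. The zero initialization, the `task_prompt[task_id] + skill_prompt[skill_id]` composition, the 50x34 table sizes, and the lambda=0.1 CE weight come from the description above.

```python
import torch
import torch.nn as nn

class SoftPromptBank(nn.Module):
    """Sketch of the additive task + skill soft prompts.

    Assumed shapes: prompt_len and hidden are illustrative placeholders;
    num_tasks=50 and num_skills=34 follow the model card.
    """
    def __init__(self, num_tasks=50, num_skills=34, prompt_len=1, hidden=1024):
        super().__init__()
        # Zero-initialized so fine-tuning starts from the pretrained behavior.
        self.task_prompt = nn.Parameter(torch.zeros(num_tasks, prompt_len, hidden))
        self.skill_prompt = nn.Parameter(torch.zeros(num_skills, prompt_len, hidden))

    def forward(self, task_id, skill_id):
        # Additive composition replaces the single domain prompt.
        return self.task_prompt[task_id] + self.skill_prompt[skill_id]

bank = SoftPromptBank()
prompt = bank(torch.tensor([3]), torch.tensor([7]))  # shape (1, prompt_len, hidden)

# Auxiliary skill classifier: a linear head on (hypothetical) pooled VLM
# features, trained with a CE loss weighted by lambda = 0.1.
skill_head = nn.Linear(1024, 34)
vlm_feat = torch.randn(2, 1024)           # placeholder pooled features
skill_labels = torch.tensor([7, 12])      # placeholder skill ids
aux_loss = 0.1 * nn.functional.cross_entropy(skill_head(vlm_feat), skill_labels)
```

At inference the same head can supply `skill_id` when it is not given, which is what the auto-prediction in the Usage section refers to.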

Checkpoints

Directory                  LR    GPUs  Steps  Status                  Notes
additive_prompts_lr2e4/    2e-4  4     200k   Complete                All 50 tasks, cosine decay
additive_prompts_lr2e5/    2e-5  4     200k   Complete                All 50 tasks, cosine decay
additive_prompts_1e4/      1e-4  4     100k   Partial (NCCL crashes)  All 50 tasks, cosine decay
additive_prompts_lr5e4/    5e-4  4     50k    Diverged                Abandoned
single_task_1/             2e-5  4     200k   Complete                Task 1 only (~200 episodes)
single_task_1_2_5_16_18/   2e-5  4     200k   Complete                Tasks 1,2,5,16,18 (~1000 episodes)

Training Details

  • Dataset: /shared_work/DATASETS/behavior-1k-2025-challenge-demos/ (50 tasks, ~200 episodes each, 10k total)
  • Batch size: 16 per GPU
  • Schedule: Linear warmup (2k steps) + cosine decay, freeze-then-unfreeze (1k steps)
  • LR coefficients: VLM and soft prompts at 0.1x base LR
  • Optimizer: AdamW, betas=(0.9, 0.95)
  • Precision: bf16 mixed precision
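A minimal sketch of the schedule described above, assuming the freeze window holds the LR at zero and warmup then proceeds from the global step (the actual training code may handle freeze-then-unfreeze differently). The `coeff` argument models the 0.1x multiplier applied to the VLM and soft-prompt parameter groups.

```python
import math

def lr_at(step, base_lr=2e-4, warmup=2000, total=200_000, coeff=1.0, frozen_until=0):
    """Linear warmup to coeff * base_lr, then cosine decay to 0 at `total`.

    Assumption: frozen groups simply receive LR 0 until `frozen_until`.
    """
    if step < frozen_until:
        return 0.0
    if step < warmup:
        return coeff * base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return coeff * base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: VLM / soft-prompt group at 0.1x the base LR, frozen for 1k steps.
lr_vlm = lr_at(5000, coeff=0.1, frozen_until=1000)
```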

Usage

from models.modeling_xvla import XVLA

model = XVLA.from_pretrained("Hoshipu/xvla-behavior1k-checkpoints/additive_prompts_lr2e4")

# Inference: skill_id is auto-predicted by the classifier
actions = model.generate_actions(
    images=images,
    input_ids=input_ids,
    task_id=task_id_tensor,
)

File Structure

Each checkpoint directory contains:

  • model.safetensors - model weights
  • config.json - model configuration (includes num_tasks, num_skills)
  • optimizer.pt - optimizer state (for resuming training)
  • state.json - global step counter
  • Tokenizer files (merges.txt, vocab.json, etc.)