X-VLA BEHAVIOR-1K Checkpoints

X-VLA checkpoints fine-tuned on the BEHAVIOR-1K 2025 challenge demos dataset.

Architecture

These checkpoints extend the pretrained X-VLA model with:

  • Additive task + skill soft prompts: Replaces the single domain prompt with task_prompt[task_id] + skill_prompt[skill_id] (both zero-initialized, 50 tasks x 34 skills)
  • Skill classifier: Linear head on VLM features that predicts the current skill from vision+language (trained with auxiliary CE loss, lambda=0.1)
  • Enriched language instructions: Training instructions augmented with skill and object info (e.g. "Turn on the radio. Current: pick up radio from coffee table.")

Pretrained base: X-VLA-Pt | Action dim: 23 | Action mode: auto
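The additive prompt and auxiliary-loss design above can be sketched as follows. This is an illustrative sketch only: the class name `SoftPromptBank`, the prompt length, and the hidden size are assumptions, not the actual X-VLA module layout. The zero initialization, the `task_prompt[task_id] + skill_prompt[skill_id]` composition, the 50x34 table sizes, and the lambda=0.1 CE weight come from the description above.

```python
import torch
import torch.nn as nn

class SoftPromptBank(nn.Module):
    """Sketch of the additive task + skill soft prompts.

    Assumed shapes: prompt_len and hidden are illustrative placeholders;
    num_tasks=50 and num_skills=34 follow the model card.
    """
    def __init__(self, num_tasks=50, num_skills=34, prompt_len=1, hidden=1024):
        super().__init__()
        # Zero-initialized so fine-tuning starts from the pretrained behavior.
        self.task_prompt = nn.Parameter(torch.zeros(num_tasks, prompt_len, hidden))
        self.skill_prompt = nn.Parameter(torch.zeros(num_skills, prompt_len, hidden))

    def forward(self, task_id, skill_id):
        # Additive composition replaces the single domain prompt.
        return self.task_prompt[task_id] + self.skill_prompt[skill_id]

bank = SoftPromptBank()
prompt = bank(torch.tensor([3]), torch.tensor([7]))  # shape (1, prompt_len, hidden)

# Auxiliary skill classifier: a linear head on (hypothetical) pooled VLM
# features, trained with a CE loss weighted by lambda = 0.1.
skill_head = nn.Linear(1024, 34)
vlm_feat = torch.randn(2, 1024)           # placeholder pooled features
skill_labels = torch.tensor([7, 12])      # placeholder skill ids
aux_loss = 0.1 * nn.functional.cross_entropy(skill_head(vlm_feat), skill_labels)
```

At inference the same head can supply `skill_id` when it is not given, which is what the auto-prediction in the Usage section refers to.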

Checkpoints

Directory                  LR    GPUs  Steps  Status                  Notes
additive_prompts_lr2e4/    2e-4  4     200k   Complete                All 50 tasks, cosine decay
additive_prompts_lr2e5/    2e-5  4     200k   Complete                All 50 tasks, cosine decay
additive_prompts_1e4/      1e-4  4     100k   Partial (NCCL crashes)  All 50 tasks, cosine decay
additive_prompts_lr5e4/    5e-4  4     50k    Diverged                Abandoned
single_task_1/             2e-5  4     200k   Complete                Task 1 only (~200 episodes)
single_task_1_2_5_16_18/   2e-5  4     200k   Complete                Tasks 1,2,5,16,18 (~1000 episodes)

Training Details

  • Dataset: /shared_work/DATASETS/behavior-1k-2025-challenge-demos/ (50 tasks, ~200 episodes each, 10k total)
  • Batch size: 16 per GPU
  • Schedule: Linear warmup (2k steps) + cosine decay, freeze-then-unfreeze (1k steps)
  • LR coefficients: VLM and soft prompts at 0.1x base LR
  • Optimizer: AdamW, betas=(0.9, 0.95)
  • Precision: bf16 mixed precision
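A minimal sketch of the schedule described above, assuming the freeze window holds the LR at zero and warmup then proceeds from the global step (the actual training code may handle freeze-then-unfreeze differently). The `coeff` argument models the 0.1x multiplier applied to the VLM and soft-prompt parameter groups.

```python
import math

def lr_at(step, base_lr=2e-4, warmup=2000, total=200_000, coeff=1.0, frozen_until=0):
    """Linear warmup to coeff * base_lr, then cosine decay to 0 at `total`.

    Assumption: frozen groups simply receive LR 0 until `frozen_until`.
    """
    if step < frozen_until:
        return 0.0
    if step < warmup:
        return coeff * base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return coeff * base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: VLM / soft-prompt group at 0.1x the base LR, frozen for 1k steps.
lr_vlm = lr_at(5000, coeff=0.1, frozen_until=1000)
```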

Usage

from models.modeling_xvla import XVLA

model = XVLA.from_pretrained("Hoshipu/xvla-behavior1k-checkpoints/additive_prompts_lr2e4")

# Inference: skill_id is auto-predicted by the classifier
actions = model.generate_actions(
    images=images,
    input_ids=input_ids,
    task_id=task_id_tensor,
)

File Structure

Each checkpoint directory contains:

  • model.safetensors - model weights
  • config.json - model configuration (includes num_tasks, num_skills)
  • optimizer.pt - optimizer state (for resuming training)
  • state.json - global step counter
  • Tokenizer files (merges.txt, vocab.json, etc.)