Crab SmolVLA — Right Arm (Baseline)

Fine-tuned SmolVLA for right-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator. This serves as the baseline model without tactile sensing.

Model Details

  • Base model: lerobot/smolvla_base (450M params)
  • Action space: 6-DOF absolute joint positions (indices 6–11)
  • Training data: 27 demonstrations across 3 tasks (eggs, can, waffles)
  • Best validation loss: 1.31
  • Training: 50K steps, RTX 4090, ~3.5 hrs
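
The action-space entry above can be read as a slice of the robot's full joint vector. A minimal sketch, assuming the full bimanual vector holds at least 12 joint values and that indices 6–11 are zero-based and inclusive; `right_arm_joints` and the constants are illustrative names, not part of the released code:

```python
from typing import List, Sequence

# Assumption: zero-based indexing, right arm occupies indices 6..11 inclusive.
RIGHT_ARM_START = 6
RIGHT_ARM_END = 12  # exclusive upper bound, so the slice covers 6..11


def right_arm_joints(full_joint_positions: Sequence[float]) -> List[float]:
    """Return the 6 right-arm joint positions from a full joint vector."""
    if len(full_joint_positions) < RIGHT_ARM_END:
        raise ValueError(
            f"expected at least {RIGHT_ARM_END} joint values, "
            f"got {len(full_joint_positions)}"
        )
    return list(full_joint_positions[RIGHT_ARM_START:RIGHT_ARM_END])


# Hypothetical 12-joint state used only to show which entries are selected.
example_state = [float(i) for i in range(12)]
right_arm = right_arm_joints(example_state)
```

Because the model predicts absolute joint positions, the six selected values are position targets rather than deltas relative to the current state.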

Performance (Sync Mode, 20 trials per task)

Task      Success Rate
Eggs      50%
Can       65%
Waffles   70%
Mean      61.7%
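
The mean above is the unweighted average of the three per-task rates. With 20 trials per task, each rate also carries noticeable sampling uncertainty. The sketch below recomputes the mean and, as an addition that is not part of the reported results, estimates a 95% Wilson score interval per task; the success counts are derived from the percentages and the trial count, and all names are illustrative:

```python
import math

TRIALS_PER_TASK = 20
success_rate_pct = {"eggs": 50.0, "can": 65.0, "waffles": 70.0}

# Unweighted mean across tasks, as reported in the table.
mean_pct = sum(success_rate_pct.values()) / len(success_rate_pct)


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half_width, centre + half_width


for task, pct in success_rate_pct.items():
    # Success counts are inferred from the percentages, not taken from logs.
    successes = round(pct / 100 * TRIALS_PER_TASK)
    low, high = wilson_interval(successes, TRIALS_PER_TASK)
    print(f"{task}: {successes}/{TRIALS_PER_TASK}, 95% CI {low:.0%} to {high:.0%}")

print(f"mean: {mean_pct:.1f}%")
```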

Usage

import torch

# Load the checkpoint onto the CPU; move tensors to a GPU afterwards if needed.
checkpoint = torch.load("best/model.pt", map_location="cpu")

See Advanced-Robotic-Manipulation/crab for full inference pipeline.
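
The layout of `best/model.pt` is not described here, so it can help to inspect the loaded object before wiring it into a policy. A minimal sketch, assuming the checkpoint is either a flat state dict or a dict that wraps one; the wrapper keys in `CANDIDATE_KEYS` are hypothetical guesses, and `find_state_dict` and `count_parameters` are illustrative helpers rather than part of the Crab or LeRobot code:

```python
from typing import Any, Mapping

# Hypothetical wrapper keys; the real checkpoint may use none of these.
CANDIDATE_KEYS = ("model_state_dict", "state_dict", "model")


def find_state_dict(checkpoint: Any) -> Mapping[str, Any]:
    """Return the nested state dict if one is found, else the checkpoint itself."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in CANDIDATE_KEYS:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    return checkpoint


def count_parameters(state_dict: Mapping[str, Any]) -> int:
    """Sum numel() over tensor-like values; entries without numel() are skipped."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
```

With the `checkpoint` object loaded above, `count_parameters(find_state_dict(checkpoint))` gives a quick check against the roughly 450M parameters listed for the base model, provided the assumptions about the layout hold.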

Citation

If you use this model, please cite our paper:

@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}