HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing
Paper: arXiv:2603.15257
Fine-tuned SmolVLA for right-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator. This serves as the baseline model without tactile sensing.
Base model: lerobot/smolvla_base (450M params)

| Task | Success Rate |
|---|---|
| Eggs | 50% |
| Can | 65% |
| Waffles | 70% |
| Mean | 61.7% |
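The reported mean is the unweighted average of the three per-task success rates, which can be checked directly:

```python
# Per-task success rates (percent) from the evaluation table above.
success = {"Eggs": 50.0, "Can": 65.0, "Waffles": 70.0}

# The reported mean is the unweighted average over the three tasks.
mean = sum(success.values()) / len(success)
print(f"{mean:.1f}%")  # 61.7%
```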
Load the checkpoint with:

```python
import torch

# Load the fine-tuned checkpoint on CPU; move to GPU as needed.
checkpoint = torch.load("best/model.pt", map_location="cpu")
```
See Advanced-Robotic-Manipulation/crab for full inference pipeline.
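Before wiring the checkpoint into the full pipeline, it can be useful to sanity-check its contents. The snippet below is a hedged sketch: the actual layout of `best/model.pt` is an assumption, so a stand-in state dict is saved and reloaded to illustrate the pattern.

```python
import os
import tempfile

import torch

# Stand-in state dict; the real checkpoint's keys and shapes are an assumption.
state_dict = {"linear.weight": torch.zeros(4, 8), "linear.bias": torch.zeros(4)}

path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(state_dict, path)

# Reload on CPU and count parameters, as one might verify the ~450M total.
checkpoint = torch.load(path, map_location="cpu")
n_params = sum(t.numel() for t in checkpoint.values())
print(n_params)  # 36 for this toy state dict
```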
If you use this model, please cite our paper:
```bibtex
@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}
```