HapticVLA — Contact-Rich Manipulation without Inference-Time Tactile Sensing

Distilled from a tactile-conditioned SA-RWFM teacher via offline action-level knowledge distillation. At inference, this model is an unmodified SmolVLA — no tactile sensors, no extra modules, zero overhead.

Key Result

HapticVLA achieves 86.7% mean success rate (vs. 61.7% baseline, 75.0% tactile teacher) with the lowest force error rate of any model (5.0%), including the tactile-equipped teacher — all without requiring tactile sensors at inference.

Model Details

  • Base model: lerobot/smolvla_base (450M params)
  • Action space: 6-DOF absolute joint positions (indices 6–11)
  • Distillation: Offline action-level KD, blended targets (α=0.5)
  • Teacher: SA-RWFM with dual tactile sensors
  • Best validation loss: 1.52
  • Training: 50K steps, RTX 4090, ~3.5 hrs
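The action space above is a 6-DOF slice (indices 6–11) of a wider vector. A minimal sketch of that indexing convention; the layout of the remaining dimensions is an assumption, not specified by the model card:

```python
# Illustrative only: selecting the 6-DOF absolute joint-position action
# (indices 6-11) from a wider action/state vector. What occupies the
# other indices is an assumption, not documented here.
JOINT_SLICE = slice(6, 12)  # indices 6-11 inclusive

def joint_positions(action_vector):
    """Return the 6 absolute joint-position targets."""
    return action_vector[JOINT_SLICE]

# Example on a dummy 16-D vector:
print(joint_positions(list(range(16))))  # [6, 7, 8, 9, 10, 11]
```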

Performance (Sync Mode, 20 trials per task)

Task      Success Rate   Force Errors
Eggs      95%            1/20
Can       75%            5/20
Waffles   90%            2/20
Mean      86.7%          8/60
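As a quick sanity check, the mean row follows directly from the per-task numbers (20 trials per task):

```python
# Recompute the reported aggregates from the per-task table.
per_task_success = {"Eggs": 0.95, "Can": 0.75, "Waffles": 0.90}
force_errors = 1 + 5 + 2  # per-task error counts, out of 60 trials total

mean_success = sum(per_task_success.values()) / len(per_task_success)
print(f"{mean_success:.1%}")        # 86.7%
print(f"{force_errors}/60 trials")  # 8/60 trials
```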

Comparison

Model                 Mean Success   Force Error Rate   Tactile Required
SmolVLA (Baseline)    61.7%          26.7%              No
SA-RWFM (Teacher)     75.0%          11.7%              Yes
HapticVLA (Ours)      86.7%          5.0%               No

Distillation Approach

  1. Pre-compute: Generate teacher action predictions offline on training data
  2. Initialize: Copy teacher backbone weights, adapt state projection (134D → 6D)
  3. Train: Flow matching loss on blended targets: (1-α)·GT + α·teacher_prediction
  4. Deploy: Standard SmolVLA inference — no modifications needed
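Step 3 above can be sketched as follows. This is a hedged illustration of the blended-target construction and a standard rectified-flow-style loss, with illustrative function names; it is not the authors' actual training code:

```python
import torch

def blended_targets(gt_actions, teacher_actions, alpha=0.5):
    """Blend ground-truth and precomputed teacher actions:
    (1 - alpha) * GT + alpha * teacher_prediction (alpha=0.5 per the card)."""
    return (1 - alpha) * gt_actions + alpha * teacher_actions

def flow_matching_loss(model_velocity, noise, target_actions):
    """Regress the predicted velocity toward the direction that carries the
    noise sample to the (blended) action target (rectified-flow style)."""
    target_velocity = target_actions - noise
    return torch.mean((model_velocity - target_velocity) ** 2)
```

With `alpha=0.5` the target is simply the midpoint of the ground-truth and teacher action chunks, so the student is pulled toward both simultaneously.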

Usage

import torch

# Load the distilled student checkpoint (plain PyTorch file) on CPU
checkpoint = torch.load("best/model.pt", map_location="cpu")

See Advanced-Robotic-Manipulation/crab for the full inference pipeline.

Citation

If you use this model, please cite our paper:

@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}