HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing
Paper: arXiv:2603.15257
HapticVLA is distilled from a tactile-conditioned SA-RWFM teacher via offline action-level knowledge distillation. At inference it is an unmodified SmolVLA: no tactile sensors, no extra modules, zero overhead.
HapticVLA achieves an 86.7% mean success rate (vs. 61.7% for the SmolVLA baseline and 75.0% for the tactile teacher) and the lowest force error rate of any evaluated model (5.0%), lower even than the tactile-equipped teacher, all without tactile sensing at inference.
Base model: lerobot/smolvla_base (450M params)

Per-task results:

| Task | Success Rate | Force Errors |
|---|---|---|
| Eggs | 95% | 1/20 |
| Can | 75% | 5/20 |
| Waffles | 90% | 2/20 |
| Mean | 86.7% | 8/60 |

Comparison across models:

| Model | Mean Success | Force Error Rate | Tactile Required |
|---|---|---|---|
| SmolVLA (Baseline) | 61.7% | 26.7% | No |
| SA-RWFM (Teacher) | 75.0% | 11.7% | Yes |
| HapticVLA (Ours) | 86.7% | 5.0% | No |
Distillation targets blend ground-truth actions with teacher predictions: (1-α)·GT + α·teacher_prediction.

Loading the released checkpoint:

```python
import torch

# Load the distilled policy checkpoint on CPU.
checkpoint = torch.load("best/model.pt", map_location="cpu")
```
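A minimal sketch of the action-level relabeling implied by the formula above, assuming ground-truth and teacher actions are already aligned tensors. The function name, shapes, and the default α below are illustrative, not the released training code.

```python
import torch

def relabel_actions(gt_actions: torch.Tensor,
                    teacher_actions: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Blend ground-truth actions with teacher predictions for distillation.

    Implements (1 - alpha) * GT + alpha * teacher_prediction;
    alpha = 0.5 is an illustrative default, not the paper's value.
    """
    return (1.0 - alpha) * gt_actions + alpha * teacher_actions

# Example: a chunk of 10 timesteps of 7-DoF actions (placeholder shapes).
gt = torch.randn(10, 7)
teacher = torch.randn(10, 7)
targets = relabel_actions(gt, teacher, alpha=0.5)
print(targets.shape)  # torch.Size([10, 7])
```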
See Advanced-Robotic-Manipulation/crab for the full inference pipeline.
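Because the distilled policy is a stock SmolVLA, inference can follow the usual lerobot policy API. The sketch below assumes lerobot's SmolVLAPolicy with `from_pretrained` and `select_action`; the import path differs across lerobot versions, and the observation keys, shapes, and task string are placeholders rather than the released pipeline.

```python
import torch
# Import path is version-dependent; newer lerobot releases expose the policy
# under lerobot.policies.smolvla.modeling_smolvla instead.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the base SmolVLA weights; the distilled HapticVLA checkpoint would be
# loaded the same way from a local checkpoint directory.
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
policy.eval()

# One observation step: camera frame, proprioceptive state, and task string.
# No tactile inputs are required at inference. Keys and shapes are placeholders.
batch = {
    "observation.images.top": torch.zeros(1, 3, 256, 256),
    "observation.state": torch.zeros(1, 6),
    "task": ["pick up the egg without crushing it"],
}

with torch.no_grad():
    action = policy.select_action(batch)
print(action.shape)
```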
If you use this model, please cite our paper:
```bibtex
@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}
```