armteam/crab-smolvla-hapticsvla

@@ -10,13 +10,13 @@ tags:
 base_model: lerobot/smolvla_base
 ---
-# HapticsVLA — Contact-Rich Manipulation without Inference-Time Tactile Sensing
 Distilled from a tactile-conditioned [SA-RWFM teacher](https://huggingface.co/armteam/crab-smolvla-rwfm) via offline action-level knowledge distillation. At inference, this model is an **unmodified SmolVLA** — no tactile sensors, no extra modules, zero overhead.
 ## Key Result
-HapticsVLA achieves **86.7% mean success rate** (vs. 61.7% baseline, 75.0% tactile teacher) with the **lowest force error rate of any model (5.0%)**, including the tactile-equipped teacher — all without requiring tactile sensors at inference.
 ## Model Details
@@ -42,7 +42,7 @@ HapticsVLA achieves **86.7% mean success rate** (vs. 61.7% baseline, 75.0% tacti
 |-------|-------------|-----------------|-----------------|
 | SmolVLA (Baseline) | 61.7% | 26.7% | No |
 | SA-RWFM (Teacher) | 75.0% | 11.7% | **Yes** |
-| **HapticsVLA (Ours)** | **86.7%** | **5.0%** | **No** |
 ## Distillation Approach
@@ -67,13 +67,13 @@ See [Advanced-Robotic-Manipulation/crab](https://github.com/Advanced-Robotic-Man
 - [crab-smolvla-rwfm](https://huggingface.co/armteam/crab-smolvla-rwfm) — SA-RWFM tactile teacher
-## BibTex
 If you use this model, please cite our paper:
 ```bibtex
-@article{gubernatorov2026hapticsvla,
-  title={HapticsVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
   author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
   journal={arXiv preprint arXiv:2603.15257},
   year={2026}

 base_model: lerobot/smolvla_base
 ---
+# HapticVLA — Contact-Rich Manipulation without Inference-Time Tactile Sensing
 Distilled from a tactile-conditioned [SA-RWFM teacher](https://huggingface.co/armteam/crab-smolvla-rwfm) via offline action-level knowledge distillation. At inference, this model is an **unmodified SmolVLA** — no tactile sensors, no extra modules, zero overhead.
 ## Key Result
+HapticVLA achieves **86.7% mean success rate** (vs. 61.7% baseline, 75.0% tactile teacher) with the **lowest force error rate of any model (5.0%)**, including the tactile-equipped teacher — all without requiring tactile sensors at inference.
 ## Model Details
 |-------|-------------|-----------------|-----------------|
 | SmolVLA (Baseline) | 61.7% | 26.7% | No |
 | SA-RWFM (Teacher) | 75.0% | 11.7% | **Yes** |
+| **HapticVLA (Ours)** | **86.7%** | **5.0%** | **No** |
 ## Distillation Approach
 - [crab-smolvla-rwfm](https://huggingface.co/armteam/crab-smolvla-rwfm) — SA-RWFM tactile teacher
+## Citation
 If you use this model, please cite our paper:
 ```bibtex
+@article{gubernatorov2026hapticvla,
+  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
   author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
   journal={arXiv preprint arXiv:2603.15257},
   year={2026}