---
license: apache-2.0
tags:
- robotics
- manipulation
- smolvla
- vision-language-action
base_model: lerobot/smolvla_base
---

# Crab SmolVLA — Left Arm

Fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) for left-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator.

## Model Details

- **Base model**: `lerobot/smolvla_base` (450M params)
- **Action space**: 6-DOF absolute joint positions (indices 0–5)
- **Training data**: 27 demonstrations across 3 tasks (eggs, can, waffles)
- **Best validation loss**: 0.46
- **Training**: 50K steps, RTX 4090, ~3.5 hrs
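
The action-space entry above says the policy drives joint indices 0 to 5. As a minimal sketch, assuming a hypothetical 12-value bimanual joint vector whose first six entries belong to the left arm (the full layout of the Crab joint vector is not specified in this card), the left-arm slice could be taken like this. The helper name `left_arm_joints` is illustrative only.

```python
from typing import List, Sequence

# Assumption: indices 0-5 hold the left-arm joints, as stated in Model Details.
LEFT_ARM_INDICES = slice(0, 6)


def left_arm_joints(joint_vector: Sequence[float]) -> List[float]:
    """Return the six left-arm joint positions from a longer joint vector."""
    if len(joint_vector) < LEFT_ARM_INDICES.stop:
        raise ValueError("expected at least 6 joint values")
    return list(joint_vector[LEFT_ARM_INDICES])


# Hypothetical 12-value vector; only the first six entries are used.
full_vector = [0.1, -0.4, 0.9, 0.0, 1.2, -0.3, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(left_arm_joints(full_vector))
```

Because the positions are absolute rather than deltas, the six values can be read directly as joint targets without integrating over previous steps.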

## Usage

```python
import torch

# Load the fine-tuned checkpoint on the CPU; move it to a GPU afterwards if needed.
checkpoint = torch.load("best/model.pt", map_location="cpu")
```

See [Advanced-Robotic-Manipulation/crab](https://github.com/Advanced-Robotic-Manipulation/crab) for the full inference pipeline.
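
The layout of `best/model.pt` is not described in this card. The following is a minimal sketch for checking what `torch.load` returned before wiring it into a policy. `summarize_checkpoint` is a hypothetical helper, not part of LeRobot or the crab repository, and the example dictionary is a stand-in whose keys may not match the real file.

```python
from typing import Any, Dict


def summarize_checkpoint(obj: Any, max_items: int = 10) -> Dict[str, str]:
    """Map the first few top-level keys of a loaded checkpoint to a short description."""
    if not isinstance(obj, dict):
        # For example, a pickled module rather than a dictionary of weights.
        return {"<root>": type(obj).__name__}
    summary = {}
    for key in list(obj)[:max_items]:
        value = obj[key]
        shape = getattr(value, "shape", None)  # tensors and arrays expose .shape
        if shape is not None:
            summary[str(key)] = f"{type(value).__name__}{tuple(shape)}"
        else:
            summary[str(key)] = type(value).__name__
    return summary


# Stand-in for the object returned by torch.load; real keys may differ.
example = {"step": 50000, "config": {"lr": 1e-4}}
print(summarize_checkpoint(example))
```

With the real file, the same call would be `summarize_checkpoint(checkpoint)` right after the `torch.load` line.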

## Training Config

The training configuration is provided in `config.yaml`. Key settings:

- Image size: 256×256, 3 cameras
- Data augmentation: ColorJitter + RandomResizedCrop
- Optimizer: AdamW, lr=1e-4, weight_decay=0.01
- Effective batch size: 32 (batch size 8 × 4 gradient accumulation steps)
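
To illustrate how a batch size of 8 with 4 accumulation steps gives an effective batch of 32, here is a minimal PyTorch sketch of gradient accumulation using the optimizer settings listed above. It is not the training script from the crab repository: the linear model, the random data, and the MSE loss are placeholders chosen only to keep the example self-contained, and MSE is not the SmolVLA training objective.

```python
import torch
from torch import nn

MICRO_BATCH = 8   # samples per forward/backward pass
ACCUM_STEPS = 4   # micro-batches per optimizer update, 8 * 4 = 32

torch.manual_seed(0)
model = nn.Linear(16, 6)  # placeholder for the policy; 6 outputs like the 6-DOF action
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = nn.MSELoss()    # stand-in loss for illustration only

optimizer_updates = 0
optimizer.zero_grad()
for i in range(8):  # 8 random micro-batches stand in for a data loader
    inputs = torch.randn(MICRO_BATCH, 16)
    targets = torch.randn(MICRO_BATCH, 6)
    # Scale the loss so the accumulated gradient is the mean over all 32 samples.
    loss = loss_fn(model(inputs), targets) / ACCUM_STEPS
    loss.backward()  # gradients add up in .grad across micro-batches
    if (i + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_updates += 1

print(optimizer_updates)
```

Accumulating this way trades wall-clock time for memory: only 8 samples are resident on the GPU at once, while each weight update still reflects 32.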

## Citation

If you use this model, please cite our paper:

```bibtex
@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}
```