---
license: apache-2.0
tags:
- robotics
- manipulation
- smolvla
- vision-language-action
base_model: lerobot/smolvla_base
---
# Crab SmolVLA — Left Arm
Fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) for left-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator.
## Model Details
- **Base model**: `lerobot/smolvla_base` (450M params)
- **Action space**: 6-DOF absolute joint positions (indices 0–5)
- **Training data**: 27 demonstrations across 3 tasks (eggs, can, waffles)
- **Best validation loss**: 0.46
- **Training**: 50K steps on an RTX 4090, ~3.5 hours
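The action-space convention above can be illustrated with a short sketch. It assumes that each frame stores one flat action vector for the whole robot, with the left-arm joints at indices 0–5. The helper names `left_arm_target` and `merge_left_arm` and the 12-value example vector are hypothetical and are not part of the released code. Because the targets are absolute joint positions rather than deltas, writing a prediction back simply overwrites those six entries.

```python
from typing import List, Sequence

# Left-arm joints occupy indices 0-5 of the full action vector (per this model card).
LEFT_ARM_SLICE = slice(0, 6)
LEFT_ARM_DOF = 6


def left_arm_target(full_action: Sequence[float]) -> List[float]:
    """Extract the 6-DOF left-arm joint targets from a full-robot action vector.

    Hypothetical helper: assumes a flat vector with the left arm first.
    """
    if len(full_action) < LEFT_ARM_DOF:
        raise ValueError(f"expected at least {LEFT_ARM_DOF} values, got {len(full_action)}")
    return list(full_action[LEFT_ARM_SLICE])


def merge_left_arm(full_action: Sequence[float], left_pred: Sequence[float]) -> List[float]:
    """Return a copy of the full action vector with indices 0-5 replaced by the prediction.

    Entries outside indices 0-5 (e.g. other joints) are left unchanged.
    """
    if len(left_pred) != LEFT_ARM_DOF:
        raise ValueError(f"expected {LEFT_ARM_DOF} joint positions, got {len(left_pred)}")
    merged = list(full_action)
    merged[LEFT_ARM_SLICE] = list(left_pred)  # absolute positions overwrite, not add
    return merged


# Example with a hypothetical 12-value action vector.
full = [float(i) for i in range(12)]
print(left_arm_target(full))  # prints [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```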
## Usage
```python
import torch

# Path is relative to a local copy of this repository; tensors are mapped to CPU.
checkpoint = torch.load("best/model.pt", map_location="cpu")
```
See [Advanced-Robotic-Manipulation/crab](https://github.com/Advanced-Robotic-Manipulation/crab) for the full inference pipeline.
## Training Config
Training configuration is provided in `config.yaml`. Key settings:
- Image size: 256×256, 3 cameras
- Data augmentation: ColorJitter + RandomResizedCrop
- Optimizer: AdamW, lr=1e-4, weight_decay=0.01
- Effective batch size: 32 (batch size 8 × 4 gradient-accumulation steps)
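The effective batch size follows from gradient accumulation. Gradients from 4 micro-batches of 8 samples are averaged before each optimizer step, so every update reflects 32 samples. The dependency-free sketch below checks this on a toy one-parameter least-squares model. The model, the data, and the function names are hypothetical and unrelated to SmolVLA. It shows that when micro-batches are equal-sized, scaling each micro-batch mean gradient by 1/4 and summing reproduces the mean gradient over all 32 samples.

```python
import random
from typing import List, Tuple

MICRO_BATCH = 8   # samples per forward/backward pass
ACCUM_STEPS = 4   # micro-batches accumulated per optimizer step
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 32


def mean_grad(w: float, batch: List[Tuple[float, float]]) -> float:
    """Gradient w.r.t. w of the mean squared error of the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Tuple[float, float]]) -> float:
    """Sum micro-batch mean gradients, each scaled by 1 / ACCUM_STEPS."""
    total = 0.0
    for step in range(ACCUM_STEPS):
        micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
        total += mean_grad(w, micro) / ACCUM_STEPS
    return total


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(EFFECTIVE_BATCH)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
w = 0.5
full_grad = mean_grad(w, data)
accum_grad = accumulated_grad(w, data)
print(EFFECTIVE_BATCH, abs(full_grad - accum_grad) < 1e-12)  # prints "32 True"
```

In a PyTorch training loop, the same effect is usually obtained by dividing each micro-batch loss by the number of accumulation steps before `loss.backward()`. You then call `optimizer.step()` and `optimizer.zero_grad()` once every 4 micro-batches.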
## Citation
If you use this model, please cite our paper:
```bibtex
@article{gubernatorov2026hapticvla,
title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
journal={arXiv preprint arXiv:2603.15257},
year={2026}
}
```