license: apache-2.0
tags:
  - robotics
  - manipulation
  - smolvla
  - vision-language-action
base_model: lerobot/smolvla_base

Crab SmolVLA — Left Arm

Fine-tuned SmolVLA for left-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator.

Model Details

Base model: lerobot/smolvla_base (450M params)
Action space: 6-DOF absolute joint positions (indices 0–5)
Training data: 27 demonstrations across 3 tasks (eggs, can, waffles)
Best validation loss: 0.46
Training: 50K steps, RTX 4090, ~3.5 hrs
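
The action-space entry above says the left arm occupies indices 0–5 of the joint-position vector. As a minimal sketch, assuming the robot exposes a longer action vector (for example 12 values for two arms, which is a guess and not stated in this card), the left-arm targets could be picked out as follows. The names `LEFT_ARM_SLICE` and `split_left_arm` are hypothetical and do not come from the Crab codebase.

```python
# Hypothetical helper: the slice bounds come from the model card
# ("indices 0-5"); everything else is an illustrative assumption.
LEFT_ARM_SLICE = slice(0, 6)


def split_left_arm(full_action):
    """Return the six left-arm joint targets from a longer action vector."""
    values = list(full_action)
    if len(values) < 6:
        raise ValueError("expected at least 6 joint values, got %d" % len(values))
    return values[LEFT_ARM_SLICE]


# Example with a made-up 12-value vector (two 6-DOF arms is an assumption).
left = split_left_arm(range(12))
```

Because the targets are absolute joint positions, the values would be sent to the controller as-is instead of being added to the current joint state.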

Usage

import torch

# Load the fine-tuned checkpoint onto the CPU; move it to a GPU afterwards if needed.
checkpoint = torch.load("best/model.pt", map_location="cpu")

See Advanced-Robotic-Manipulation/crab for the full inference pipeline.
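
This card does not document the layout of `best/model.pt`, so it can help to inspect the loaded object before wiring it into a policy. The helper below is a small sketch for that purpose; `summarize_checkpoint` is a hypothetical name, and the keys it reports depend entirely on how the checkpoint was saved. Note also that since PyTorch 2.6 `torch.load` defaults to `weights_only=True`, which may reject checkpoints that contain arbitrary Python objects.

```python
def summarize_checkpoint(checkpoint):
    """Map each top-level key of a loaded checkpoint to its value's type name.

    Non-dict checkpoints (for example a pickled module) are reported under
    a single "<root>" entry.
    """
    if isinstance(checkpoint, dict):
        return {str(key): type(value).__name__ for key, value in checkpoint.items()}
    return {"<root>": type(checkpoint).__name__}


# Intended use (not executed here, since it needs the checkpoint file):
#   import torch
#   checkpoint = torch.load("best/model.pt", map_location="cpu")
#   print(summarize_checkpoint(checkpoint))
```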

Training Config

Training configuration is provided in config.yaml. Key settings:

Image size: 256×256, 3 cameras
Data augmentation: ColorJitter + RandomResizedCrop
Optimizer: AdamW, lr=1e-4, weight_decay=0.01
Effective batch size: 32 (batch size 8 × 4 gradient-accumulation steps)
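
The optimizer and batch-size settings above can be illustrated with a short PyTorch sketch of gradient accumulation. This is not the project's training loop: the `nn.Linear` model, the random tensors, and the function name `train_one_effective_batch` are placeholders chosen for illustration. Only the AdamW hyperparameters and the 8 × 4 split are taken from the list above.

```python
import torch
from torch import nn

MICRO_BATCH = 8   # samples per forward/backward pass
ACCUM_STEPS = 4   # passes per optimizer step, so 8 x 4 = 32 effective


def train_one_effective_batch(model, optimizer, micro_batches, loss_fn):
    """Accumulate gradients over all micro-batches, then step once."""
    optimizer.zero_grad()
    total = 0.0
    for inputs, targets in micro_batches:
        # Divide so the summed gradients match the mean over the full batch.
        loss = loss_fn(model(inputs), targets) / len(micro_batches)
        loss.backward()
        total += loss.item()
    optimizer.step()
    return total


model = nn.Linear(6, 6)  # placeholder standing in for the real policy
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
micro_batches = [
    (torch.randn(MICRO_BATCH, 6), torch.randn(MICRO_BATCH, 6))
    for _ in range(ACCUM_STEPS)
]
loss = train_one_effective_batch(model, optimizer, micro_batches, nn.MSELoss())
```

Accumulating over four micro-batches of 8 gives the same gradient as one batch of 32 (for losses averaged over samples) while keeping peak memory at the smaller size.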

Citation

If you use this model, please cite our paper:

@article{gubernatorov2026hapticvla,
  title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
  author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
  journal={arXiv preprint arXiv:2603.15257},
  year={2026}
}