	---
	license: apache-2.0
	tags:
	- robotics
	- manipulation
	- smolvla
	- vision-language-action
	base_model: lerobot/smolvla_base
	---

	# Crab SmolVLA — Left Arm

	Fine-tuned [SmolVLA](https://huggingface.co/lerobot/smolvla_base) for left-arm (6-DOF) manipulation on the Crab bimanual mobile manipulator.

	## Model Details

	- Base model: `lerobot/smolvla_base` (450M params)
	- Action space: 6-DOF absolute joint positions (indices 0–5)
	- Training data: 27 demonstrations across 3 tasks (eggs, can, waffles)
	- Best validation loss: 0.46
	- Training: 50K steps on an RTX 4090, about 3.5 hours
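As a rough illustration of the action space listed above, the sketch below pulls the six left-arm joint targets (indices 0 to 5) out of a longer action vector. The 12-dimensional bimanual vector and the helper name `left_arm_action` are assumptions made for this example; they are not taken from the Crab codebase.

```python
from typing import List, Sequence

# Indices 0-5 hold the left-arm joint positions, as stated in the model details.
LEFT_ARM_INDICES = range(0, 6)


def left_arm_action(action: Sequence[float]) -> List[float]:
    """Return the six absolute joint targets for the left arm.

    Hypothetical helper: assumes the left-arm joints occupy the first six
    entries of the action vector.
    """
    if len(action) < len(LEFT_ARM_INDICES):
        raise ValueError("expected at least 6 action dimensions")
    return [float(action[i]) for i in LEFT_ARM_INDICES]


# Made-up 12-dimensional action (assumed layout: left arm first, then the rest).
full_action = [0.1 * i for i in range(12)]
left_targets = left_arm_action(full_action)
print(len(left_targets))
```

Because the positions are absolute, the extracted values would be sent as joint targets directly rather than added to the current joint state.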

	## Usage

	```python
	import torch

	# Load the best checkpoint onto the CPU; move it to a GPU afterwards if needed.
	checkpoint = torch.load("best/model.pt", map_location="cpu")
	```

	See [Advanced-Robotic-Manipulation/crab](https://github.com/Advanced-Robotic-Manipulation/crab) for the full inference pipeline.
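The layout of `best/model.pt` is not documented in this card, so it can help to inspect the loaded object before wiring it into a model. The sketch below is one way to do that, assuming the checkpoint is either a raw state dict or a dict that wraps one under a common key. The key names in `CANDIDATE_KEYS` and both helper functions are hypothetical; a plain dict stands in for the real checkpoint so the example runs without the weights.

```python
from typing import Any, List, Mapping, Tuple

# Assumed wrapper keys; the real checkpoint may use different ones or none.
CANDIDATE_KEYS = ("model_state_dict", "state_dict", "model")


def extract_state_dict(checkpoint: Any) -> Any:
    """Return the nested state dict if one is found, else the object itself."""
    if isinstance(checkpoint, Mapping):
        for key in CANDIDATE_KEYS:
            if key in checkpoint and isinstance(checkpoint[key], Mapping):
                return checkpoint[key]
    return checkpoint


def summarize(state_dict: Mapping[str, Any], limit: int = 5) -> List[Tuple[str, tuple]]:
    """List the first few parameter names with their shapes (empty if unknown)."""
    items = list(state_dict.items())[:limit]
    return [(name, tuple(getattr(value, "shape", ()))) for name, value in items]


# Stand-in for the object returned by torch.load(...); contents are made up.
fake_checkpoint = {"state_dict": {"layer.weight": 0.0, "layer.bias": 0.0}, "step": 50000}
state_dict = extract_state_dict(fake_checkpoint)
print(summarize(state_dict))
```

With the real file, `fake_checkpoint` would be replaced by the `checkpoint` object loaded in the usage snippet.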

	## Training Config

	Training configuration is provided in `config.yaml`. Key settings:
	- Image size: 256×256, 3 cameras
	- Data augmentation: ColorJitter + RandomResizedCrop
	- Optimizer: AdamW, lr=1e-4, weight_decay=0.01
	- Effective batch size: 32 (8 × 4 gradient accumulation)
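To make the effective batch size concrete, the sketch below checks on a toy problem that accumulating gradients over 4 micro-batches of 8 samples gives the same gradient as one batch of 32. It assumes that 8 is the per-step batch size and 4 is the number of accumulation steps. The one-parameter linear model and the data are invented for illustration and have nothing to do with the actual training code.

```python
import math

MICRO_BATCH = 8   # assumed per-step batch size
ACCUM_STEPS = 4   # assumed number of gradient accumulation steps


def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch=MICRO_BATCH, accum_steps=ACCUM_STEPS):
    """Average the micro-batch gradients, as gradient accumulation does."""
    grad = 0.0
    for step in range(accum_steps):
        lo = step * micro_batch
        hi = lo + micro_batch
        # Dividing by accum_steps keeps the result a mean over all samples.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return grad


# Toy data: 32 samples from y = 3x + 1.
xs = [0.1 * i for i in range(MICRO_BATCH * ACCUM_STEPS)]
ys = [3.0 * x + 1.0 for x in xs]

full = mse_grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys)
print(MICRO_BATCH * ACCUM_STEPS, math.isclose(full, accum, rel_tol=1e-9))
```

The division by `ACCUM_STEPS` is the detail that matters: without it the accumulated gradient would be 4 times larger, which acts like an unintended increase in the learning rate.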

	## Citation

	If you use this model, please cite our paper:

	```bibtex
	@article{gubernatorov2026hapticvla,
	title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
	author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
	journal={arXiv preprint arXiv:2603.15257},
	year={2026}
	}
	```