# ACT Model for Lamp Search Task

ACT (Action Chunking with Transformers) model trained on the `auto_lampe_search` dataset for robot control.
## Model Repository

**Repository:** [kavinrajkrupsurge/act-lampe-movements](https://huggingface.co/kavinrajkrupsurge/act-lampe-movements)
## Quick Start

### Installation from Scratch

#### 1. System Requirements

- Python 3.8 or higher
- CUDA-capable GPU (recommended) or CPU
- At least 8 GB RAM
- 10 GB+ free disk space
2. Install Python Dependencies
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install PyTorch (choose based on your CUDA version)
# For CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CPU only:
pip install torch torchvision torchaudio
# Install LeRobot
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .
# Install additional dependencies
pip install huggingface_hub pillow numpy pandas tqdm draccus
#### 3. Download Model from Hugging Face

The model is downloaded automatically when you run the inference script. Alternatively, download it manually:

```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('kavinrajkrupsurge/act-lampe-movements')"
```
## Using the Model

### Option 1: Command Line Inference

```bash
# Download the inference script from the repository
# Or use the one provided in this repository
python inference_hf.py \
    --repo-id kavinrajkrupsurge/act-lampe-movements \
    --image path/to/your/image.jpg \
    --dataset-root /path/to/dataset  # Optional if you have the dataset
```
### Option 2: Python API

```python
from inference_hf import ACTInference
from PIL import Image
import numpy as np

# Initialize model (will download automatically from Hugging Face)
model = ACTInference(
    repo_id="kavinrajkrupsurge/act-lampe-movements",
    device="cuda",  # or "cpu"
)

# Load image
image = Image.open("path/to/image.jpg")

# Predict action
action = model.predict(
    image=image,
    state=np.array([0.0, 0.0]),  # [position, velocity]
)
print(f"Position: {action[0]:.4f}, Velocity: {action[1]:.4f}")
```
## Model Checkpoints

This repository contains multiple checkpoints from training:

- `checkpoint-001000`: Model at 1,000 training steps
- `checkpoint-002000`: Model at 2,000 training steps
- `checkpoint-003000`: Model at 3,000 training steps
- `checkpoint-004000`: Model at 4,000 training steps
- `checkpoint-005000`: Model at 5,000 training steps (final/best)

The model files in the repository root correspond to `checkpoint-005000`.
### Loading a Specific Checkpoint

```python
from huggingface_hub import hf_hub_download

# Download a file from a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="kavinrajkrupsurge/act-lampe-movements",
    filename="checkpoints/checkpoint-003000/pretrained_model/config.json",
)
# Use the checkpoint path with ACTInference, or modify the code to load from a local path
```
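To pull an entire checkpoint directory rather than a single file, `snapshot_download` from `huggingface_hub` accepts an `allow_patterns` filter. A minimal sketch, assuming the checkpoint layout shown above (the `checkpoint_pattern` and `download_checkpoint` helpers are ours, not part of the repository):

```python
def checkpoint_pattern(step: int) -> str:
    """Glob pattern matching one checkpoint's pretrained_model directory."""
    return f"checkpoints/checkpoint-{step:06d}/pretrained_model/*"


def download_checkpoint(repo_id: str, step: int) -> str:
    """Download a single checkpoint directory; returns the local snapshot path."""
    from huggingface_hub import snapshot_download  # requires network access
    return snapshot_download(repo_id, allow_patterns=[checkpoint_pattern(step)])


# Example: pattern for the 3,000-step checkpoint
print(checkpoint_pattern(3000))  # checkpoints/checkpoint-003000/pretrained_model/*
```

Filtering by pattern avoids downloading all five checkpoints when you only need one.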
## Replicating Training from Scratch

### Step 1: Prepare Dataset

Download or prepare your dataset in the following structure:

```
auto_lampe_search/
├── auto_lampe_search_0/
│   ├── frames/
│   │   ├── frame_0000.jpg
│   │   ├── frame_0001.jpg
│   │   └── ...
│   ├── joint_trajectory.csv
│   └── metadata.json
├── auto_lampe_search_1/
│   └── ...
└── ...
```

Dataset format requirements — each episode directory `auto_lampe_search_{N}/` contains:

- `frames/`: Directory with images named `frame_{XXXX}.jpg`
- `joint_trajectory.csv`: CSV with columns `base_joint`, `velocity` (and optionally `joint2`, `joint3`, `joint4`)
- `metadata.json`: JSON with episode metadata
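These requirements can be checked programmatically before conversion. A minimal validation sketch using only the standard library (the function name and messages are our own, not part of the conversion script):

```python
import csv
import json
from pathlib import Path

REQUIRED_COLUMNS = {"base_joint", "velocity"}


def validate_episode(episode_dir: Path) -> list:
    """Return a list of problems found in one episode directory (empty if valid)."""
    problems = []

    frames = episode_dir / "frames"
    if not frames.is_dir():
        problems.append("missing frames/ directory")
    elif not list(frames.glob("frame_*.jpg")):
        problems.append("frames/ contains no frame_*.jpg images")

    traj = episode_dir / "joint_trajectory.csv"
    if not traj.is_file():
        problems.append("missing joint_trajectory.csv")
    else:
        with traj.open() as f:
            header = set(next(csv.reader(f), []))
        missing = REQUIRED_COLUMNS - header
        if missing:
            problems.append(f"joint_trajectory.csv missing columns: {sorted(missing)}")

    meta = episode_dir / "metadata.json"
    if not meta.is_file():
        problems.append("missing metadata.json")
    else:
        try:
            json.loads(meta.read_text())
        except json.JSONDecodeError:
            problems.append("metadata.json is not valid JSON")

    return problems
```

Running `validate_episode` on each `auto_lampe_search_{N}` directory before Step 2 catches format problems early, rather than partway through conversion.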
### Step 2: Convert Dataset to LeRobot Format

```bash
# Clone the repository or download convert_to_lerobot.py
python convert_to_lerobot.py
```

This creates a LeRobot-compatible dataset at `/workspace/lerobot_dataset`.
### Step 3: Train the Model

```bash
# Install LeRobot first (see installation section above)
cd lerobot
python -m lerobot.scripts.lerobot_train \
    --dataset.repo_id=local/auto_lampe_search \
    --dataset.root=/path/to/lerobot_dataset \
    --policy.type=act \
    --policy.push_to_hub=false \
    --output_dir=/path/to/outputs/act_lampe_search \
    --job_name=act_lampe_search \
    --policy.device=cuda \
    --batch_size=16 \
    --steps=5000 \
    --policy.optimizer_lr=1e-4 \
    --policy.optimizer_weight_decay=1e-2 \
    --policy.chunk_size=20 \
    --policy.n_action_steps=20 \
    --policy.n_obs_steps=1 \
    --policy.dropout=0.3 \
    --policy.dim_model=256 \
    --policy.dim_feedforward=1024 \
    --policy.n_encoder_layers=2 \
    --policy.n_decoder_layers=1 \
    --policy.use_vae=false \
    --save_freq=1000 \
    --wandb.enable=false
```
### Step 4: Push to Hugging Face

```bash
python push_to_huggingface.py
```

Follow the prompts to:

- Enter your Hugging Face access token
- Enter your username
- Enter the repository name

The script uploads all checkpoints automatically.
## Model Architecture

- **Type:** ACT (Action Chunking with Transformers)
- **Vision Backbone:** ResNet18
- **Model Dimension:** 256
- **Encoder Layers:** 2
- **Decoder Layers:** 1
- **Chunk Size:** 20
- **Action Steps:** 20
- **Dropout:** 0.3
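The chunk size and action steps interact at inference time: ACT predicts a chunk of 20 future actions per forward pass, and the controller executes `n_action_steps` of them before querying the policy again. A minimal sketch of that control loop, with the policy replaced by a stub (the stub and function names are illustrative, not the repository's API):

```python
CHUNK_SIZE = 20      # actions predicted per model call (--policy.chunk_size)
N_ACTION_STEPS = 20  # actions executed before the next call (--policy.n_action_steps)


def predict_chunk(step: int) -> list:
    """Stub standing in for the ACT policy: returns CHUNK_SIZE actions."""
    return [(float(step + i), 0.0) for i in range(CHUNK_SIZE)]  # (position, velocity)


def run_episode(total_steps: int) -> list:
    """Execute total_steps actions, querying the policy every N_ACTION_STEPS."""
    executed = []
    while len(executed) < total_steps:
        chunk = predict_chunk(len(executed))
        executed.extend(chunk[:N_ACTION_STEPS])
    return executed[:total_steps]


actions = run_episode(50)
# 50 steps with N_ACTION_STEPS=20 means the policy is queried 3 times
```

With `n_action_steps` equal to `chunk_size`, every predicted action is executed; setting it lower would re-query the policy more often at the cost of extra forward passes.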
## Input/Output Specifications

### Input

- **State:** 2D vector `[position, velocity]`
- **Images:** RGB camera images, resized to 320x240

### Output

- **Action:** 2D vector `[position, velocity]`
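Both the state and the action are 2-dimensional `[position, velocity]` vectors. A tiny stdlib-only sketch of unpacking and validating one (the helper name is ours; the inference script may do its own checks):

```python
ACTION_DIM = 2  # [position, velocity]


def unpack_action(action):
    """Validate a 2D action vector and return (position, velocity)."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    position, velocity = float(action[0]), float(action[1])
    return position, velocity


print(unpack_action([0.5, -0.1]))  # (0.5, -0.1)
```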
## Training Configuration

- **Learning Rate:** 1e-4
- **Weight Decay:** 1e-2
- **Batch Size:** 16
- **Training Steps:** 5,000
- **Optimizer:** AdamW
- **Device:** CUDA
## File Structure

```
act-lampe-movements/
├── README.md                  # This file
├── inference_hf.py            # Inference script
├── config.json                # Model configuration (checkpoint-005000)
├── model.safetensors          # Model weights (checkpoint-005000)
├── policy_preprocessor.json   # Preprocessor config
├── policy_postprocessor.json  # Postprocessor config
├── train_config.json          # Training configuration
└── checkpoints/               # All training checkpoints
    ├── checkpoint-001000/
    │   └── pretrained_model/
    ├── checkpoint-002000/
    │   └── pretrained_model/
    ├── checkpoint-003000/
    │   └── pretrained_model/
    ├── checkpoint-004000/
    │   └── pretrained_model/
    └── checkpoint-005000/
        └── pretrained_model/
```
## Dependencies

### Required Packages

```
torch>=2.0.0
torchvision>=0.15.0
lerobot
huggingface_hub
pillow
numpy
pandas
tqdm
draccus
```

### Installation Command

```bash
pip install torch torchvision torchaudio huggingface_hub pillow numpy pandas tqdm draccus
pip install -e lerobot  # After cloning the lerobot repository
```
## Troubleshooting

### CUDA Out of Memory

If you encounter CUDA OOM errors:

- Reduce the batch size: `--batch_size=8` or `--batch_size=4`
- Use the CPU: `--policy.device=cpu` (slower but uses less memory)
- Use a smaller model: reduce `--policy.dim_model` and `--policy.dim_feedforward`

### Dataset Not Found

Ensure:

- The dataset is in the correct format (see Step 1 above)
- The paths in `convert_to_lerobot.py` are correct
- The LeRobot dataset was created successfully

### Model Download Issues

If the model download fails:

- Check your internet connection
- Verify that the repository ID is correct
- Ensure `huggingface_hub` is installed: `pip install huggingface_hub`
- Try logging in: `huggingface-cli login`

### Inference Errors

Common issues:

- **Image size mismatch:** Images are automatically resized to 320x240
- **State dimension:** The state must be exactly 2D: `[position, velocity]`
- **Device mismatch:** Ensure CUDA is available if using `device="cuda"`
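For the device-mismatch case, falling back to CPU when CUDA is unavailable avoids the error entirely. A small sketch (`pick_device` is our helper, not part of the inference script):

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return the preferred device if usable, otherwise fall back to 'cpu'."""
    if preferred == "cuda":
        try:
            import torch  # imported here only to probe CUDA availability
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
        return "cpu"
    return preferred


device = pick_device()  # "cuda" on a working GPU setup, otherwise "cpu"
```

Passing the result as `device=pick_device()` when constructing `ACTInference` makes the same script run on both GPU and CPU machines.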
## Citation

If you use this model, please cite:

```bibtex
@misc{act-lampe-movements,
  author = {Your Name},
  title = {ACT Model for Lamp Search Task},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/kavinrajkrupsurge/act-lampe-movements}}
}
```
## License

Apache 2.0

## Contact

For questions or issues, please open an issue on the Hugging Face model repository.