# ACT Model for Lamp Search Task

ACT (Action Chunking with Transformers) model trained on the `auto_lampe_search` dataset for robot control.
## Model Repository

**Repository:** [kavinrajkrupsurge/act-lampe-movements](https://huggingface.co/kavinrajkrupsurge/act-lampe-movements)
## Quick Start

### Installation from Scratch

#### 1. System Requirements

- Python 3.8 or higher
- CUDA-capable GPU (recommended) or CPU
- At least 8 GB RAM
- 10 GB+ free disk space
2. Install Python Dependencies
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install PyTorch (choose based on your CUDA version)
# For CUDA 11.8:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CPU only:
pip install torch torchvision torchaudio
# Install LeRobot
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .
# Install additional dependencies
pip install huggingface_hub pillow numpy pandas tqdm draccus
#### 3. Download Model from Hugging Face

The model is downloaded automatically when you run the inference script. Alternatively, download it manually:

```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('kavinrajkrupsurge/act-lampe-movements')"
```
## Using the Model

### Option 1: Command Line Inference

```bash
# Download the inference script from the repository
# Or use the one provided in this repository
python inference_hf.py \
    --repo-id kavinrajkrupsurge/act-lampe-movements \
    --image path/to/your/image.jpg \
    --dataset-root /path/to/dataset  # Optional if you have the dataset
```
### Option 2: Python API

```python
from inference_hf import ACTInference
from PIL import Image
import numpy as np

# Initialize model (will download automatically from Hugging Face)
model = ACTInference(
    repo_id="kavinrajkrupsurge/act-lampe-movements",
    device="cuda",  # or "cpu"
)

# Load image
image = Image.open("path/to/image.jpg")

# Predict action
action = model.predict(
    image=image,
    state=np.array([0.0, 0.0]),  # [position, velocity]
)
print(f"Position: {action[0]:.4f}, Velocity: {action[1]:.4f}")
```
## Model Checkpoints

This repository contains multiple checkpoints from training:

- `checkpoint-001000`: Model at 1,000 training steps
- `checkpoint-002000`: Model at 2,000 training steps
- `checkpoint-003000`: Model at 3,000 training steps
- `checkpoint-004000`: Model at 4,000 training steps
- `checkpoint-005000`: Model at 5,000 training steps (final/best)

The model files in the repository root correspond to `checkpoint-005000`.
### Loading a Specific Checkpoint

```python
from huggingface_hub import hf_hub_download

# Download a file from a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="kavinrajkrupsurge/act-lampe-movements",
    filename="checkpoints/checkpoint-003000/pretrained_model/config.json",
)
# Use the checkpoint path with ACTInference, or modify the code to load from a local path
```
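To pull an entire checkpoint directory rather than a single file, `snapshot_download` from `huggingface_hub` accepts an `allow_patterns` filter. A minimal sketch, assuming the checkpoint layout shown above (the `checkpoint_pattern` and `download_checkpoint` helpers are ours, not part of the repository):

```python
def checkpoint_pattern(step: int) -> str:
    """Glob pattern matching one checkpoint's pretrained_model directory."""
    return f"checkpoints/checkpoint-{step:06d}/pretrained_model/*"


def download_checkpoint(repo_id: str, step: int) -> str:
    """Download a single checkpoint directory; returns the local snapshot path."""
    from huggingface_hub import snapshot_download  # requires network access
    return snapshot_download(repo_id, allow_patterns=[checkpoint_pattern(step)])


# Example: pattern for the 3,000-step checkpoint
print(checkpoint_pattern(3000))  # checkpoints/checkpoint-003000/pretrained_model/*
```

Filtering by pattern avoids downloading all five checkpoints when you only need one.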
## Replicating Training from Scratch

### Step 1: Prepare Dataset

Download or prepare your dataset in the following structure:

```
auto_lampe_search/
├── auto_lampe_search_0/
│   ├── frames/
│   │   ├── frame_0000.jpg
│   │   ├── frame_0001.jpg
│   │   └── ...
│   ├── joint_trajectory.csv
│   └── metadata.json
├── auto_lampe_search_1/
│   └── ...
└── ...
```

Dataset format requirements — each episode directory `auto_lampe_search_{N}/` contains:

- `frames/`: Directory with images named `frame_{XXXX}.jpg`
- `joint_trajectory.csv`: CSV with columns `base_joint`, `velocity` (and optionally `joint2`, `joint3`, `joint4`)
- `metadata.json`: JSON with episode metadata
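These requirements can be checked programmatically before conversion. A minimal validation sketch using only the standard library (the function name and messages are our own, not part of the conversion script):

```python
import csv
import json
from pathlib import Path

REQUIRED_COLUMNS = {"base_joint", "velocity"}


def validate_episode(episode_dir: Path) -> list:
    """Return a list of problems found in one episode directory (empty if valid)."""
    problems = []

    frames = episode_dir / "frames"
    if not frames.is_dir():
        problems.append("missing frames/ directory")
    elif not list(frames.glob("frame_*.jpg")):
        problems.append("frames/ contains no frame_*.jpg images")

    traj = episode_dir / "joint_trajectory.csv"
    if not traj.is_file():
        problems.append("missing joint_trajectory.csv")
    else:
        with traj.open() as f:
            header = set(next(csv.reader(f), []))
        missing = REQUIRED_COLUMNS - header
        if missing:
            problems.append(f"joint_trajectory.csv missing columns: {sorted(missing)}")

    meta = episode_dir / "metadata.json"
    if not meta.is_file():
        problems.append("missing metadata.json")
    else:
        try:
            json.loads(meta.read_text())
        except json.JSONDecodeError:
            problems.append("metadata.json is not valid JSON")

    return problems
```

Running `validate_episode` on each `auto_lampe_search_{N}` directory before Step 2 catches format problems early, rather than partway through conversion.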
### Step 2: Convert Dataset to LeRobot Format

```bash
# Clone the repository or download convert_to_lerobot.py
python convert_to_lerobot.py
```

This creates a LeRobot-compatible dataset at `/workspace/lerobot_dataset`.
### Step 3: Train the Model

```bash
# Install LeRobot first (see installation section above)
cd lerobot
python -m lerobot.scripts.lerobot_train \
    --dataset.repo_id=local/auto_lampe_search \
    --dataset.root=/path/to/lerobot_dataset \
    --policy.type=act \
    --policy.push_to_hub=false \
    --output_dir=/path/to/outputs/act_lampe_search \
    --job_name=act_lampe_search \
    --policy.device=cuda \
    --batch_size=16 \
    --steps=5000 \
    --policy.optimizer_lr=1e-4 \
    --policy.optimizer_weight_decay=1e-2 \
    --policy.chunk_size=20 \
    --policy.n_action_steps=20 \
    --policy.n_obs_steps=1 \
    --policy.dropout=0.3 \
    --policy.dim_model=256 \
    --policy.dim_feedforward=1024 \
    --policy.n_encoder_layers=2 \
    --policy.n_decoder_layers=1 \
    --policy.use_vae=false \
    --save_freq=1000 \
    --wandb.enable=false
```
### Step 4: Push to Hugging Face

```bash
python push_to_huggingface.py
```

Follow the prompts to:

- Enter your Hugging Face access token
- Enter your username
- Enter the repository name

The script uploads all checkpoints automatically.
## Model Architecture

- **Type:** ACT (Action Chunking with Transformers)
- **Vision Backbone:** ResNet18
- **Model Dimension:** 256
- **Encoder Layers:** 2
- **Decoder Layers:** 1
- **Chunk Size:** 20
- **Action Steps:** 20
- **Dropout:** 0.3
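The chunk size and action steps interact at inference time: ACT predicts a chunk of 20 future actions per forward pass, and the controller executes `n_action_steps` of them before querying the policy again. A minimal sketch of that control loop, with the policy replaced by a stub (the stub and function names are illustrative, not the repository's API):

```python
CHUNK_SIZE = 20      # actions predicted per model call (--policy.chunk_size)
N_ACTION_STEPS = 20  # actions executed before the next call (--policy.n_action_steps)


def predict_chunk(step: int) -> list:
    """Stub standing in for the ACT policy: returns CHUNK_SIZE actions."""
    return [(float(step + i), 0.0) for i in range(CHUNK_SIZE)]  # (position, velocity)


def run_episode(total_steps: int) -> list:
    """Execute total_steps actions, querying the policy every N_ACTION_STEPS."""
    executed = []
    while len(executed) < total_steps:
        chunk = predict_chunk(len(executed))
        executed.extend(chunk[:N_ACTION_STEPS])
    return executed[:total_steps]


actions = run_episode(50)
# 50 steps with N_ACTION_STEPS=20 means the policy is queried 3 times
```

With `n_action_steps` equal to `chunk_size`, every predicted action is executed; setting it lower would re-query the policy more often at the cost of extra forward passes.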
## Input/Output Specifications

### Input

- **State:** 2D vector `[position, velocity]`
- **Images:** RGB camera images, resized to 320x240

### Output

- **Action:** 2D vector `[position, velocity]`
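Both the state and the action are 2-dimensional `[position, velocity]` vectors. A tiny stdlib-only sketch of unpacking and validating one (the helper name is ours; the inference script may do its own checks):

```python
ACTION_DIM = 2  # [position, velocity]


def unpack_action(action):
    """Validate a 2D action vector and return (position, velocity)."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    position, velocity = float(action[0]), float(action[1])
    return position, velocity


print(unpack_action([0.5, -0.1]))  # (0.5, -0.1)
```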
## Training Configuration

- **Learning Rate:** 1e-4
- **Weight Decay:** 1e-2
- **Batch Size:** 16
- **Training Steps:** 5,000
- **Optimizer:** AdamW
- **Device:** CUDA
## File Structure

```
act-lampe-movements/
├── README.md                  # This file
├── inference_hf.py            # Inference script
├── config.json                # Model configuration (checkpoint-005000)
├── model.safetensors          # Model weights (checkpoint-005000)
├── policy_preprocessor.json   # Preprocessor config
├── policy_postprocessor.json  # Postprocessor config
├── train_config.json          # Training configuration
└── checkpoints/               # All training checkpoints
    ├── checkpoint-001000/
    │   └── pretrained_model/
    ├── checkpoint-002000/
    │   └── pretrained_model/
    ├── checkpoint-003000/
    │   └── pretrained_model/
    ├── checkpoint-004000/
    │   └── pretrained_model/
    └── checkpoint-005000/
        └── pretrained_model/
```
## Dependencies

### Required Packages

```
torch>=2.0.0
torchvision>=0.15.0
lerobot
huggingface_hub
pillow
numpy
pandas
tqdm
draccus
```

### Installation Command

```bash
pip install torch torchvision torchaudio huggingface_hub pillow numpy pandas tqdm draccus
pip install -e lerobot  # After cloning the lerobot repository
```
## Troubleshooting

### CUDA Out of Memory

If you encounter CUDA OOM errors:

- Reduce the batch size: `--batch_size=8` or `--batch_size=4`
- Use the CPU: `--policy.device=cpu` (slower but uses less memory)
- Use a smaller model: reduce `--policy.dim_model` and `--policy.dim_feedforward`

### Dataset Not Found

Ensure:

- The dataset is in the correct format (see Step 1 above)
- The paths in `convert_to_lerobot.py` are correct
- The LeRobot dataset was created successfully

### Model Download Issues

If the model download fails:

- Check your internet connection
- Verify that the repository ID is correct
- Ensure `huggingface_hub` is installed: `pip install huggingface_hub`
- Try logging in: `huggingface-cli login`

### Inference Errors

Common issues:

- **Image size mismatch:** Images are automatically resized to 320x240
- **State dimension:** The state must be exactly 2D: `[position, velocity]`
- **Device mismatch:** Ensure CUDA is available if using `device="cuda"`
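For the device-mismatch case, falling back to CPU when CUDA is unavailable avoids the error entirely. A small sketch (`pick_device` is our helper, not part of the inference script):

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return the preferred device if usable, otherwise fall back to 'cpu'."""
    if preferred == "cuda":
        try:
            import torch  # imported here only to probe CUDA availability
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
        return "cpu"
    return preferred


device = pick_device()  # "cuda" on a working GPU setup, otherwise "cpu"
```

Passing the result as `device=pick_device()` when constructing `ACTInference` makes the same script run on both GPU and CPU machines.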
## Citation

If you use this model, please cite:

```bibtex
@misc{act-lampe-movements,
  author = {Your Name},
  title = {ACT Model for Lamp Search Task},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/kavinrajkrupsurge/act-lampe-movements}}
}
```
## License

Apache 2.0

## Contact

For questions or issues, please open an issue on the Hugging Face model repository.