
MiniVLA Fine-tuned on LAMPE 4-DoF Dataset

This repository contains a fine-tuned MiniVLA model trained on the LAMPE dataset with 4-DoF actions (Base, Joint2, Joint3, Joint4).

Model Details

  • Base Model: openvla/openvla-7b
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 32
  • LoRA Dropout: 0.0
  • Training Dataset: LAMPE Combined Dataset (80 trajectories, 6,289 transitions)
  • Action Space: 4-DoF [Base, Joint2, Joint3, Joint4]
  • Training Steps: 3,000
  • Batch Size: 8
  • Learning Rate: 5e-4
  • Image Augmentation: Enabled

Quick Start

Installation

```bash
# Install dependencies
pip install torch torchvision transformers peft accelerate
pip install huggingface_hub
pip install git+https://github.com/moojink/dlimp_openvla
```

Loading the Model

```python
from prismatic.models.load import load_vla
import torch

# Load the base model
model = load_vla(
    "openvla/openvla-7b",
    load_for_training=False,
)

# Load fine-tuned checkpoint
checkpoint = torch.load("checkpoints/step-003000-loss=0.9050.pt", map_location="cpu")
if "model_state_dict" in checkpoint:
    model.load_state_dict(checkpoint["model_state_dict"], strict=False)

# Set to evaluation mode
model.eval()
```

Inference

```python
from PIL import Image

# Load image and instruction
image = Image.open("path/to/image.jpg")
instruction = "turn left"

# Predict action
with torch.inference_mode():
    action = model.predict_action(
        image=image,
        instruction=instruction,
        unnorm_key="lampe_dataset_combined",
        do_sample=False
    )

print(f"Predicted action: {action}")
# Output: [Base, Joint2, Joint3, Joint4]
```

Fine-tuning Scripts

This repository includes the fine-tuning script (finetune.py) used to train this model.

Running Fine-tuning

```bash
python finetune.py \
    --vla_path "openvla/openvla-7b" \
    --data_root_dir "/path/to/rlds_dataset" \
    --dataset_name "lampe_dataset_combined" \
    --dataset_statistics_path "dataset_statistics.json" \
    --batch_size 8 \
    --max_steps 3000 \
    --learning_rate 5e-4 \
    --lora_rank 32 \
    --lora_dropout 0.0 \
    --wandb_mode "offline"
```

Or use the provided shell script:

```bash
bash finetune_lampe_combined.sh
```
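For intuition on the LoRA settings above, the sketch below shows the low-rank update that rank-32 LoRA applies to a single weight matrix: the frozen base weight `W` is augmented by a scaled product of two small trainable matrices. The layer size (896) and the scaling factor `alpha` are illustrative assumptions, not values taken from this model.

```python
import numpy as np

# Hypothetical shapes: one 896x896 projection layer (illustrative only)
d, r, alpha = 896, 32, 32  # rank 32 matches this fine-tune; alpha is an assumption

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero-initialized)

# Effective weight at inference: base plus scaled low-rank update
W_eff = W + (alpha / r) * (B @ A)

# Fraction of trainable parameters for this single layer
trainable = A.size + B.size
total = W.size + trainable
print(f"trainable fraction: {trainable / total:.2%}")
```

Because `B` starts at zero, the model's behavior is unchanged at step 0; only `A` and `B` receive gradients, which is why the overall trainable fraction stays small (1.39% across the full model in this run).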

Validation

Run validate.py to evaluate the model on sample data:

```bash
python validate.py
```

Update the paths in validate.py:

  • CHECKPOINT_DIR: Path to checkpoint directory
  • SAMPLE_PATH: Path to sample data directory

Dataset Statistics

The dataset_statistics.json file contains normalization statistics for the LAMPE dataset:

```json
{
  "lampe_dataset_combined": {
    "action": {
      "mean": [...],
      "std": [...],
      "min": [...],
      "max": [...],
      "q01": [...],
      "q99": [...]
    },
    "proprio": {...},
    "num_transitions": 6289,
    "num_trajectories": 80
  }
}
```
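The `q01`/`q99` bounds are what `unnorm_key="lampe_dataset_combined"` resolves to at inference time: model outputs in [-1, 1] are mapped back to raw joint units against those percentile bounds. A minimal sketch of that mapping, with made-up bounds in place of the real statistics:

```python
import numpy as np

# Hypothetical per-dimension bounds for [Base, Joint2, Joint3, Joint4]
# (illustrative values, not the real dataset statistics)
q01 = np.array([-1.2, -0.8, -0.9, -1.0])
q99 = np.array([ 1.2,  0.8,  0.9,  1.0])

def unnormalize(norm_action: np.ndarray) -> np.ndarray:
    """Map a model output in [-1, 1] back to raw action units via q01/q99 bounds."""
    return 0.5 * (norm_action + 1.0) * (q99 - q01) + q01

print(unnormalize(np.zeros(4)))  # midpoint of each per-dimension range
```

Using the 1st/99th percentiles rather than min/max keeps outlier transitions in the training data from stretching the action range.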

Model Architecture

  • Vision Backbone: DinoSigLIP (ViT-SO400M-14-SigLIP)
  • LLM Backbone: Qwen2.5-0.5B with extra tokens
  • Action Tokenizer: Standard ActionTokenizer (256 bins, 4-DoF)
  • Image Resolution: 224x224
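To illustrate the 256-bin ActionTokenizer listed above, the sketch below discretizes a normalized 4-DoF action into uniform bins over [-1, 1] and recovers approximate continuous values from the bin centers. This is a simplified sketch: the real tokenizer additionally maps bin indices onto tokens in the LLM vocabulary.

```python
import numpy as np

n_bins = 256  # bin count from the standard ActionTokenizer

# Uniform bin edges over [-1, 1]; centers are midpoints of adjacent edges
bins = np.linspace(-1.0, 1.0, n_bins)
bin_centers = (bins[:-1] + bins[1:]) / 2.0

def tokenize(action: np.ndarray) -> np.ndarray:
    """Discretize a normalized action into per-dimension bin indices."""
    return np.digitize(np.clip(action, -1.0, 1.0), bins)

def detokenize(indices: np.ndarray) -> np.ndarray:
    """Recover approximate continuous values from bin indices."""
    return bin_centers[np.clip(indices - 1, 0, bin_centers.size - 1)]

a = np.array([0.1, -0.5, 0.9, 0.0])
print(tokenize(a), detokenize(tokenize(a)))
```

The round-trip error is bounded by half a bin width (about 0.004 in normalized units), which is why discretized action prediction remains precise enough for joint control.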

Key Features

  1. 4-DoF Action Space: Supports Base, Joint2, Joint3, and Joint4 control
  2. LoRA Fine-tuning: Parameter-efficient; only 1.39% of parameters are trainable
  3. Custom RLDS Dataset Support: Handles custom RLDS datasets not included in OXE
  4. FlashAttention2 Disabled: Runs the 0.5B model without FlashAttention2
  5. VQ Tokenizer Replacement: Automatically replaces VQ tokenizers with standard ones

Files Included

  • finetune.py: Fine-tuning script with custom dataset support
  • validate.py: Validation script for model evaluation
  • finetune_lampe_combined.sh: Shell script for easy fine-tuning
  • dataset_statistics.json: Dataset normalization statistics
  • checkpoints/: Model checkpoints from training
  • adapter-weights/: LoRA adapter weights (if available)

Citation

If you use this model, please cite:

@misc{minivla-lampe-4dof,
  title={MiniVLA Fine-tuned on LAMPE 4-DoF Dataset},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/kavinrajkrupsurge/openvla-lampe-4dof-finetuned}
}

License

This model follows the license of the base model openvla/openvla-7b.

Contact

For questions or issues, please open an issue on the Hugging Face repository.
