
Fine-tuning Robometer on Your Own Data

This guide walks through preprocessing a dataset, LoRA fine-tuning from Robometer-4B, uploading to the Hub, and running inference, using MINT-SJTU/RoboFAC-dataset as the running example.


1. Preprocessing (RoboFAC)

Training expects a preprocessed cache built from a Hugging Face dataset in RBM format. RoboFAC ships as a folder-based dataset, so convert it to RBM format first, then preprocess.

  1. Download (into ROBOMETER_DATASET_PATH):

    export ROBOMETER_DATASET_PATH=/path/to/your/robometer_dataset
    huggingface-cli download MINT-SJTU/RoboFAC-dataset --local-dir $ROBOMETER_DATASET_PATH/RoboFAC-dataset
    
  2. Convert to RBM format and push to the Hub (required for the recommended flow):

    export HF_TOKEN=your_token_here
    
    uv run python -m dataset_upload.generate_hf_dataset \
      --config_path dataset_upload/configs/data_gen_configs/robofac.yaml \
      --dataset.dataset_path=$ROBOMETER_DATASET_PATH/RoboFAC-dataset \
      --hub.push_to_hub=true \
      --hub.hub_repo_id=robofac_rbm
    
  3. If you pushed to the Hub, download the converted repo and point the preprocess config at it:

    huggingface-cli download aliangdw/robofac_rbm --local-dir $ROBOMETER_DATASET_PATH/robofac_rbm
    

    In preprocess_finetune.yaml, set train_datasets: ["aliangdw/robofac_rbm"] and train_subsets: [["robofac"]].

  4. Preprocess:

    export ROBOMETER_PROCESSED_DATASETS_PATH=/path/to/save/processed_datasets
    
    uv run python -m robometer.data.scripts.preprocess_datasets \
      --config robometer/configs/preprocess_finetune.yaml \
      --cache_dir=$ROBOMETER_PROCESSED_DATASETS_PATH
    

    Use the same ROBOMETER_PROCESSED_DATASETS_PATH when training.
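The step-3 config edit can be sketched as a fragment of preprocess_finetune.yaml (only the two keys named above are shown; the rest of the file is unchanged):

```yaml
# preprocess_finetune.yaml (fragment): point preprocessing at the converted repo
train_datasets: ["aliangdw/robofac_rbm"]
train_subsets: [["robofac"]]
```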

Other datasets: for RBM-style datasets already on the Hub, set train_datasets/train_subsets and ROBOMETER_DATASET_PATH in the preprocess config, then run step 4. For raw data, add a custom loader (see CustomDataset.md).


2. LoRA fine-tuning

Enable PEFT (LoRA) and use training.load_from_checkpoint to start from a Qwen3-VL-4B-based RBM checkpoint.

export ROBOMETER_PROCESSED_DATASETS_PATH=/path/to/your/processed_datasets

uv run python train.py \
  model.base_model_id=Qwen/Qwen3-VL-4B-Instruct \
  model.use_peft=true \
  model.train_progress_head=true \
  model.train_preference_head=true \
  data.train_datasets=[aliangdw_robofac_rbm_robofac] \
  data.eval_datasets=[mw] \
  training.load_from_checkpoint=aliangdw/Robometer-4B \
  training.per_device_train_batch_size=8 \
  training.learning_rate=2e-5 \
  training.warmup_ratio=0.1 \
  training.weight_decay=0.01 \
  training.max_steps=1000 \
  training.output_dir=./logs \
  training.exp_name=robometer4b_lora_robofac \
  logging.log_to=[wandb] \
  custom_eval.eval_types=[reward_alignment,policy_ranking] \
  custom_eval.reward_alignment=[aliangdw_robofac_rbm_robofac] \
  custom_eval.policy_ranking=[aliangdw_robofac_rbm_robofac] \
  logging.save_best.metric_names=[eval_rew_align/pearson_robofac,eval_p_rank/kendall_last_robofac] \
  logging.save_best.greater_is_better=[true,true] \
  training.overwrite_output_dir=True \
  training.eval_steps=50 \
  training.custom_eval_steps=50

The short name robofac used in the metric names above is mapped to aliangdw_robofac_rbm_robofac in name_mapping.py.
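Conceptually, that mapping is just a dict from full dataset identifiers to short metric names. A hypothetical sketch of what name_mapping.py contains (the real file may be structured differently):

```python
# Hypothetical sketch of the short-name mapping used in metric names
# (the actual name_mapping.py in the repo may differ in structure).
NAME_MAPPING = {
    "aliangdw_robofac_rbm_robofac": "robofac",
}

def short_name(dataset_id: str) -> str:
    """Return the short metric name for a dataset, falling back to the full id."""
    return NAME_MAPPING.get(dataset_id, dataset_id)
```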

Tunable hyperparameters: override training.learning_rate, training.warmup_ratio, training.weight_decay, training.gradient_accumulation_steps, or training.max_steps as needed; the defaults above match robometer/configs/config.yaml. For multi-GPU training, use uv run accelerate launch --config_file robometer/configs/distributed/fsdp.yaml train.py ... with the same overrides.
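As background on what use_peft=true does: LoRA freezes the base weight W and learns a low-rank update, so the effective weight is W + (alpha/r) * B @ A. A minimal numpy sketch (shapes and hyperparameters are illustrative, not the values the trainer uses):

```python
import numpy as np

# Illustrative shapes: a frozen 64x64 projection with rank-8 LoRA adapters.
d, r, alpha = 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

# Effective weight during the forward pass; only A and B receive gradients.
W_eff = W + (alpha / r) * B @ A

# With B zero-initialized, training starts exactly at the base model.
assert np.allclose(W_eff, W)
```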

Full fine-tuning (no PEFT)

Load the same checkpoint but train the full model (no LoRA). This uses more memory; lower training.per_device_train_batch_size or raise training.gradient_accumulation_steps if needed.

export ROBOMETER_PROCESSED_DATASETS_PATH=/path/to/your/processed_datasets

uv run python train.py \
  model.base_model_id=Qwen/Qwen3-VL-4B-Instruct \
  model.use_peft=false \
  model.train_progress_head=true \
  model.train_preference_head=true \
  data.train_datasets=[aliangdw_robofac_rbm_robofac] \
  data.eval_datasets=[aliangdw_robofac_rbm_robofac] \
  training.load_from_checkpoint=aliangdw/Robometer-4B \
  training.per_device_train_batch_size=8 \
  training.learning_rate=2e-5 \
  training.warmup_ratio=0.1 \
  training.weight_decay=0.01 \
  training.gradient_accumulation_steps=1 \
  training.max_steps=500 \
  training.output_dir=./logs \
  training.exp_name=robometer4b_full_robofac \
  logging.log_to=[wandb] \
  custom_eval.reward_alignment=[aliangdw_robofac_rbm_robofac] \
  custom_eval.policy_ranking=[aliangdw_robofac_rbm_robofac] \
  logging.save_best.metric_names=[eval_rew_align/pearson_robofac,eval_p_rank/kendall_last_robofac] \
  logging.save_best.greater_is_better=[true,true] \
  training.eval_steps=50 \
  training.custom_eval_steps=50
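When trading per_device_train_batch_size against gradient_accumulation_steps for memory, the effective global batch size is their product times the number of GPUs. A quick sanity check:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device * grad_accum * num_gpus

# The full fine-tuning run above on a single GPU:
print(effective_batch_size(8, 1, 1))   # 8

# Halving the per-device batch while doubling accumulation keeps it constant:
print(effective_batch_size(4, 2, 1))   # 8
```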

3. Upload to Hub

uv run python robometer/utils/upload_to_hub.py \
  --model_dir ./logs/robometer4b_lora_robofac/checkpoint-500 \
  --hub_model_id aliangdw/robometer-4b-lora-robofac \
  --base_model "Qwen/Qwen3-VL-4B-Instruct" \
  --commit_message "LoRA fine-tune on RoboFAC"

Alternatively, set logging.save_best.upload_to_hub: true in the config to upload the best checkpoints during training.


4. Inference

uv run python scripts/example_inference_local.py \
  --model-path aliangdw/robometer-4b-lora-robofac \
  --video /path/to/video.mp4 \
  --task "Insert the cylinder"

To serve the model: uv run python robometer/evals/eval_server.py ... model_path=aliangdw/robometer-4b-lora-robofac. To evaluate: run run_baseline_eval.py with reward_model=rbm and model_path=... (see the README).


5. Baseline: Fine-tune from base Qwen-VL (no Robometer checkpoint)

For comparison, run the same train.py on the same data without loading a Robometer checkpoint (omit training.load_from_checkpoint). Training then starts from the base Qwen-VL model with randomly initialized progress/preference heads. The first command below is the LoRA baseline; the second is the full fine-tuning baseline.

export ROBOMETER_PROCESSED_DATASETS_PATH=/path/to/your/processed_datasets

uv run python train.py \
  model.base_model_id=Qwen/Qwen3-VL-4B-Instruct \
  model.use_peft=true \
  model.train_progress_head=true \
  model.train_preference_head=true \
  data.train_datasets=[aliangdw_robofac_rbm_robofac] \
  data.eval_datasets=[aliangdw_robofac_rbm_robofac] \
  training.per_device_train_batch_size=8 \
  training.learning_rate=2e-5 \
  training.warmup_ratio=0.1 \
  training.weight_decay=0.01 \
  training.max_steps=1000 \
  training.output_dir=./logs \
  training.exp_name=qwen3vl_lora_robofac_baseline \
  logging.log_to=[wandb] \
  custom_eval.reward_alignment=[aliangdw_robofac_rbm_robofac] \
  custom_eval.policy_ranking=[aliangdw_robofac_rbm_robofac] \
  logging.save_best.metric_names=[eval_rew_align/pearson_robofac,eval_p_rank/kendall_last_robofac] \
  logging.save_best.greater_is_better=[true,true] \
  training.eval_steps=50 \
  training.custom_eval_steps=50

uv run python train.py \
  model.base_model_id=Qwen/Qwen3-VL-4B-Instruct \
  model.use_peft=false \
  model.train_progress_head=true \
  model.train_preference_head=true \
  data.train_datasets=[aliangdw_robofac_rbm_robofac] \
  data.eval_datasets=[aliangdw_robofac_rbm_robofac] \
  training.per_device_train_batch_size=8 \
  training.learning_rate=2e-5 \
  training.warmup_ratio=0.1 \
  training.weight_decay=0.01 \
  training.max_steps=1000 \
  training.output_dir=./logs \
  training.exp_name=qwen3vl_robofac_baseline \
  logging.log_to=[wandb] \
  custom_eval.reward_alignment=[aliangdw_robofac_rbm_robofac] \
  custom_eval.policy_ranking=[aliangdw_robofac_rbm_robofac] \
  logging.save_best.metric_names=[eval_rew_align/pearson_robofac,eval_p_rank/kendall_last_robofac] \
  logging.save_best.greater_is_better=[true,true] \
  training.eval_steps=50 \
  training.custom_eval_steps=50