RECAP Value Network

A distributional value network trained with the RECAP (RL with Experience and Corrections via Advantage-conditioned Policies) framework on the jackvial/so101_pickplace_recap_merged_v2 dataset.

This model predicts per-frame normalized expected returns for robot manipulation trajectories, discretized into 50 bins over [-1.0, 0.0].
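
To turn a per-bin output into a single scalar value, one common approach is to take the expectation over bin centers. The sketch below assumes 50 uniform-width bins over [-1.0, 0.0], that each bin is represented by its center, and that the network emits one logit per bin. The actual binning and output format of RECAPValueNetwork may differ, and the helper names here are hypothetical (they are not part of lerobot).

```python
import math

NUM_BINS = 50
V_MIN, V_MAX = -1.0, 0.0


def bin_centers(num_bins=NUM_BINS, v_min=V_MIN, v_max=V_MAX):
    """Centers of uniform-width bins (assumed layout, not confirmed by the model card)."""
    width = (v_max - v_min) / num_bins
    return [v_min + (i + 0.5) * width for i in range(num_bins)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(logits):
    """Scalar value estimate: probability-weighted mean of the bin centers."""
    probs = softmax(logits)
    centers = bin_centers(len(logits))
    return sum(p * c for p, c in zip(probs, centers))


def value_to_bin(value, num_bins=NUM_BINS, v_min=V_MIN, v_max=V_MAX):
    """Map a scalar return to a bin index, clipping to the value range."""
    clipped = min(max(value, v_min), v_max)
    idx = int((clipped - v_min) / (v_max - v_min) * num_bins)
    return min(idx, num_bins - 1)  # v_max itself belongs to the last bin
```

With uniform logits, `expected_value` returns the midpoint of the range; a distribution concentrated in the upper bins yields a value close to 0.0, which under this normalization would correspond to a frame near successful completion.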

Model Details

| Parameter | Value |
|---|---|
| Backbone | PaliGemma (gemma_2b) |
| VLM Layers | 10 |
| Precision | bfloat16 |
| Image Size | 224x224 |
| Value Bins | 50 |
| Value Range | [-1.0, 0.0] |
| Value Head Depth | 1 |
| Hidden Dim | 768 |
| Pretrained From | lerobot/pi05_base |

Training Metrics (Epoch 1)

| Metric | Train | Val |
|---|---|---|
| Loss | 1.8568 | 1.5731 |
| Bin Accuracy | 0.5236 | 0.5431 |
| Value MAE | 0.08194 | 0.04085 |
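
The model card does not define these metrics. The sketch below shows one plausible reading: bin accuracy as the fraction of frames whose predicted bin matches the target bin, and value MAE as the mean absolute difference between predicted and target scalar values. All function names are hypothetical. If the loss is a cross-entropy over the 50 bins (also an assumption), a uniform prediction would score ln 50, which gives a rough baseline for reading the loss row.

```python
import math


def bin_index(value, num_bins=50, v_min=-1.0, v_max=0.0):
    """Target bin for a scalar return, assuming uniform-width bins."""
    clipped = min(max(value, v_min), v_max)
    idx = int((clipped - v_min) / (v_max - v_min) * num_bins)
    return min(idx, num_bins - 1)


def bin_accuracy(pred_bins, target_values, num_bins=50):
    """Fraction of frames where the predicted bin equals the target bin."""
    hits = sum(p == bin_index(t, num_bins) for p, t in zip(pred_bins, target_values))
    return hits / len(target_values)


def value_mae(pred_values, target_values):
    """Mean absolute error between predicted and target scalar values."""
    errors = [abs(p - t) for p, t in zip(pred_values, target_values)]
    return sum(errors) / len(errors)


# Cross-entropy of a uniform guess over 50 bins (baseline under the stated assumption)
uniform_cross_entropy = math.log(50)
```

Since the value range has width 1.0 and each bin has width 0.02, an MAE of about 0.04 corresponds to an average error of roughly two bin widths under this reading.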

Usage

Loading the model

import json
from dataclasses import fields as dc_fields

import torch
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download

from lerobot.rl.algorithms.recap_value_network import (
    RECAPValueNetwork,
    RECAPValueNetworkConfig,
)

# Download and load config
config_path = hf_hub_download("jackvial/recap-value-network-so101-pickplace", "config.json")
with open(config_path) as f:
    config_dict = json.load(f)

# Reconstruct the config (filter to only RECAPValueNetworkConfig fields)
valid_keys = {f.name for f in dc_fields(RECAPValueNetworkConfig)}
config = RECAPValueNetworkConfig(
    **{k: v for k, v in config_dict.items() if k in valid_keys}
)
# Don't re-download pretrained VLM weights during loading
config.pretrained_path = None

# Build model and load weights
model = RECAPValueNetwork(config)
weights_path = hf_hub_download("jackvial/recap-value-network-so101-pickplace", "model.safetensors")
state_dict = load_file(weights_path)
model.load_state_dict(state_dict)
model.eval()

Using as a value network for RECAP pi-star training

uv run python -m lerobot.rl.algorithms.recap_train_pi_star \
  --repo_id="jackvial/so101_pickplace_recap_merged_v2" \
  --output_dir="outputs/recap_pistar" \
  --value_network_checkpoint="jackvial/recap-value-network-so101-pickplace"

Training Configuration

Trained with lerobot.rl.algorithms.recap_train_value_network:

{
  "repo_id": "jackvial/so101_pickplace_recap_merged_v2",
  "output_dir": "/home/jack/code/lerobot/outputs/recap_value_5",
  "labels_csv_path": null,
  "root": "/home/jack/.cache/huggingface/lerobot",
  "revision": null,
  "episodes": null,
  "epochs": 2,
  "batch_size": 4,
  "gradient_accumulation_steps": 4,
  "num_workers": 4,
  "learning_rate": 0.0001,
  "weight_decay": 0.0001,
  "warmup_ratio": 0.05,
  "max_grad_norm": 1.0,
  "val_split_ratio": 0.1,
  "seed": 42,
  "device": "auto",
  "max_train_steps_per_epoch": null,
  "max_val_steps_per_epoch": null,
  "log_every_n_steps": 100,
  "validate_every_n_train_steps": 50,
  "plot_every_n_train_steps": 200,
  "max_val_steps_per_step_validation": 20,
  "val_plot_num_episodes": 4,
  "val_plot_num_frames": 8,
  "val_plot_every_n_epochs": 1,
  "c_fail": 500.0,
  "num_value_bins": 50,
  "tokenizer_max_length": 200,
  "image_size": 224,
  "paligemma_variant": "gemma_2b",
  "tokenizer_name": "google/paligemma-3b-pt-224",
  "model_precision": "bfloat16",
  "freeze_vision_encoder": true,
  "freeze_backbone": true,
  "num_unfrozen_backbone_layers": 3,
  "num_vlm_layers": 10,
  "value_head_depth": 1,
  "dropout": 0.1,
  "pretrained_path": "lerobot/pi05_base",
  "wandb_project": null,
  "wandb_entity": null,
  "wandb_run_name": null
}
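
The c_fail setting suggests that failed episodes receive a terminal penalty when return labels are built. The sketch below illustrates one way such per-frame targets could be derived. It assumes a reward of -1 per step, a terminal reward of 0 on success and -c_fail on failure, and normalization by a hypothetical scale of max_episode_length + c_fail so that targets fall in [-1.0, 0.0]. The function names and the normalization constant are illustrative assumptions, not the behavior of recap_train_value_network.

```python
def returns_to_go(episode_length, success, c_fail=500.0):
    """Undiscounted return from each frame to the end of the episode.

    Assumed reward: -1 for every non-terminal step, and at the final step
    0 on success or -c_fail on failure.
    """
    terminal = 0.0 if success else -c_fail
    return [-(episode_length - 1 - t) + terminal for t in range(episode_length)]


def normalize_returns(returns, max_episode_length, c_fail=500.0):
    """Scale returns into [-1.0, 0.0] (the scale used here is an assumption)."""
    scale = max_episode_length + c_fail
    return [max(-1.0, min(0.0, r / scale)) for r in returns]


# A short successful episode: the value rises toward 0 as the task completes.
success_targets = normalize_returns(returns_to_go(3, True), max_episode_length=500)

# A failed episode of the same length: every frame carries the failure penalty.
failure_targets = normalize_returns(returns_to_go(3, False), max_episode_length=500)
```

Under these assumptions a larger c_fail pushes the targets of failed episodes further toward -1.0, which separates them more clearly from slow but successful episodes.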