RECAP Value Network

A distributional value network trained with the RECAP (RL with Experience and Corrections via Advantage-conditioned Policies) framework on the jackvial/so101_pickplace_recap_merged_v2 dataset.

For each frame of a robot manipulation trajectory, this model predicts a distribution over the normalized expected return, discretized into 50 bins spanning [-1.0, 0.0].
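The card does not say exactly how returns map to bins. The sketch below assumes 50 equal-width bins over [-1.0, 0.0], with each bin represented by its center. It recovers a scalar value as the softmax-weighted mean of the bin centers. `value_to_bin` and `expected_value` are hypothetical helpers for illustration, not functions from `lerobot`.

```python
import math

NUM_BINS = 50
V_MIN, V_MAX = -1.0, 0.0
BIN_WIDTH = (V_MAX - V_MIN) / NUM_BINS  # 0.02 for 50 bins over [-1, 0]

# Assumption: uniform bins, each represented by its center value.
BIN_CENTERS = [V_MIN + (i + 0.5) * BIN_WIDTH for i in range(NUM_BINS)]


def value_to_bin(value: float) -> int:
    """Map a normalized return to a bin index, clipping values outside [V_MIN, V_MAX]."""
    clipped = min(max(value, V_MIN), V_MAX)
    idx = int((clipped - V_MIN) / BIN_WIDTH)
    return min(idx, NUM_BINS - 1)  # value == V_MAX would otherwise overflow into bin 50


def expected_value(logits: list[float]) -> float:
    """Softmax over the bin logits, then the probability-weighted mean of bin centers."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum(e / total * c for e, c in zip(exps, BIN_CENTERS))
```

For example, `value_to_bin(-1.0)` is 0 and `value_to_bin(0.0)` is 49, and uniform logits give an expected value of approximately -0.5 (the midpoint of the range).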

Model Details

Parameter	Value
Backbone	PaliGemma (`gemma_2b`)
VLM Layers	10
Precision	`bfloat16`
Image Size	224x224
Value Bins	50
Value Range	[-1.0, 0.0]
Value Head Depth	1
Hidden Dim	768
Pretrained From	`lerobot/pi05_base`

Training Metrics (Epoch 1)

Metric	Train	Val
Loss	1.8568	1.5731
Bin Accuracy	0.5236	0.5431
Value MAE	0.08194	0.04085
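The exact metric definitions are not given in this card. A plausible reading, assumed in the sketch below, is that bin accuracy is the fraction of frames whose highest-scoring bin matches the target bin, and value MAE is the mean absolute difference between the predicted expected value and the target normalized return. Both helpers are hypothetical.

```python
def bin_accuracy(pred_logits: list[list[float]], target_bins: list[int]) -> float:
    """Fraction of frames whose argmax bin equals the target bin."""
    hits = 0
    for logits, target in zip(pred_logits, target_bins):
        pred = max(range(len(logits)), key=logits.__getitem__)  # argmax over bins
        hits += int(pred == target)
    return hits / len(target_bins)


def value_mae(pred_values: list[float], target_values: list[float]) -> float:
    """Mean absolute error between predicted and target normalized returns."""
    return sum(abs(p - t) for p, t in zip(pred_values, target_values)) / len(target_values)
```

Under this reading, the value range has width 1.0 and each bin has width 0.02, so the validation MAE of 0.04085 corresponds to roughly 4% of the range, or about two bin widths.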

Usage

Loading the model

import json
from dataclasses import fields as dc_fields

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from lerobot.rl.algorithms.recap_value_network import (
    RECAPValueNetwork,
    RECAPValueNetworkConfig,
)

REPO_ID = "jackvial/recap-value-network-so101-pickplace"

# Download and parse the saved config
config_path = hf_hub_download(REPO_ID, "config.json")
with open(config_path) as f:
    config_dict = json.load(f)

# Rebuild the config, keeping only keys that are RECAPValueNetworkConfig fields
# (config.json may also contain training-script options such as epochs or batch_size)
valid_keys = {fld.name for fld in dc_fields(RECAPValueNetworkConfig)}
config = RECAPValueNetworkConfig(
    **{k: v for k, v in config_dict.items() if k in valid_keys}
)
# Don't re-download the pretrained VLM weights; the trained weights are loaded below
config.pretrained_path = None

# Build the model and load the trained weights
model = RECAPValueNetwork(config)
weights_path = hf_hub_download(REPO_ID, "model.safetensors")
state_dict = load_file(weights_path)
model.load_state_dict(state_dict)
model.eval()

Using it as the value network for RECAP pi-star training

uv run python -m lerobot.rl.algorithms.recap_train_pi_star \
  --repo_id="jackvial/so101_pickplace_recap_merged_v2" \
  --output_dir="outputs/recap_pistar" \
  --value_network_checkpoint="jackvial/recap-value-network-so101-pickplace"
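This card does not document how `recap_train_pi_star` consumes the value network. In advantage-conditioned training of the kind RECAP describes, value estimates are typically turned into per-frame advantages and then into a binary "improvement" label that the policy is conditioned on. The sketch below assumes a Monte Carlo advantage (observed normalized return minus predicted value) and a hypothetical rule that labels the top 30% of frames as positive; neither is confirmed to be what the script does.

```python
def advantages(returns_to_go: list[float], values: list[float]) -> list[float]:
    """Monte Carlo advantage estimate: observed return minus the predicted value."""
    return [g - v for g, v in zip(returns_to_go, values)]


def improvement_labels(advs: list[float], positive_fraction: float = 0.3) -> list[bool]:
    """Hypothetical rule: mark frames whose advantage is in the top `positive_fraction`.

    Ties at the threshold are all labeled positive, so slightly more than the
    requested fraction may be marked.
    """
    ranked = sorted(advs, reverse=True)
    k = max(1, int(len(ranked) * positive_fraction))
    threshold = ranked[k - 1]
    return [a >= threshold for a in advs]
```

A percentile-style threshold keeps the share of positive labels stable regardless of the absolute scale of the advantages.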

Training Configuration

Trained with lerobot.rl.algorithms.recap_train_value_network:

{
  "repo_id": "jackvial/so101_pickplace_recap_merged_v2",
  "output_dir": "/home/jack/code/lerobot/outputs/recap_value_5",
  "labels_csv_path": null,
  "root": "/home/jack/.cache/huggingface/lerobot",
  "revision": null,
  "episodes": null,
  "epochs": 2,
  "batch_size": 4,
  "gradient_accumulation_steps": 4,
  "num_workers": 4,
  "learning_rate": 0.0001,
  "weight_decay": 0.0001,
  "warmup_ratio": 0.05,
  "max_grad_norm": 1.0,
  "val_split_ratio": 0.1,
  "seed": 42,
  "device": "auto",
  "max_train_steps_per_epoch": null,
  "max_val_steps_per_epoch": null,
  "log_every_n_steps": 100,
  "validate_every_n_train_steps": 50,
  "plot_every_n_train_steps": 200,
  "max_val_steps_per_step_validation": 20,
  "val_plot_num_episodes": 4,
  "val_plot_num_frames": 8,
  "val_plot_every_n_epochs": 1,
  "c_fail": 500.0,
  "num_value_bins": 50,
  "tokenizer_max_length": 200,
  "image_size": 224,
  "paligemma_variant": "gemma_2b",
  "tokenizer_name": "google/paligemma-3b-pt-224",
  "model_precision": "bfloat16",
  "freeze_vision_encoder": true,
  "freeze_backbone": true,
  "num_unfrozen_backbone_layers": 3,
  "num_vlm_layers": 10,
  "value_head_depth": 1,
  "dropout": 0.1,
  "pretrained_path": "lerobot/pi05_base",
  "wandb_project": null,
  "wandb_entity": null,
  "wandb_run_name": null
}
