RECAP Value Network
A distributional value network trained with the RECAP (RL with Experience and Corrections via Advantage-conditioned Policies) framework on the jackvial/so101_pickplace_recap_merged_v2 dataset.
For each frame of a robot manipulation trajectory, the model predicts a distribution over the normalized expected return, discretized into 50 bins over [-1.0, 0.0].
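To make the discretization concrete, here is a minimal sketch of how a scalar return in [-1.0, 0.0] could map to one of 50 bins, and how a scalar value could be read back from per-bin logits. It assumes equal-width bins represented by their centers and a softmax-weighted mean for decoding; the function names are hypothetical and the actual `RECAPValueNetwork` may use a different bin layout or decoding rule.

```python
import math

NUM_BINS = 50
V_MIN, V_MAX = -1.0, 0.0
BIN_WIDTH = (V_MAX - V_MIN) / NUM_BINS  # 0.02 under the equal-width assumption


def bin_centers():
    """Center of each equal-width bin (assumed layout, not confirmed by the model card)."""
    return [V_MIN + (i + 0.5) * BIN_WIDTH for i in range(NUM_BINS)]


def value_to_bin(value):
    """Map a normalized return to a bin index, clipping to the value range."""
    clipped = min(max(value, V_MIN), V_MAX)
    index = int((clipped - V_MIN) / BIN_WIDTH)
    return min(index, NUM_BINS - 1)  # V_MAX itself falls into the last bin


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_value(logits):
    """Decode a scalar value as the probability-weighted mean of the bin centers."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers()))
```

With uniform logits the decoded value sits at the middle of the range, and a sharply peaked distribution decodes to the center of its most likely bin.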
Model Details
| Parameter | Value |
|---|---|
| Backbone | PaliGemma (gemma_2b) |
| VLM Layers | 10 |
| Precision | bfloat16 |
| Image Size | 224x224 |
| Value Bins | 50 |
| Value Range | [-1.0, 0.0] |
| Value Head Depth | 1 |
| Hidden Dim | 768 |
| Pretrained From | lerobot/pi05_base |
Training Metrics (Epoch 1)
| Metric | Train | Val |
|---|---|---|
| Loss | 1.8568 | 1.5731 |
| Bin Accuracy | 0.5236 | 0.5431 |
| Value MAE | 0.08194 | 0.04085 |
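As a rough aid for reading the table, the sketch below shows one plausible way the two non-loss metrics could be computed, and expresses the reported MAE in units of bins. It assumes equal-width bins (width 0.02) and that Value MAE is measured in normalized value units; the helper names are hypothetical and the training script may define these metrics differently.

```python
BIN_WIDTH = (0.0 - (-1.0)) / 50  # 0.02, assuming 50 equal-width bins


def bin_accuracy(pred_bins, target_bins):
    """Fraction of frames whose predicted bin matches the target bin exactly."""
    matches = sum(1 for p, t in zip(pred_bins, target_bins) if p == t)
    return matches / len(target_bins)


def mean_absolute_error(pred_values, target_values):
    """Mean absolute difference between decoded and target values."""
    errors = [abs(p - t) for p, t in zip(pred_values, target_values)]
    return sum(errors) / len(errors)


# Reported MAE values from the table, converted to bin widths.
train_mae_in_bins = 0.08194 / BIN_WIDTH
val_mae_in_bins = 0.04085 / BIN_WIDTH
```

Under these assumptions, the validation MAE corresponds to roughly two bin widths and the training MAE to roughly four.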
Usage
Loading the model
import json
import torch
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
from lerobot.rl.algorithms.recap_value_network import (
    RECAPValueNetwork,
    RECAPValueNetworkConfig,
)
# Download and load config
config_path = hf_hub_download("jackvial/recap-value-network-so101-pickplace", "config.json")
with open(config_path) as f:
    config_dict = json.load(f)
# Reconstruct the config (filter to only RECAPValueNetworkConfig fields)
from dataclasses import fields as dc_fields
valid_keys = {f.name for f in dc_fields(RECAPValueNetworkConfig)}
config = RECAPValueNetworkConfig(
    **{k: v for k, v in config_dict.items() if k in valid_keys}
)
# Don't re-download pretrained VLM weights during loading
config.pretrained_path = None
# Build model and load weights
model = RECAPValueNetwork(config)
weights_path = hf_hub_download("jackvial/recap-value-network-so101-pickplace", "model.safetensors")
state_dict = load_file(weights_path)
model.load_state_dict(state_dict)
model.eval()
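The config-filtering step in the loading code can be tried in isolation. The sketch below uses a hypothetical `ToyConfig` dataclass as a stand-in for `RECAPValueNetworkConfig` (whose full field list is not shown here) to illustrate why keys such as training-only options are dropped before the constructor is called.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for RECAPValueNetworkConfig, for illustration only."""
    num_value_bins: int = 50
    image_size: int = 224


def filter_to_dataclass_fields(cls, raw):
    """Keep only the keys of `raw` that are declared fields of dataclass `cls`."""
    valid_keys = {f.name for f in fields(cls)}
    return {k: v for k, v in raw.items() if k in valid_keys}


# "epochs" is a training option, not a ToyConfig field, so it is filtered out.
raw = {"num_value_bins": 50, "image_size": 224, "epochs": 2}
toy_config = ToyConfig(**filter_to_dataclass_fields(ToyConfig, raw))
```

Without the filter, passing an unknown key such as `epochs` to the dataclass constructor would raise a `TypeError`.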
Using as a value network for RECAP pi-star training
uv run python -m lerobot.rl.algorithms.recap_train_pi_star \
--repo_id="jackvial/so101_pickplace_recap_merged_v2" \
--output_dir="outputs/recap_pistar" \
--value_network_checkpoint="jackvial/recap-value-network-so101-pickplace"
Training Configuration
Trained with lerobot.rl.algorithms.recap_train_value_network:
{
"repo_id": "jackvial/so101_pickplace_recap_merged_v2",
"output_dir": "/home/jack/code/lerobot/outputs/recap_value_5",
"labels_csv_path": null,
"root": "/home/jack/.cache/huggingface/lerobot",
"revision": null,
"episodes": null,
"epochs": 2,
"batch_size": 4,
"gradient_accumulation_steps": 4,
"num_workers": 4,
"learning_rate": 0.0001,
"weight_decay": 0.0001,
"warmup_ratio": 0.05,
"max_grad_norm": 1.0,
"val_split_ratio": 0.1,
"seed": 42,
"device": "auto",
"max_train_steps_per_epoch": null,
"max_val_steps_per_epoch": null,
"log_every_n_steps": 100,
"validate_every_n_train_steps": 50,
"plot_every_n_train_steps": 200,
"max_val_steps_per_step_validation": 20,
"val_plot_num_episodes": 4,
"val_plot_num_frames": 8,
"val_plot_every_n_epochs": 1,
"c_fail": 500.0,
"num_value_bins": 50,
"tokenizer_max_length": 200,
"image_size": 224,
"paligemma_variant": "gemma_2b",
"tokenizer_name": "google/paligemma-3b-pt-224",
"model_precision": "bfloat16",
"freeze_vision_encoder": true,
"freeze_backbone": true,
"num_unfrozen_backbone_layers": 3,
"num_vlm_layers": 10,
"value_head_depth": 1,
"dropout": 0.1,
"pretrained_path": "lerobot/pi05_base",
"wandb_project": null,
"wandb_entity": null,
"wandb_run_name": null
}
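A few of these settings interact. The sketch below works out the effective batch size from `batch_size` and `gradient_accumulation_steps`, and estimates the number of warmup steps from `warmup_ratio`. The sample count is a hypothetical placeholder, and the assumption that warmup is counted in optimizer steps over all epochs may not match how the training script schedules it.

```python
import math

# Values copied from the training configuration above.
config = {
    "batch_size": 4,
    "gradient_accumulation_steps": 4,
    "warmup_ratio": 0.05,
    "epochs": 2,
}


def effective_batch_size(cfg):
    """Samples contributing to each optimizer update."""
    return cfg["batch_size"] * cfg["gradient_accumulation_steps"]


def total_optimizer_steps(num_train_samples, cfg):
    """Optimizer updates over all epochs, assuming a partial final batch is kept."""
    per_epoch = math.ceil(num_train_samples / effective_batch_size(cfg))
    return per_epoch * cfg["epochs"]


def warmup_steps(total_steps, cfg):
    """Warmup length, assuming the ratio applies to total optimizer steps."""
    return round(total_steps * cfg["warmup_ratio"])


# Hypothetical training-set size, for illustration only.
example_samples = 16_000
example_total = total_optimizer_steps(example_samples, config)
example_warmup = warmup_steps(example_total, config)
```

With the configuration above, each optimizer update sees 16 samples regardless of the dataset size.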