Refined Policy Distillation: From VLA Generalists to RL Experts
arXiv: 2503.05833
This repo contains the Octo weights used in Refined Policy Distillation (RPD). RPD distills vision-language-action models (VLAs) into small expert policies using online reinforcement learning (RL).
Paper: Refined Policy Distillation: From VLA Generalists to RL Experts
Project Page: https://refined-policy-distillation.github.io
Code: https://github.com/Refined-Policy-Distillation/RPD
The dataset used to fine-tune this checkpoint can be found here.
Also check out the RPD OpenVLA weights.
Usage (adapted from the Octo repo):
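import jax
import numpy as np
from octo.model.octo_model import OctoModel
model = OctoModel.load_pretrained("hf://Juelg/octo-base-1.5-finetuned-maniskill")
# Placeholder observation with (batch, time) leading dims; replace with real camera frames (256x256 size is an assumption)
observation = {"image_primary": np.zeros((1, 1, 256, 256, 3), dtype=np.uint8), "timestep_pad_mask": np.array([[True]])}
task = model.create_tasks(texts=["pick the cube"])
action = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))  # shape: (batch, action_chunk, action_dim)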
For details on how Octo was used in RPD, check out the RPD code repository and the Agents library.
If you find RPD useful for your work, please consider citing it:
@inproceedings{juelg2025refinedpolicydistillationvla,
title={{Refined Policy Distillation}: {F}rom {VLA} Generalists to {RL} Experts},
author={Tobias Jülg and Wolfram Burgard and Florian Walter},
year={2025},
booktitle={Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
note={Accepted for publication.}
}