Octo-Base Fine-tuned on MuJoCo Franka Reach

This is the fine-tuned Octo checkpoint discussed in the blog post What Transfers When Nothing Matches. Published for reproducibility.

This is not a general-purpose model. It solves a single reaching task with a simulated Franka Panda arm in MuJoCo. It's published so readers can verify the results and experiment with it.

What this is

Octo-Base 1.5 (93M params) fine-tuned on 300 demonstration episodes of a Franka Panda reaching for a green target sphere in MuJoCo simulation.

Result: 90% success rate (9/10 evaluation episodes), ~60 steps on average to reach the target.

Three simultaneous domain gaps from pre-training:

  1. Visual domain: cartoonish MuJoCo renders instead of real camera images
  2. Action space: 7 joint angle deltas instead of end-effector deltas (action head replaced entirely, initialized from scratch)
  3. Data source: demonstrations from a trained SAC policy, not human teleoperation

How to use

Requires the Octo environment (Python 3.10, see repo setup).

from octo.model.octo_model import OctoModel
import jax
import numpy as np

# Load fine-tuned model
model = OctoModel.load_pretrained("hf://aryanmadhavverma/octo-franka-reach-finetuned")

# Load action normalization stats (included in this repo)
action_mean = np.load("action_mean.npy")  # downloaded with the checkpoint
action_std = np.load("action_std.npy")

# Inference (single step)
actions = model.sample_actions(
    observations={
        "image_primary": overhead_img[None, None, ...],   # (1, 1, 256, 256, 3) uint8
        "image_wrist": wrist_img[None, None, ...],        # (1, 1, 128, 128, 3) uint8
        "timestep_pad_mask": np.array([[True]]),
    },
    tasks=model.create_tasks(texts=["reach the green target"]),
    rng=jax.random.PRNGKey(0),
)

# Denormalize: model output → physical joint deltas (radians)
raw_action = np.array(actions[0, 0])        # first action from 4-action chunk
joint_delta = raw_action * action_std + action_mean
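The normalize/denormalize pair is easy to get backwards. A self-contained sanity check with synthetic stats (the values below are made up for illustration, not the shipped action_mean.npy / action_std.npy):

```python
import numpy as np

# Hypothetical per-dimension stats for a 7-DoF joint-delta action space.
action_mean = np.array([0.01, -0.02, 0.0, 0.03, 0.0, -0.01, 0.02])
action_std = np.array([0.05, 0.04, 0.06, 0.05, 0.03, 0.04, 0.05])

def normalize(joint_delta):
    """Physical joint deltas (radians) -> model space."""
    return (joint_delta - action_mean) / action_std

def denormalize(raw_action):
    """Model output -> physical joint deltas (radians)."""
    return raw_action * action_std + action_mean

# Round trip should recover the original deltas exactly (up to float error).
delta = np.array([0.02, -0.01, 0.05, 0.0, -0.03, 0.01, 0.04])
assert np.allclose(denormalize(normalize(delta)), delta)
```

If the model outputs exact zeros, denormalization should return the dataset mean, which is a quick way to catch a swapped mean/std.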

For the full evaluation loop, see vla/eval_finetuned_octo.py.
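A reach-task success check is typically just a distance threshold between the end-effector and the target. A minimal sketch (the 5 cm radius and the function name are assumptions for illustration; see the eval script for the actual criterion):

```python
import numpy as np

SUCCESS_RADIUS = 0.05  # meters; assumed threshold, not the script's actual value

def is_success(ee_pos, target_pos, radius=SUCCESS_RADIUS):
    """True if the end-effector is within `radius` of the target center."""
    return bool(np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos)) <= radius)

# Example: 3 cm away counts as a success, 10 cm does not.
target = np.array([0.4, 0.0, 0.3])
print(is_success(target + np.array([0.03, 0.0, 0.0]), target))  # True
print(is_success(target + np.array([0.10, 0.0, 0.0]), target))  # False
```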

Training details

Base model: octo-base-1.5 (93M params)
Training data: 300 episodes, ~30K frames, dual camera (overhead 256x256 + wrist 128x128)
Data source: non-deterministic SAC policy (100% success on state-based reaching)
Action space: 7 joint angle deltas (replaced pre-trained end-effector head)
Steps: 25,000
Batch size: 16
Learning rate: 3e-4 (linear warmup over 100 steps)
Optimizer: AdamW
Hardware: RTX 4080S (Vast.ai), ~47 minutes
Task instruction: "reach the green target"
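The action_mean.npy / action_std.npy files are just per-dimension statistics over the demonstration actions. Computing them looks roughly like this (the random array stands in for the real ~30K demonstration frames; the epsilon guard is an assumption):

```python
import numpy as np

# Stand-in for the real demonstration actions: (num_frames, 7) joint deltas.
rng = np.random.default_rng(0)
actions = rng.normal(scale=0.05, size=(30_000, 7))

# Per-dimension mean and std across all frames.
action_mean = actions.mean(axis=0)
action_std = actions.std(axis=0) + 1e-8  # epsilon avoids divide-by-zero (assumption)

np.save("action_mean.npy", action_mean)
np.save("action_std.npy", action_std)

# Training targets are then (action - mean) / std, per dimension.
normalized = (actions - action_mean) / action_std
```

After normalization each action dimension has roughly zero mean and unit variance, which is what the fine-tuning loss is computed against.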

Comparison

Method                             Params  Input                     Success
SAC (model-free RL)                78K     state vector (20 floats)  100%
Octo zero-shot                     93M     image                     0%
Octo fine-tuned (this checkpoint)  93M     image (dual camera)       90%
