Octo-Base Fine-tuned on MuJoCo Franka Reach

This is the fine-tuned Octo checkpoint discussed in the blog post What Transfers When Nothing Matches. Published for reproducibility.

This is not a general-purpose model. It solves a single reaching task with a simulated Franka Panda arm in MuJoCo. It's published so readers can verify the results and experiment with it.

What this is

Octo-Base 1.5 (93M params) fine-tuned on 300 demonstration episodes of a Franka Panda reaching for a green target sphere in MuJoCo simulation.

Result: 90% success rate (9/10 evaluation episodes), ~60 steps on average to reach the target.

Three simultaneous domain gaps from pre-training:

  1. Visual domain: cartoonish MuJoCo renders instead of real camera images
  2. Action space: 7 joint angle deltas instead of end-effector deltas (action head replaced entirely, initialized from scratch)
  3. Data source: demonstrations from a trained SAC policy, not human teleoperation

How to use

Requires the Octo environment (Python 3.10, see repo setup).

from octo.model.octo_model import OctoModel
import jax
import numpy as np

# Load fine-tuned model
model = OctoModel.load_pretrained("hf://aryanmadhavverma/octo-franka-reach-finetuned")

# Load action normalization stats (included in this repo)
action_mean = np.load("action_mean.npy")  # downloaded with the checkpoint
action_std = np.load("action_std.npy")

# Inference (single step)
actions = model.sample_actions(
    observations={
        "image_primary": overhead_img[None, None, ...],   # (1, 1, 256, 256, 3) uint8
        "image_wrist": wrist_img[None, None, ...],        # (1, 1, 128, 128, 3) uint8
        "timestep_pad_mask": np.array([[True]]),
    },
    tasks=model.create_tasks(texts=["reach the green target"]),
    rng=jax.random.PRNGKey(0),
)

# Denormalize: model output → physical joint deltas (radians)
raw_action = np.array(actions[0, 0])        # first action from 4-action chunk
joint_delta = raw_action * action_std + action_mean
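The normalize/denormalize pair is easy to get backwards. A self-contained sanity check with synthetic stats (the values below are made up for illustration, not the shipped action_mean.npy / action_std.npy):

```python
import numpy as np

# Hypothetical per-dimension stats for a 7-DoF joint-delta action space.
action_mean = np.array([0.01, -0.02, 0.0, 0.03, 0.0, -0.01, 0.02])
action_std = np.array([0.05, 0.04, 0.06, 0.05, 0.03, 0.04, 0.05])

def normalize(joint_delta):
    """Physical joint deltas (radians) -> model space."""
    return (joint_delta - action_mean) / action_std

def denormalize(raw_action):
    """Model output -> physical joint deltas (radians)."""
    return raw_action * action_std + action_mean

# Round trip should recover the original deltas exactly (up to float error).
delta = np.array([0.02, -0.01, 0.05, 0.0, -0.03, 0.01, 0.04])
assert np.allclose(denormalize(normalize(delta)), delta)
```

If the model outputs exact zeros, denormalization should return the dataset mean, which is a quick way to catch a swapped mean/std.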

For the full evaluation loop, see vla/eval_finetuned_octo.py.
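A reach-task success check is typically just a distance threshold between the end-effector and the target. A minimal sketch (the 5 cm radius and the function name are assumptions for illustration; see the eval script for the actual criterion):

```python
import numpy as np

SUCCESS_RADIUS = 0.05  # meters; assumed threshold, not the script's actual value

def is_success(ee_pos, target_pos, radius=SUCCESS_RADIUS):
    """True if the end-effector is within `radius` of the target center."""
    return bool(np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos)) <= radius)

# Example: 3 cm away counts as a success, 10 cm does not.
target = np.array([0.4, 0.0, 0.3])
print(is_success(target + np.array([0.03, 0.0, 0.0]), target))  # True
print(is_success(target + np.array([0.10, 0.0, 0.0]), target))  # False
```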

Training details

Base model: octo-base-1.5 (93M params)
Training data: 300 episodes, ~30K frames, dual camera (overhead 256x256 + wrist 128x128)
Data source: non-deterministic SAC policy (100% success on state-based reaching)
Action space: 7 joint angle deltas (replaced pre-trained end-effector head)
Steps: 25,000
Batch size: 16
Learning rate: 3e-4 (linear warmup over 100 steps)
Optimizer: AdamW
Hardware: RTX 4080S (Vast.ai), ~47 minutes
Task instruction: "reach the green target"
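The action_mean.npy / action_std.npy files are just per-dimension statistics over the demonstration actions. Computing them looks roughly like this (the random array stands in for the real ~30K demonstration frames; the epsilon guard is an assumption):

```python
import numpy as np

# Stand-in for the real demonstration actions: (num_frames, 7) joint deltas.
rng = np.random.default_rng(0)
actions = rng.normal(scale=0.05, size=(30_000, 7))

# Per-dimension mean and std across all frames.
action_mean = actions.mean(axis=0)
action_std = actions.std(axis=0) + 1e-8  # epsilon avoids divide-by-zero (assumption)

np.save("action_mean.npy", action_mean)
np.save("action_std.npy", action_std)

# Training targets are then (action - mean) / std, per dimension.
normalized = (actions - action_mean) / action_std
```

After normalization each action dimension has roughly zero mean and unit variance, which is what the fine-tuning loss is computed against.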

Comparison

Method                             Params  Input                     Success
SAC (model-free RL)                78K     state vector (20 floats)  100%
Octo zero-shot                     93M     image                     0%
Octo fine-tuned (this checkpoint)  93M     image (dual camera)       90%
