# Octo-Base Fine-tuned on MuJoCo Franka Reach
This is the fine-tuned Octo checkpoint discussed in the blog post *What Transfers When Nothing Matches*, published for reproducibility.

This is not a general-purpose model. It solves a single reaching task with a simulated Franka Panda arm in MuJoCo, and is shared so readers can verify the blog post's results and experiment with the checkpoint themselves.
## What this is
Octo-Base 1.5 (93M params) fine-tuned on 300 demonstration episodes of a Franka Panda reaching for a green target sphere in MuJoCo simulation.
Result: 90% success (9/10 evaluation episodes), ~60 steps on average to reach the target.
Three simultaneous domain gaps from pre-training:
- Visual domain: cartoonish MuJoCo renders instead of real camera images
- Action space: 7 joint angle deltas instead of end-effector deltas (action head replaced entirely, initialized from scratch)
- Data source: demonstrations from a trained SAC policy, not human teleoperation
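Because the action head was replaced and trained from scratch, the model emits normalized 7-dimensional joint-angle deltas rather than end-effector deltas. A minimal sketch of what that means in practice (the statistics below are placeholders; the real values ship with the checkpoint as `action_mean.npy` / `action_std.npy`):

```python
import numpy as np

# Placeholder normalization stats; the real arrays are downloaded with the checkpoint.
action_mean = np.zeros(7)
action_std = np.full(7, 0.05)

def to_joint_delta(raw_action):
    """Map a normalized model output to a physical joint-angle delta in radians."""
    return raw_action * action_std + action_mean

delta = to_joint_delta(np.ones(7))  # one radian-scale delta per joint
```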
## How to use
Requires the Octo environment (Python 3.10, see repo setup).
```python
import jax
import numpy as np

from octo.model.octo_model import OctoModel

# Load the fine-tuned model from the Hub
model = OctoModel.load_pretrained("hf://aryanmadhavverma/octo-franka-reach-finetuned")

# Load action normalization stats (downloaded with the checkpoint)
action_mean = np.load("action_mean.npy")
action_std = np.load("action_std.npy")

# Inference (single step)
actions = model.sample_actions(
    observations={
        "image_primary": overhead_img[None, None, ...],  # (1, 1, 256, 256, 3) uint8
        "image_wrist": wrist_img[None, None, ...],       # (1, 1, 128, 128, 3) uint8
        "timestep_pad_mask": np.array([[True]]),
    },
    tasks=model.create_tasks(texts=["reach the green target"]),
    rng=jax.random.PRNGKey(0),
)

# Denormalize: model output → physical joint deltas (radians)
raw_action = np.array(actions[0, 0])  # first action of the 4-action chunk
joint_delta = raw_action * action_std + action_mean
```

For the full evaluation loop, see `vla/eval_finetuned_octo.py`.
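The structure of such an evaluation loop can be sketched as follows. This is an illustrative stand-in, not the code from `vla/eval_finetuned_octo.py`: `DummyEnv`, `run_episode`, and the success protocol are all hypothetical names, and the placeholder stats substitute for the checkpoint's real normalization arrays.

```python
import numpy as np

class DummyEnv:
    """Stand-in for the MuJoCo reach environment (illustrative only)."""
    def __init__(self, steps_to_success=3):
        self._steps_to_success = steps_to_success
        self._t = 0

    def reset(self):
        self._t = 0
        return np.zeros(7)  # placeholder observation

    def step(self, action):
        self._t += 1
        return np.zeros(7), self._t >= self._steps_to_success

def run_episode(env, policy, action_mean, action_std, max_steps=200):
    """Roll out one episode: query policy, denormalize, step until success or timeout."""
    obs = env.reset()
    for step in range(max_steps):
        raw = policy(obs)                        # normalized 7-dim joint delta
        action = raw * action_std + action_mean  # denormalize to radians
        obs, success = env.step(action)
        if success:
            return True, step + 1
    return False, max_steps

ok, steps = run_episode(DummyEnv(), policy=lambda obs: np.zeros(7),
                        action_mean=np.zeros(7), action_std=np.full(7, 0.05))
```

In the real loop the policy call is `model.sample_actions(...)` as shown above, executed over fresh camera renders each step.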
## Training details
| Setting | Value |
|---|---|
| Base model | `octo-base-1.5` (93M params) |
| Training data | 300 episodes, ~30K frames, dual camera (overhead 256×256 + wrist 128×128) |
| Data source | Non-deterministic SAC policy (100% success on state-based reaching) |
| Action space | 7 joint angle deltas (replaced pre-trained end-effector head) |
| Steps | 25,000 |
| Batch size | 16 |
| Learning rate | 3e-4 (linear warmup over 100 steps) |
| Optimizer | AdamW |
| Hardware | RTX 4080S (Vast.ai), ~47 minutes |
| Task instruction | "reach the green target" |
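The learning-rate schedule from the table can be written out explicitly. The post-warmup behavior (held constant) is an assumption here; the card only specifies the peak rate and the warmup length.

```python
def lr_at(step, peak=3e-4, warmup=100):
    """Linear warmup to the peak LR over `warmup` steps, constant afterwards
    (post-warmup behavior is assumed, not stated in the training details)."""
    return peak * min(step / warmup, 1.0)

schedule = [lr_at(s) for s in (0, 50, 100, 25_000)]
```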
## Comparison
| Method | Params | Input | Success |
|---|---|---|---|
| SAC (model-free RL) | 78K | state vector (20 floats) | 100% |
| Octo zero-shot | 93M | image | 0% |
| Octo fine-tuned (this checkpoint) | 93M | image (dual camera) | 90% |