FIPER RND-OE Detector — build_block_tower (RL Token, step 9999)

RND-OE (Random Network Distillation — Observation Embedding) novelty detector trained on RL-token embeddings from an OpenPI Pi0-RL policy fine-tuned on the build_block_tower task.

Model Details

Method: RND-OE
Embedding source: RL token from OpenPI Pi0-RL (pi0.5)
Embedding dim: 2048
RND output dim: 512
OpenPI config: pi05_rl_token_build_block_tower
OpenPI checkpoint: step 9999 (pi05_rl_token_build_block_tower/rlt_v1/9999)
Dataset: villekuosmanen/build_block_tower (68,997 samples, 100 episodes)
Action space: 7D joint-space
Action horizon: 50
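
For readers new to random network distillation, the idea behind the detector can be sketched as follows: a frozen, randomly initialised target network maps each embedding to an output vector, a predictor is trained to match it on in-distribution data, and the remaining prediction error is used as the novelty score. The single linear layers, toy dimensions, plain SGD, and all function names below are illustrative assumptions. The detector in this repository maps 2048-dimensional embeddings to 512-dimensional outputs, and its actual architecture is not described in this card.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix with an illustrative init."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]

def forward(weights, x):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def novelty_score(target, predictor, x):
    """Mean squared error between predictor and frozen target outputs."""
    t, p = forward(target, x), forward(predictor, x)
    return sum((pj - tj) ** 2 for pj, tj in zip(p, t)) / len(t)

def sgd_step(target, predictor, x, lr):
    """One gradient step on the predictor only; the target is never updated."""
    t, p = forward(target, x), forward(predictor, x)
    for j, row in enumerate(predictor):
        grad_out = 2.0 * (p[j] - t[j]) / len(t)
        for i, xi in enumerate(x):
            row[i] -= lr * grad_out * xi

rng = random.Random(0)
IN_DIM, OUT_DIM = 8, 4  # toy sizes (assumption); the card lists 2048 -> 512
target = make_linear(IN_DIM, OUT_DIM, rng)
predictor = make_linear(IN_DIM, OUT_DIM, rng)

# "Familiar" inputs are scalar multiples of one unit-norm direction.
raw = [rng.gauss(0.0, 1.0) for _ in range(IN_DIM)]
norm = sum(v * v for v in raw) ** 0.5
base = [v / norm for v in raw]

def sample_familiar():
    a = rng.uniform(0.5, 1.5)
    return [a * b for b in base]

# A "novel" input: a random vector with its component along `base` removed.
r = [rng.gauss(0.0, 1.0) for _ in range(IN_DIM)]
dot = sum(ri * bi for ri, bi in zip(r, base))
novel = [ri - dot * bi for ri, bi in zip(r, base)]

familiar_before = novelty_score(target, predictor, base)
for _ in range(200):
    sgd_step(target, predictor, sample_familiar(), lr=0.5)
familiar_after = novelty_score(target, predictor, base)
novel_after = novelty_score(target, predictor, novel)

print(f"familiar: {familiar_before:.3e} -> {familiar_after:.3e}")
print(f"novel after training: {novel_after:.3e}")
```

In this toy setup the predictor only ever sees inputs along one direction, so its error collapses there while staying large on the orthogonal input. That gap is what a threshold on the score is meant to exploit.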

Training Config

Parameter	Value
Batch size	32
Epochs	10
Optimizer	AdamW
Learning rate	1e-4 → 1e-6 (cosine)
Weight decay	1e-5
Train/validation split	90/10 (random, seed 42)
Calibration quantile	0.95
Threshold window	3
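
The learning-rate entry above describes a cosine decay from 1e-4 to 1e-6. A minimal sketch of that shape, using the standard cosine annealing formula, is shown here. Whether the actual run steps the schedule per batch or per epoch, and whether it uses warmup, is not stated in this card; the function name `cosine_lr` and the `total_steps` value are assumptions for illustration.

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max (step 0) to lr_min (step == total_steps)."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    # Hypothetical per-epoch stepping over the 10 epochs listed above.
    for epoch in range(11):
        print(epoch, f"{cosine_lr(epoch, total_steps=10):.3e}")
```

The rate starts at the maximum, passes through the midpoint of the two bounds halfway through training, and reaches the minimum at the final step.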

Training Results

Epoch	Train Loss	Val Loss
1	0.000855	0.000398
2	0.000328	0.000278
3	0.000241	0.000215
4	0.000194	0.000187
5	0.000166	0.000160
6	0.000148	0.000145
7	0.000136	0.000133
8	0.000127	0.000127
9	0.000122	0.000123
10	0.000119	0.000121

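Both losses decreased monotonically across all 10 epochs. Validation loss stayed below training loss through epoch 7, matched it at epoch 8, and ended marginally above it (0.000121 vs 0.000119), so there is no sign of overfitting.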

Fitted threshold: 0.000577 (q=0.95, window=3, calibrated over 100 episodes / 62,097 samples)
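
One plausible reading of the `q=0.95, window=3` calibration can be sketched as follows: per-step novelty scores are smoothed within each episode by a trailing mean over 3 steps, pooled across episodes, and the empirical 0.95 quantile is taken as the threshold. This interpretation is an assumption, as are the function names; the FIPER code may aggregate windows or compute the quantile differently.

```python
import math

def windowed_scores(scores, window=3):
    """Trailing mean over up to `window` most recent scores within one episode."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def fit_threshold(episodes, quantile=0.95, window=3):
    """Empirical quantile of windowed scores pooled over all calibration episodes."""
    pooled = sorted(s for ep in episodes for s in windowed_scores(ep, window))
    if not pooled:
        raise ValueError("no calibration scores provided")
    idx = math.ceil(quantile * len(pooled)) - 1
    idx = min(len(pooled) - 1, max(0, idx))
    return pooled[idx]

def flag_steps(scores, threshold, window=3):
    """True at each step whose windowed score exceeds the threshold."""
    return [s > threshold for s in windowed_scores(scores, window)]
```

Under this reading, a rollout would be checked at runtime with `flag_steps(scores, threshold=0.000577)`, where `scores` are the per-step RND errors of that rollout. Smoothing over a short window makes a single noisy step less likely to trigger a flag.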

W&B: training curves

Files

File	Description	SHA256
`best.pt`	Best model checkpoint (epoch 10)	`0f1073103cbefc7ea20ef9f2d58b27b224b7c7ef7555f283b8d4745de114386b`
`latest.pt`	Latest model checkpoint (epoch 10)	`1b80f1c2eac203102b14bb05b31e039713d723148ccf6cdc59a8b8f7d9fcf6e5`
`rnd_oe_detector.pt`	Detector + threshold + calibration	`63e851d9bb680887cdbdfb5ebe1dd541ace9afd6af49a501f20c18ac56441391`
`config.json`	Resolved training config	`cbcd7eaf0c611acb898810025e590b56f9d52bddb669e0e4d69b835b9565af8d`

Verify hashes: sha256sum <file>
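
Where `sha256sum` is not available, the same check can be scripted with Python's standard library. The sketch below copies the expected digests from the table above; the helper names `sha256_of` and `verify` are illustrative and not part of the FIPER code.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the Files table.
EXPECTED = {
    "best.pt": "0f1073103cbefc7ea20ef9f2d58b27b224b7c7ef7555f283b8d4745de114386b",
    "latest.pt": "1b80f1c2eac203102b14bb05b31e039713d723148ccf6cdc59a8b8f7d9fcf6e5",
    "rnd_oe_detector.pt": "63e851d9bb680887cdbdfb5ebe1dd541ace9afd6af49a501f20c18ac56441391",
    "config.json": "cbcd7eaf0c611acb898810025e590b56f9d52bddb669e0e4d69b835b9565af8d",
}

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(directory, expected=EXPECTED):
    """Map each filename to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (sha256_of(path) == want) if path.is_file() else None
    return results

if __name__ == "__main__":
    for name, ok in verify(".").items():
        status = {True: "OK", False: "MISMATCH", None: "missing"}[ok]
        print(f"{name}: {status}")
```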

Usage

Evaluate on held-out episodes:

python -m python.fiper.evaluate_rnd_oe_batch \
    --detector-path rnd_oe_detector.pt \
    --dataset-repo-id villekuosmanen/build_block_tower \
    --openpi-checkpoint-dir <path_to_openpi_checkpoint>/9999 \
    --openpi-config-name pi05_rl_token_build_block_tower \
    --embedding-variant rl_token \
    --episodes-per-dataset 10 \
    --output-dir eval_output/

Running this command requires the OpenPI Pi0-RL checkpoint and the alpha-robotics repository, which contains the FIPER code.
