RT-1 Success Predictor (Qwen3-VL-8B LoRA)

A LoRA adapter for Qwen3-VL-8B-Instruct, fine-tuned to predict whether a robot policy successfully completed a task from 4 sampled video frames.

This replicates and improves on the VLM judge from the WorldGym paper.

How It Works

The model receives 4 frames (first, two middle, last) extracted from an RT-1 rollout video along with a text prompt describing the task. It outputs <answer>Success</answer> or <answer>Failure</answer>.
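The frame selection and output format described above can be sketched as follows. This is a minimal sketch, not code from the WorldEvals repository. It assumes the two middle frames are evenly spaced between the first and last frame, because the card does not give exact indices. The helper names `sample_frame_indices` and `parse_answer` are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical helpers; not taken from the WorldEvals repository.


def sample_frame_indices(num_frames: int, k: int = 4) -> List[int]:
    """Return k frame indices spread evenly from the first to the last frame.

    Assumption: the "two middle" frames are evenly spaced; the real pipeline may differ.
    """
    if num_frames < 1:
        raise ValueError("video must contain at least one frame")
    if k < 2:
        raise ValueError("k must be at least 2 to include the first and last frames")
    step = (num_frames - 1) / (k - 1)
    return [round(i * step) for i in range(k)]


# Matches the model's answer tag, e.g. "<answer>Success</answer>".
ANSWER_RE = re.compile(r"<answer>\s*(Success|Failure)\s*</answer>", re.IGNORECASE)


def parse_answer(text: str) -> Optional[bool]:
    """Map the first answer tag to True (Success) or False (Failure); None if absent."""
    match = ANSWER_RE.search(text)
    if match is None:
        return None
    return match.group(1).lower() == "success"


if __name__ == "__main__":
    print(sample_frame_indices(100))  # [0, 33, 66, 99]
    print(parse_answer("<answer>Failure</answer>"))  # False
```

Returning None for unparseable output lets an evaluation loop count malformed responses separately instead of silently scoring them as Failure.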

Eval Results (n=835)

Model                      Accuracy   TPR     TNR     FPR     FNR
GPT-4o (2024-11-20)        65.9%      85.3%   52.0%   48.0%   14.7%
Qwen3-VL-8B (base)         63.1%      84.5%   47.8%   52.2%   15.5%
Qwen3-VL-8B (finetuned)    87.1%      81.2%   91.3%   8.7%    18.9%

TPR / TNR: true positive / true negative rate. FPR / FNR: false positive / false negative rate.
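The table columns follow the standard confusion-matrix definitions. The sketch below shows how they can be computed from binary predictions, assuming Success is the positive class (the card does not state this explicitly). `Confusion`, `confusion_from_predictions`, and `rates` are hypothetical names, not part of the WorldEvals code.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # predicted Success, labeled Success
    fn: int  # predicted Failure, labeled Success
    tn: int  # predicted Failure, labeled Failure
    fp: int  # predicted Success, labeled Failure


def confusion_from_predictions(preds: Sequence[bool], labels: Sequence[bool]) -> Confusion:
    """Count outcomes, treating True (Success) as the positive class (an assumption)."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    pairs = list(zip(preds, labels))
    return Confusion(
        tp=sum(1 for p, y in pairs if p and y),
        fn=sum(1 for p, y in pairs if not p and y),
        tn=sum(1 for p, y in pairs if not p and not y),
        fp=sum(1 for p, y in pairs if p and not y),
    )


def rates(c: Confusion) -> Dict[str, float]:
    """Accuracy plus the four per-class rates reported in the table."""
    pos = c.tp + c.fn  # episodes labeled Success
    neg = c.tn + c.fp  # episodes labeled Failure
    return {
        "accuracy": (c.tp + c.tn) / (pos + neg),
        "tpr": c.tp / pos,
        "tnr": c.tn / neg,
        "fpr": c.fp / neg,  # equals 1 - tnr
        "fnr": c.fn / pos,  # equals 1 - tpr
    }
```

Because FPR = 1 - TNR and FNR = 1 - TPR, each pair of columns in the table sums to about 100% (up to rounding). Accuracy, in contrast, also depends on the share of Success labels in the test set: it equals p * TPR + (1 - p) * TNR, where p is that share.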

Training Details

  • Base model: Qwen3-VL-8B-Instruct (4-bit quantized)
  • Method: LoRA (r=16, alpha=16) via TRL SFTTrainer + Unsloth
  • Training set: 3,319 samples from RT-1
  • Test set: 835 samples
  • Best checkpoint: Epoch 11 (step 4565)
  • Training regime: bf16
  • Batch size: 2 x 4 gradient-accumulation steps = 8 effective
  • Learning rate: 5e-5 (cosine schedule, 10 warmup steps)
  • Optimizer: AdamW 8-bit
  • Hardware: 2x NVIDIA RTX 3090
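As a sanity check on the hyperparameters above, the arithmetic below relates dataset size, effective batch size, and the reported checkpoint step, and computes the LoRA scaling factor alpha / r that PEFT's LoRA layers apply by default (rsLoRA would use alpha / sqrt(r) instead). It is a minimal sketch assuming a single process that keeps the final partial micro-batch; `steps_per_epoch` is a hypothetical helper, not part of TRL or Unsloth.

```python
import math

# Values copied from the Training Details list.
NUM_TRAIN_SAMPLES = 3319
MICRO_BATCH = 2
GRAD_ACCUM = 4
BEST_STEP = 4565
LORA_R, LORA_ALPHA = 16, 16


def steps_per_epoch(num_samples: int, micro_batch: int, grad_accum: int) -> int:
    """Optimizer steps per epoch for one process that keeps the last partial micro-batch."""
    micro_batches = math.ceil(num_samples / micro_batch)  # 3319 / 2 -> 1660
    return micro_batches // grad_accum  # 1660 // 4 -> 415


effective_batch = MICRO_BATCH * GRAD_ACCUM  # 8 samples per optimizer step
per_epoch = steps_per_epoch(NUM_TRAIN_SAMPLES, MICRO_BATCH, GRAD_ACCUM)
best_epoch = BEST_STEP / per_epoch  # 4565 / 415 -> 11.0

# Standard LoRA scales the low-rank update by alpha / r; here that is 1.0.
lora_scaling = LORA_ALPHA / LORA_R

if __name__ == "__main__":
    print(effective_batch, per_epoch, best_epoch, lora_scaling)  # 8 415 11.0 1.0
```

The reported step 4565 is exactly 11 x 415, which is consistent with an effective batch of 8 in total rather than 8 per GPU under data parallelism.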

Source Code

github.com/loganbolton/WorldEvals

Framework Versions

  • PEFT 0.19.1
  • Transformers 4.57.1
  • Unsloth 2026.3.11
  • TRL (SFTTrainer)