Results Raw

This file records exact values and exact partial statuses without additional conclusions.

Proxy Sprint v7 Main Table

Source:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json
| Item | Raw values |
| --- | --- |
| base_model | mean success 0.28; foliage 0.39; bag 0.31; cloth 0.14 |
| random | mean success 0.43333333333333335; foliage 0.41; bag 0.37; cloth 0.52 |
| candidate0 | mean success 0.2; foliage 0.24; bag 0.22; cloth 0.14 |
| oracle | mean success 0.4066666666666667; foliage 0.5; bag 0.42; cloth 0.3 |
| scripted | mean success 1.0; foliage 1.0; bag 1.0; cloth 1.0 |
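
Each mean success value in the main table agrees with the unweighted average of its three per-task values (foliage, bag, cloth). The Python sketch below checks that arithmetic. The dictionary layout and the helper name `per_task_mean` are illustrative assumptions; they are not taken from the JSON summary, whose schema is not shown here.

```python
import math

# Values copied from the main table above.
# The (mean, per-task dict) layout is hypothetical, not the JSON schema.
rows = {
    "base_model": (0.28, {"foliage": 0.39, "bag": 0.31, "cloth": 0.14}),
    "random": (0.43333333333333335, {"foliage": 0.41, "bag": 0.37, "cloth": 0.52}),
    "candidate0": (0.2, {"foliage": 0.24, "bag": 0.22, "cloth": 0.14}),
    "oracle": (0.4066666666666667, {"foliage": 0.5, "bag": 0.42, "cloth": 0.3}),
    "scripted": (1.0, {"foliage": 1.0, "bag": 1.0, "cloth": 1.0}),
}


def per_task_mean(tasks):
    """Unweighted average over the per-task success values."""
    return sum(tasks.values()) / len(tasks)


# Collect rows whose reported mean differs from the recomputed average.
mismatches = [
    name
    for name, (reported, tasks) in rows.items()
    if not math.isclose(reported, per_task_mean(tasks), abs_tol=1e-9)
]
print(mismatches)  # an empty list means every row is consistent
```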

Proxy Sprint v7 Ablation Table

Source:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json
| Item | Raw values |
| --- | --- |
| no_planner | 0.2 |
| no_memory | 0.3233333333333333 |
| no_task_conditioning | 0.28 |
| no_geometry | 0.27 |
| no_camera_pose | 0.29333333333333333 |

Selector Table

Sources:

  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md
| Item | Raw values |
| --- | --- |
| iter6 | mean success 0.4566666666666667; foliage 0.46; bag 0.4; cloth 0.51 |
| iter7 | mean success 0.4666666666666666; foliage 0.4; bag 0.41; cloth 0.59 |
| iter8 bag fixed slice | mean success 0.41; nominal 0.45; high_reocclusion 0.4; camera_perturbation 0.5; one_sided_slip 0.25 |
| routed controller | mean success 0.48666666666666664; route foliage -> iter6, bag -> iter8, cloth -> iter8; foliage 0.46; bag 0.41; cloth 0.59 |
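
The routed controller row pairs a per-task route with per-task success values. A minimal sketch of that lookup, assuming the route can be represented as a plain task-to-checkpoint mapping, is shown below. The names `ROUTES`, `ROUTED_SUCCESS`, and `select_checkpoint` are hypothetical and do not come from the repository.

```python
import math

# Route as stated in the routed controller row (layout is illustrative).
ROUTES = {"foliage": "iter6", "bag": "iter8", "cloth": "iter8"}

# Per-task success reported for the routed controller.
ROUTED_SUCCESS = {"foliage": 0.46, "bag": 0.41, "cloth": 0.59}


def select_checkpoint(task):
    """Return the checkpoint name the route assigns to a task."""
    return ROUTES[task]


# The reported mean matches the unweighted average of the three tasks.
routed_mean = sum(ROUTED_SUCCESS.values()) / len(ROUTED_SUCCESS)
consistent = math.isclose(routed_mean, 0.48666666666666664, abs_tol=1e-9)
print(select_checkpoint("foliage"), consistent)
```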

Proxy Baseline Compare Table

Source:

  • VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json
| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success 0.31; foliage 0.21; bag 0.15; cloth 0.57 |
| iter5_selector | mean success 0.45; foliage 0.44; bag 0.4; cloth 0.51 |

RLBench Recovered Push-Box Comparator

Sources:

  • reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json
  • reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json
| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success 0.7; mean return 0.7; successes [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0] |
| historical push-box control | mean success 0.4; mean return 0.4; successes [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0] |
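
The mean success values in the push-box comparator follow directly from the per-episode success lists. The sketch below recomputes them; the variable names and the helper `mean_success` are illustrative, and the field names in `rollout_eval.json` are not assumed.

```python
# Per-episode successes copied from the two comparator rows.
current = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def mean_success(successes):
    """Fraction of successful episodes in a list of 0.0/1.0 outcomes."""
    return sum(successes) / len(successes)


current_mean = mean_success(current)        # 7 successes over 10 episodes
historical_mean = mean_success(historical)  # 4 successes over 10 episodes
print(current_mean, historical_mean)
```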

Official AnyBimanual Overlap Training Milestones

Sources:

  • baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md
| Global step | Raw values |
| --- | --- |
| 300 | loss 40.91718; sample time 0.093029; step time 14.0686 |
| 400 | loss 33.26684; sample time 0.073085; step time 14.3032 |
| 500 | loss 36.07054; sample time 0.048558; step time 11.1376 |
| 600 | loss 35.32345; sample time 0.040642; step time 9.7719 |
| 700 | loss 28.50959; sample time 0.057937; step time 10.9347 |
| 800 | loss 23.60169; sample time 0.032697; step time 11.8652 |
| 900 | loss 15.28901; sample time 0.051232; step time 11.5073 |
| 1000 | checkpoint train reached weights/1000 and exited cleanly |

Official AnyBimanual Overlap Eval Final Output

Sources:

  • reports/anybimanual_subset3_overlap_resume1000_eval.log
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json
| Item | Raw values |
| --- | --- |
| local last complete step | 1000 |
| local mean success | 0.16 |
| coordinated_push_box | success 0.0; return 0.0; final score log line 0.0 |
| coordinated_lift_ball | success 0.0; return 0.0; final score log line 0.0 |
| dual_push_buttons | success 0.48; return 12.0; final score log line 12.0 |
| public best overlap step in local summary | step 60000; mean success 0.6933333333333334 |
| public best overlap per-task success | coordinated_push_box 0.8; coordinated_lift_ball 0.32; dual_push_buttons 0.96 |
| delta vs public best mean success | -0.5333333333333333 |
| delta vs public best per-task success | coordinated_push_box -0.8; coordinated_lift_ball -0.32; dual_push_buttons -0.48 |
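
The delta rows are the local values minus the public best values, per task and for the mean. The sketch below reproduces that subtraction from the per-task numbers in the table. The dictionary names are illustrative and are not taken from `summary.json`.

```python
import math

# Per-task success copied from the table (local step 1000, public step 60000).
local = {"coordinated_push_box": 0.0, "coordinated_lift_ball": 0.0, "dual_push_buttons": 0.48}
public = {"coordinated_push_box": 0.8, "coordinated_lift_ball": 0.32, "dual_push_buttons": 0.96}

# Per-task delta: local minus public best.
deltas = {task: local[task] - public[task] for task in local}

# Mean delta: difference of the unweighted per-task means.
local_mean = sum(local.values()) / len(local)
public_mean = sum(public.values()) / len(public)
mean_delta = local_mean - public_mean

# Compare against the reported rows with a small tolerance for float rounding.
mean_matches = math.isclose(mean_delta, -0.5333333333333333, abs_tol=1e-9)
print(deltas, mean_matches)
```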

Validated General-Task Anchor: dual_push_buttons

Source:

  • VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json
| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step 60000; success 0.96; return 24.0; length 21.56 |
| local official single-task eval | step 60000; episodes 25; success 0.96; return 24.0; length 21.84 |
| local clip backbone-only | success 0.0; return 0.0; path reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |
| local elastic reveal proxy iter6 | success 0.0; return 0.0; path reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |
| local RVT hybrid frozen fixed-bounds | success 0.0; return 0.0; path reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |

RVT Overlap Branch

Sources:

  • VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md
| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt; final train total 0.043179353826920445; final val total 0.039591669984665984; train seconds 2261.2839448451996 |
| frozen RVT overlap eval | mean success 0.0; push_box 0.0; lift_ball 0.0; dual_push_buttons 0.0 |
| frozen fixed-bounds RVT overlap eval | mean success 0.0; push_box 0.0; lift_ball 0.0; dual_push_buttons 0.0 |
| local overlap floor used for gate | 0.16 |
| stage2 run flag | false |

Dual-Push Nonzero Branch

Source:

  • VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md
| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | 5 episodes; 25 steps; mean success 0.0; path reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json |
| controller sweep planning_c4 | 0.0 |
| controller sweep ik_c1 | 0.0 |
| controller sweep planning_c1_s05 | 0.0 |
| kNN top-1 planning | 5 episodes; 25 steps; mean success 0.0 |
| weighted rollout smoke planning | 5 episodes; 25 steps; mean success 0.0 |
| demo replay through absolute_action_from_delta | mean success 0.8; mean return 0.8; successful demo step counts 89, 112, 93, 112 |
| weighted kNN top-1 planning length120 | 2 episodes; mean success 0.0 |
| chunk8 probe IK length120 | 1 episode; success 0.0; return 0.0; path recoveries 119; noop fallbacks 1 |
| retargeted demo task_state smoke | 2 episodes; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone ep5 | 5 episodes; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone vision ep1 | 1 episode; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone vision ep5 | 5 episodes; mean success 1.0; mean return 1.0 |

Dual-Push Full-Architecture Hybrid

Sources:

  • VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md
  • reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json
  • reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json
| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | 1 episode; mean success 1.0; mean return 1.0; steps 94; retrieved episode index 11; retrieval similarity 0.9998629689216614 |
| full-architecture hybrid eval | 1 episode; mean success 1.0; mean return 1.0; steps 116; path recoveries 0; noop fallbacks 0; first selected mode residual::maintain_opening; last selected mode residual::base_action |

Previous Repo Raw Results

Previous raw tables are preserved in:

  • history/VLAarchtests_previous_README.md