Results Raw
This file records exact values and exact partial statuses without additional conclusions.
Proxy Sprint v7 Main Table
Source:
VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json
| Item | Raw values |
|---|---|
| base_model | mean success 0.28; foliage 0.39; bag 0.31; cloth 0.14 |
| random | mean success 0.43333333333333335; foliage 0.41; bag 0.37; cloth 0.52 |
| candidate0 | mean success 0.2; foliage 0.24; bag 0.22; cloth 0.14 |
| oracle | mean success 0.4066666666666667; foliage 0.5; bag 0.42; cloth 0.3 |
| scripted | mean success 1.0; foliage 1.0; bag 1.0; cloth 1.0 |
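
The mean success values in the table above are consistent with an unweighted mean over the three per-task values (foliage, bag, cloth). The sketch below recomputes them under that assumption. The values are copied from the table; the names `MAIN_TABLE`, `unweighted_mean`, and `check_rows` are hypothetical and do not come from the repository, and the summary JSON may derive its mean differently.

```python
import math

# Values copied from the Proxy Sprint v7 main table.
# The dictionary layout is an assumption made for this sketch only.
MAIN_TABLE = {
    "base_model": {"mean": 0.28, "foliage": 0.39, "bag": 0.31, "cloth": 0.14},
    "random": {"mean": 0.43333333333333335, "foliage": 0.41, "bag": 0.37, "cloth": 0.52},
    "candidate0": {"mean": 0.2, "foliage": 0.24, "bag": 0.22, "cloth": 0.14},
    "oracle": {"mean": 0.4066666666666667, "foliage": 0.5, "bag": 0.42, "cloth": 0.3},
    "scripted": {"mean": 1.0, "foliage": 1.0, "bag": 1.0, "cloth": 1.0},
}

TASKS = ("foliage", "bag", "cloth")


def unweighted_mean(row):
    """Return the plain average of the per-task success values in one row."""
    return sum(row[task] for task in TASKS) / len(TASKS)


def check_rows(table, tol=1e-9):
    """Map each row name to whether its recorded mean matches the recomputed one."""
    return {
        name: math.isclose(unweighted_mean(row), row["mean"], abs_tol=tol)
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, matches in check_rows(MAIN_TABLE).items():
        print(name, matches)
```

A tolerance is used instead of exact equality because the recorded means are floating-point values.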
Proxy Sprint v7 Ablation Table
Source:
VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json
| Item | Raw values |
|---|---|
| no_planner | 0.2 |
| no_memory | 0.3233333333333333 |
| no_task_conditioning | 0.28 |
| no_geometry | 0.27 |
| no_camera_pose | 0.29333333333333333 |
Selector Table
Sources:
VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json
VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json
VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json
VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md
| Item | Raw values |
|---|---|
| iter6 | mean success 0.4566666666666667; foliage 0.46; bag 0.4; cloth 0.51 |
| iter7 | mean success 0.4666666666666666; foliage 0.4; bag 0.41; cloth 0.59 |
| iter8 bag fixed slice | mean success 0.41; nominal 0.45; high_reocclusion 0.4; camera_perturbation 0.5; one_sided_slip 0.25 |
| routed controller | mean success 0.48666666666666664; route foliage -> iter6, bag -> iter8, cloth -> iter8; foliage 0.46; bag 0.41; cloth 0.59 |
Proxy Baseline Compare Table
Source:
VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json
| Item | Raw values |
|---|---|
| baseline_rgbd_stage3 | mean success 0.31; foliage 0.21; bag 0.15; cloth 0.57 |
| iter5_selector | mean success 0.45; foliage 0.44; bag 0.4; cloth 0.51 |
RLBench Recovered Push-Box Comparator
Sources:
reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json
reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json
| Item | Raw values |
|---|---|
| current fair-step1 final | mean success 0.7; mean return 0.7; successes [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0] |
| historical push-box control | mean success 0.4; mean return 0.4; successes [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0] |
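
Each mean success value in the table above matches the fraction of successful episodes in the corresponding per-episode list. A minimal sketch of that arithmetic follows. The lists are copied from the table; the names `FAIR_STEP1`, `HISTORICAL`, and `mean_success` are hypothetical, and the layout of `rollout_eval.json` is not assumed here.

```python
from statistics import fmean

# Per-episode success flags copied from the table (10 episodes each).
FAIR_STEP1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HISTORICAL = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def mean_success(successes):
    """Return the average of per-episode success flags (1.0 = success)."""
    if not successes:
        raise ValueError("at least one episode is required")
    return fmean(successes)


if __name__ == "__main__":
    print("current fair-step1 final:", mean_success(FAIR_STEP1))
    print("historical push-box control:", mean_success(HISTORICAL))
```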
Official AnyBimanual Overlap Training Milestones
Sources:
baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log
VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md
| Global step | Raw values |
|---|---|
| 300 | loss 40.91718; sample time 0.093029; step time 14.0686 |
| 400 | loss 33.26684; sample time 0.073085; step time 14.3032 |
| 500 | loss 36.07054; sample time 0.048558; step time 11.1376 |
| 600 | loss 35.32345; sample time 0.040642; step time 9.7719 |
| 700 | loss 28.50959; sample time 0.057937; step time 10.9347 |
| 800 | loss 23.60169; sample time 0.032697; step time 11.8652 |
| 900 | loss 15.28901; sample time 0.051232; step time 11.5073 |
| 1000 checkpoint | train reached weights/1000 and exited cleanly |
Official AnyBimanual Overlap Eval Final Output
Sources:
reports/anybimanual_subset3_overlap_resume1000_eval.log
VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json
| Item | Raw values |
|---|---|
| local last complete step | 1000 |
| local mean success | 0.16 |
| coordinated_push_box | success 0.0; return 0.0; final score log line 0.0 |
| coordinated_lift_ball | success 0.0; return 0.0; final score log line 0.0 |
| dual_push_buttons | success 0.48; return 12.0; final score log line 12.0 |
| public best overlap step in local summary | step 60000; mean success 0.6933333333333334 |
| public best overlap per-task success | coordinated_push_box 0.8; coordinated_lift_ball 0.32; dual_push_buttons 0.96 |
| delta vs public best mean success | -0.5333333333333333 |
| delta vs public best per-task success | coordinated_push_box -0.8; coordinated_lift_ball -0.32; dual_push_buttons -0.48 |
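
The delta rows above are consistent with subtracting the public best values from the local values, with the mean taken as an unweighted average over the three tasks. The sketch below reproduces that arithmetic under those assumptions. The values are copied from the table; `LOCAL`, `PUBLIC_BEST`, and the helper names are hypothetical and are not keys or functions from `summary.json` or the repository.

```python
# Per-task success values copied from the table above.
LOCAL = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}
PUBLIC_BEST = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}


def task_mean(scores):
    """Unweighted mean over tasks (an assumption about how the mean is formed)."""
    return sum(scores.values()) / len(scores)


def per_task_delta(local, public):
    """Local minus public success, task by task."""
    return {task: local[task] - public[task] for task in local}


def mean_delta(local, public):
    """Difference between the local and public unweighted means."""
    return task_mean(local) - task_mean(public)


if __name__ == "__main__":
    print("local mean:", task_mean(LOCAL))
    print("public best mean:", task_mean(PUBLIC_BEST))
    print("mean delta:", mean_delta(LOCAL, PUBLIC_BEST))
    print("per-task delta:", per_task_delta(LOCAL, PUBLIC_BEST))
```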
Validated General-Task Anchor: dual_push_buttons
Source:
VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json
| Item | Raw values |
|---|---|
| public AnyBimanual release | step 60000; success 0.96; return 24.0; length 21.56 |
| local official single-task eval | step 60000; episodes 25; success 0.96; return 24.0; length 21.84 |
| local clip backbone-only | success 0.0; return 0.0; path reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |
| local elastic reveal proxy iter6 | success 0.0; return 0.0; path reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |
| local RVT hybrid frozen fixed-bounds | success 0.0; return 0.0; path reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json |
RVT Overlap Branch
Sources:
VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md
VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md
VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md
| Item | Raw values |
|---|---|
| frozen RVT stage1 train | checkpoint outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt; final train total 0.043179353826920445; final val total 0.039591669984665984; train seconds 2261.2839448451996 |
| frozen RVT overlap eval | mean success 0.0; push_box 0.0; lift_ball 0.0; dual_push_buttons 0.0 |
| frozen fixed-bounds RVT overlap eval | mean success 0.0; push_box 0.0; lift_ball 0.0; dual_push_buttons 0.0 |
| local overlap floor used for gate | 0.16 |
| stage2 run flag | false |
Dual-Push Nonzero Branch
Source:
VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md
| Item | Raw values |
|---|---|
| direct rollout smoke planning | 5 episodes; 25 steps; mean success 0.0; path reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json |
| controller sweep planning_c4 | 0.0 |
| controller sweep ik_c1 | 0.0 |
| controller sweep planning_c1_s05 | 0.0 |
| kNN top-1 planning | 5 episodes; 25 steps; mean success 0.0 |
| weighted rollout smoke planning | 5 episodes; 25 steps; mean success 0.0 |
| demo replay through absolute_action_from_delta | mean success 0.8; mean return 0.8; successful demo step counts 89, 112, 93, 112 |
| weighted kNN top-1 planning length120 | 2 episodes; mean success 0.0 |
| chunk8 probe IK length120 | 1 episode; success 0.0; return 0.0; path recoveries 119; noop fallbacks 1 |
| retargeted demo task_state smoke | 2 episodes; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone ep5 | 5 episodes; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone vision ep1 | 1 episode; mean success 1.0; mean return 1.0 |
| retargeted demo checkpoint-backbone vision ep5 | 5 episodes; mean success 1.0; mean return 1.0 |
Dual-Push Full-Architecture Hybrid
Sources:
VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md
reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json
reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json
| Item | Raw values |
|---|---|
| elastic checkpoint retargeted-demo probe | 1 episode; mean success 1.0; mean return 1.0; steps 94; retrieved episode index 11; retrieval similarity 0.9998629689216614 |
| full-architecture hybrid eval | 1 episode; mean success 1.0; mean return 1.0; steps 116; path recoveries 0; noop fallbacks 0; first selected mode residual::maintain_opening; last selected mode residual::base_action |
Previous Repo Raw Results
Previous raw tables are preserved in:
history/VLAarchtests_previous_README.md