# Results Raw

This file records exact values and exact partial statuses without additional conclusions.

## Proxy Sprint v7 Main Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |

## Proxy Sprint v7 Ablation Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |

## Selector Table

Sources:

- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |

## Proxy Baseline Compare Table

Source:

- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |

## RLBench Recovered Push-Box Comparator

Sources:

- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |

## Official AnyBimanual Overlap Training Milestones

Sources:

- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`

| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | train reached `weights/1000` and exited cleanly |

## Official AnyBimanual Overlap Eval Final Output

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |

## Validated General-Task Anchor: dual_push_buttons

Source:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |

## RVT Overlap Branch

Sources:

- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |

## Dual-Push Nonzero Branch

Source:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |

## Dual-Push Full-Architecture Hybrid

Sources:

- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |

## Previous Repo Raw Results

Previous raw tables are preserved in:

- `history/VLAarchtests_previous_README.md`
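
When copying values out of the tables in this file, it can help to cross-check a reported `mean success` against its per-task values. The sketch below does this for a few rows, assuming the mean is an unweighted average over the three tasks; the source reports do not state how the mean is computed, so that is an assumption. The names `ROWS`, `unweighted_mean`, and `check_rows` are hypothetical helpers and are not part of the repository.

```python
import math

# (reported mean success, per-task successes) copied from the tables in this file.
# Row keys are illustrative labels, not identifiers from the source JSON.
ROWS = {
    "base_model": (0.28, [0.39, 0.31, 0.14]),
    "random": (0.43333333333333335, [0.41, 0.37, 0.52]),
    "oracle": (0.4066666666666667, [0.5, 0.42, 0.3]),
    "iter6": (0.4566666666666667, [0.46, 0.4, 0.51]),
    "anybimanual_local_step1000": (0.16, [0.0, 0.0, 0.48]),
    "anybimanual_public_step60000": (0.6933333333333334, [0.8, 0.32, 0.96]),
}


def unweighted_mean(values):
    """Plain arithmetic mean (assumed aggregation, not confirmed by the source)."""
    return sum(values) / len(values)


def check_rows(rows, tol=1e-9):
    """Return labels of rows whose reported mean differs from the unweighted mean."""
    return [
        name
        for name, (reported, per_task) in rows.items()
        if not math.isclose(reported, unweighted_mean(per_task), abs_tol=tol)
    ]


mismatches = check_rows(ROWS)

# Delta of local mean success vs public best mean success, as in the eval table.
delta = ROWS["anybimanual_local_step1000"][0] - ROWS["anybimanual_public_step60000"][0]

print("rows that do not match an unweighted mean:", mismatches)
print("delta vs public best mean success:", delta)
```

Rows whose mean is taken over a different set of slices, or weighted by episode count, would need their own aggregation rule and are deliberately left out of this sketch.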