# VLAarchtests2

Bundle staged from `/workspace` on `2026-03-31 UTC`.

This repo is the follow-on organization repo to `lsnu/VLAarchtests`. It includes:

- current code under `VLAarchtests/`
- current third-party baseline code under `third_party/`
- current baseline runs, replay artifacts, demo roots, and released checkpoint material under `baselines/`
- current training outputs and checkpoints under `outputs/`
- current logs under `reports/`
- environment recreation files under `environment/`
- raw results and change/test logs at the repo root
- the previous repo README under `history/VLAarchtests_previous_README.md`
- the active handoff file under `handoff/instructions4.md`

## Top-Level Contents

- `VLAarchtests/`
  - code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current RunPod workspace
- `third_party/AnyBimanual/`
  - local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
- `baselines/`
  - released AnyBimanual checkpoint material
  - overlap replay artifacts
  - HF export packaging note: `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories to stay under the Hugging Face Hub limit of `10000` files per directory
  - overlap run directories
  - local subset3 demo roots used by the overlap branch
- `outputs/`
  - RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
- `reports/`
  - training and evaluation logs copied from `/workspace/reports`
- `environment/`
  - machine snapshot, package lists, and setup helpers
- `history/`
  - copied previous-repo README
- `handoff/`
  - active sprint instruction file
- `RESULTS_RAW.md`
  - raw result tables and final official overlap eval outputs
- `CHANGE_AND_TEST_LOG.md`
  - file-level change log and executed test commands
- `MODEL_AND_ARTIFACT_INDEX.md`
  - staged directory map with main artifact roots

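The `baselines/` packaging note says `multi/` was split into subdirectories to stay under the Hub's per-directory file limit. Below is a minimal sketch of one way to do that split. It assumes files are assigned in sorted order to fixed-size shards. The `shard_XXXX` directory names and both helper functions are hypothetical and may not match the layout actually used in this export.

```python
from pathlib import Path

# Per-directory cap taken from the HF packaging note; kept as a parameter.
MAX_FILES_PER_DIR = 10_000


def shard_paths(names, max_per_dir=MAX_FILES_PER_DIR):
    """Map each file name to a hypothetical shard_XXXX/ subdirectory.

    Files are assigned in sorted order, filling one shard before starting
    the next, so no shard holds more than max_per_dir entries.
    """
    if max_per_dir < 1:
        raise ValueError("max_per_dir must be positive")
    mapping = {}
    for i, name in enumerate(sorted(names)):
        shard = f"shard_{i // max_per_dir:04d}"
        mapping[name] = Path(shard) / name
    return mapping


def shard_directory(src: Path, max_per_dir=MAX_FILES_PER_DIR):
    """Move the regular files directly under src into shard subdirectories."""
    files = [p.name for p in src.iterdir() if p.is_file()]
    for name, rel in shard_paths(files, max_per_dir).items():
        dest = src / rel
        dest.parent.mkdir(exist_ok=True)
        (src / name).rename(dest)
```

To read a sharded directory back as one flat set of files, iterate `Path.rglob("*")` over the parent and keep the entries where `is_file()` is true. This walks every shard.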
## Previous Repo Coverage

The earlier `lsnu/VLAarchtests` repo covered the `2026-03-25/26` work. Its README is copied verbatim at:

- `history/VLAarchtests_previous_README.md`

Previous-repo items explicitly referenced there include:

- compact, spatial, compact-phase, and spatial-phase proxy branches
- earlier RLBench direct-policy and kNN runs
- environment recreation files
- prior raw result tables

## Current Session Additions

Current-session folders added or expanded in this repo include:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`

## Raw Results Snapshot

### Proxy sprint v7

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

Raw values:

- base model mean success: `0.28`
- base per-task: foliage `0.39`, bag `0.31`, cloth `0.14`
- random mean success: `0.43333333333333335`
- candidate0 mean success: `0.2`
- oracle mean success: `0.4066666666666667`
- scripted mean success: `1.0`

### Eval-time ablations

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

Raw values:

- `no_planner`: `0.2`
- `no_memory`: `0.3233333333333333`
- `no_task_conditioning`: `0.28`
- `no_geometry`: `0.27`
- `no_camera_pose`: `0.29333333333333333`

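To read the ablations as differences from the base model mean success (`0.28`, proxy sprint v7), subtract the base from each entry. The small sketch below does that arithmetic on the values listed here. `ablation_deltas` is a hypothetical helper. The comparison assumes the ablations and the base model were scored on the same proxy suite, since both numbers come from the same summary file.

```python
BASE_MEAN_SUCCESS = 0.28  # base model mean success, proxy sprint v7

# Eval-time ablation mean success values copied from this section.
ABLATIONS = {
    "no_planner": 0.2,
    "no_memory": 0.3233333333333333,
    "no_task_conditioning": 0.28,
    "no_geometry": 0.27,
    "no_camera_pose": 0.29333333333333333,
}


def ablation_deltas(base, ablations, ndigits=4):
    """Return (name, ablated - base) pairs sorted from most negative to most positive."""
    deltas = {name: round(value - base, ndigits) for name, value in ablations.items()}
    return sorted(deltas.items(), key=lambda item: item[1])


for name, delta in ablation_deltas(BASE_MEAN_SUCCESS, ABLATIONS):
    print(f"{name:>22}: {delta:+.4f}")
```

Rounding to four digits hides float noise such as `0.2 - 0.28` evaluating to `-0.08000000000000002`.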
### Selector checkpoints

Sources:

- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

Raw values:

- `iter6` mean success: `0.4566666666666667`
  - foliage `0.46`, bag `0.4`, cloth `0.51`
- `iter7` mean success: `0.4666666666666666`
  - foliage `0.4`, bag `0.41`, cloth `0.59`
- `iter8` success on the bag-only fixed slice: `0.41`
- routed controller mean success: `0.48666666666666664`
  - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`
  - per-task: foliage `0.46`, bag `0.41`, cloth `0.59`

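The routed controller picks one selector checkpoint per task. Below is a minimal sketch of that routing rule as a lookup table. The checkpoint labels are just the iteration names from this README. `route_checkpoint` and `tasks_by_checkpoint` are hypothetical helpers, not code from `VLAarchtests/`.

```python
from collections import defaultdict

# Routing rule reported for task_routed_proxy_v1: task -> selector checkpoint label.
ROUTING_RULE = {"foliage": "iter6", "bag": "iter8", "cloth": "iter8"}


def route_checkpoint(task, rule=ROUTING_RULE):
    """Return the checkpoint label for a task; unknown tasks raise KeyError."""
    if task not in rule:
        raise KeyError(f"no routing entry for task {task!r}")
    return rule[task]


def tasks_by_checkpoint(rule=ROUTING_RULE):
    """Invert the rule so each checkpoint is loaded once for all of its tasks."""
    grouped = defaultdict(list)
    for task, checkpoint in rule.items():
        grouped[checkpoint].append(task)
    return dict(grouped)


print(route_checkpoint("bag"))  # iter8
print(tasks_by_checkpoint())    # {'iter6': ['foliage'], 'iter8': ['bag', 'cloth']}
```

An unknown task raises an error instead of falling back to a default checkpoint. This way, a missing routing entry cannot silently change which model is evaluated.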
### Real baseline compare on proxy suite

Source:

- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

Raw values:

- `baseline_rgbd_stage3` mean success: `0.31`
  - foliage `0.21`, bag `0.15`, cloth `0.57`
- `iter5_selector` mean success: `0.45`
  - foliage `0.44`, bag `0.4`, cloth `0.51`

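The mean success values in the proxy-suite sections match the unweighted average of their three per-task values. For example, the base model gives (0.39 + 0.31 + 0.14) / 3 = 0.28. Below is a small consistency check over numbers copied from this README. It assumes equal weighting across tasks, which is only correct if each task was evaluated on the same number of episodes. It does not read any of the JSON reports, whose schemas are not described here.

```python
import math

# (reported mean success, per-task success) copied from the lists in this README.
REPORTED = {
    "sprint_v7 base": (0.28, {"foliage": 0.39, "bag": 0.31, "cloth": 0.14}),
    "iter6": (0.4566666666666667, {"foliage": 0.46, "bag": 0.4, "cloth": 0.51}),
    "iter7": (0.4666666666666666, {"foliage": 0.4, "bag": 0.41, "cloth": 0.59}),
    "routed": (0.48666666666666664, {"foliage": 0.46, "bag": 0.41, "cloth": 0.59}),
    "baseline_rgbd_stage3": (0.31, {"foliage": 0.21, "bag": 0.15, "cloth": 0.57}),
    "iter5_selector": (0.45, {"foliage": 0.44, "bag": 0.4, "cloth": 0.51}),
}


def per_task_mean(per_task):
    """Unweighted mean over tasks; only valid if every task has equal episode counts."""
    return sum(per_task.values()) / len(per_task)


def mismatched_means(reported, abs_tol=1e-9):
    """Return the entries whose reported mean differs from the per-task average."""
    return [
        name
        for name, (mean, per_task) in reported.items()
        if not math.isclose(mean, per_task_mean(per_task), abs_tol=abs_tol)
    ]


print(mismatched_means(REPORTED))  # []
```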
### RLBench recovered push-box comparator

Sources:

- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

Raw values:

- current fair-step1 final mean success: `0.7`
- current fair-step1 final successes:
  - `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]`
- historical push-box control mean success: `0.4`
- historical push-box control successes:
  - `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]`

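Each push-box mean success above is the fraction of the ten per-episode success flags that equal `1.0`. Below is a minimal sketch of that reduction using the two lists from this section. It works on the lists as printed here and assumes nothing about the layout of `rollout_eval.json`. The helper name `summarize_successes` is hypothetical.

```python
from statistics import mean

# Per-episode success flags copied from this section.
FAIR_STEP1_FINAL = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HISTORICAL_CONTROL = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def summarize_successes(flags):
    """Return (episodes, successes, mean success) for per-episode 0/1 flags."""
    if not flags:
        raise ValueError("need at least one episode")
    if any(f not in (0.0, 1.0) for f in flags):
        raise ValueError("expected binary success flags")
    return len(flags), int(sum(flags)), mean(flags)


print(summarize_successes(FAIR_STEP1_FINAL))    # (10, 7, 0.7)
print(summarize_successes(HISTORICAL_CONTROL))  # (10, 4, 0.4)
```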
### Official AnyBimanual overlap branch

Sources:

- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`

Raw train milestones:

- global step `300`: loss `40.91718`
- global step `400`: loss `33.26684`
- global step `500`: loss `36.07054`
- global step `600`: loss `35.32345`
- global step `700`: loss `28.50959`
- global step `800`: loss `23.60169`
- global step `900`: loss `15.28901`
- the run reached `weights/1000` and training exited cleanly

Raw eval outputs:

- source log: `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- summary files:
  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
- local last complete step: `1000`
- local mean success: `0.16`
- local per-task success:
  - `coordinated_push_box`: `0.0`
  - `coordinated_lift_ball`: `0.0`
  - `dual_push_buttons`: `0.48`
- local per-task return:
  - `coordinated_push_box`: `0.0`
  - `coordinated_lift_ball`: `0.0`
  - `dual_push_buttons`: `12.0`
- public best overlap step in the local summary: `60000`
- public best mean success in the local summary: `0.6933333333333334`

### Validated general-task anchor: `dual_push_buttons`

Sources:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
- `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv`

Raw values:

- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
- local clip backbone-only result on the same task: success `0.0`, return `0.0`
- local elastic reveal proxy iter6 result on the same task: success `0.0`, return `0.0`
- local RVT frozen fixed-bounds result on the same task: success `0.0`, return `0.0`

### RVT overlap branch

Sources:

- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

Raw values:

- frozen RVT stage1 train summary:
  - `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json`
  - final train total `0.043179353826920445`
  - final val total `0.039591669984665984`
- frozen RVT overlap eval: mean success `0.0`
- frozen fixed-bounds RVT overlap eval: mean success `0.0`
- both branch gates:
  - local AnyBimanual overlap floor `0.16`
  - stage2 run `false`

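Both RVT branch gates record the local AnyBimanual overlap floor of `0.16` together with `stage2 run false`. That is consistent with a rule of the form "start stage2 only if the stage1 overlap eval beats the floor". The sketch below implements such a gate under that assumption. This README does not state whether the real comparison is strict (`>`) or inclusive (`>=`), so the sketch makes it a parameter. `should_run_stage2` and the branch labels are hypothetical names.

```python
import operator

LOCAL_ANYBIMANUAL_OVERLAP_FLOOR = 0.16  # local resume1000 overlap mean success


def should_run_stage2(stage1_mean_success, floor=LOCAL_ANYBIMANUAL_OVERLAP_FLOOR,
                      inclusive=False):
    """Hypothetical gate: run stage2 only if stage1 clears the baseline floor."""
    compare = operator.ge if inclusive else operator.gt
    return compare(stage1_mean_success, floor)


# Frozen RVT and frozen fixed-bounds RVT both scored 0.0 on the overlap eval.
for branch, score in {"rvt_frozen": 0.0, "rvt_frozen_fixedbounds": 0.0}.items():
    print(branch, "stage2:", should_run_stage2(score))
```

With both frozen RVT scores at `0.0`, either comparison returns `False`, which matches the recorded `stage2 run false`.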
### Dual-push non-privileged retarget branch

Source:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

Raw values:

- demo replay through `absolute_action_from_delta`:
  - `reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json`
  - mean success `0.8`
  - mean return `0.8`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization (`ep1` run):
  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json`
  - mean success `1.0`
  - mean return `1.0`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization (`ep5` run):
  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json`
  - mean success `1.0`
  - mean return `1.0`

### Dual-push full-architecture hybrid branch

Sources:

- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

Raw values:

- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
  - `1` episode
  - mean success `1.0`
  - mean return `1.0`
  - steps `94`
  - retrieved episode index `11`
  - retrieval similarity `0.9998629689216614`
- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
  - `1` episode
  - mean success `1.0`
  - mean return `1.0`
  - steps `116`
  - path recoveries `0`
  - noop fallbacks `0`
  - first selected mode `residual::maintain_opening`
  - last selected mode `residual::base_action`

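The probe reports a retrieved episode index and a retrieval similarity close to `1.0`. A common way to produce such a pair is nearest-neighbor lookup by cosine similarity over per-episode embeddings. The sketch below shows that technique with NumPy. It is an assumption, not the branch's actual retrieval code: the embedding source, the similarity measure, and `retrieve_episode` are all hypothetical. The synthetic data is random and only mimics the shape of the reported result.

```python
import numpy as np


def retrieve_episode(query, bank):
    """Return (index, cosine similarity) of the bank row closest to query.

    query: shape (d,) embedding of the current observation.
    bank:  shape (n, d) embeddings, one row per stored demo episode.
    """
    query = np.asarray(query, dtype=np.float64)
    bank = np.asarray(bank, dtype=np.float64)
    q_norm = np.linalg.norm(query)
    b_norms = np.linalg.norm(bank, axis=1)
    if q_norm == 0 or np.any(b_norms == 0):
        raise ValueError("zero-length embedding")
    sims = bank @ query / (b_norms * q_norm)  # cosine similarity per episode
    best = int(np.argmax(sims))
    return best, float(sims[best])


rng = np.random.default_rng(0)
bank = rng.normal(size=(20, 64))
query = bank[5] + 0.01 * rng.normal(size=64)  # near-copy of stored episode 5
idx, sim = retrieve_episode(query, bank)
print(idx, round(sim, 4))
```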
## Environment Recreation

Environment files are under `environment/`, including:

- `environment/setup_same_hardware.sh`
- `environment/runtime_env_vars.sh`
- `environment/reconstruct_anybimanual_overlap_replay.sh`
- `environment/hardware_snapshot.txt`
- `environment/env_list.txt`
- `environment/base_python.txt`
- `environment/base_pip_freeze.txt`
- `environment/rlbench_python.txt`
- `environment/rlbench_pip_freeze.txt`

## Notes On Result Presentation

This repo-level README and the new root docs intentionally keep result text raw:

- file paths
- exact commands
- exact numeric outputs
- exact partial status for in-flight runs

Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.