# VLAarchtests2
Bundle staged from `/workspace` on `2026-03-31 UTC`.
This repo is the follow-on organization repo to `lsnu/VLAarchtests`. It includes:
- current code under `VLAarchtests/`
- current third-party baseline code under `third_party/`
- current baseline runs, replay artifacts, demo roots, and released checkpoint material under `baselines/`
- current training outputs and checkpoints under `outputs/`
- current logs under `reports/`
- environment recreation files under `environment/`
- raw results and change/test logs at the repo root
- the previous repo README under `history/VLAarchtests_previous_README.md`
- the active handoff file under `handoff/instructions4.md`
## Top-Level Contents
- `VLAarchtests/`
  - code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current runpod workspace
- `third_party/AnyBimanual/`
  - local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
- `baselines/`
  - released AnyBimanual checkpoint material
  - overlap replay artifacts
  - HF export packaging note: `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories to satisfy the Hub `10000 files per directory` limit
  - overlap run directories
  - local subset3 demo roots used by the overlap branch
- `outputs/`
  - RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
- `reports/`
  - training and evaluation logs copied from `/workspace/reports`
- `environment/`
  - machine snapshot, package lists, and setup helpers
- `history/`
  - copied previous-repo README
- `handoff/`
  - active sprint instruction file
- `RESULTS_RAW.md`
  - raw result tables and final official overlap eval outputs
- `CHANGE_AND_TEST_LOG.md`
  - file-level change log and executed test commands
- `MODEL_AND_ARTIFACT_INDEX.md`
  - staged directory map with main artifact roots
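
The packaging note for `baselines/AnyBimanual_overlap_replay/multi/` describes sharding a large directory into subdirectories so that no single directory exceeds the Hub's per-directory file limit. A minimal sketch of that idea follows. It is not the script used to build this bundle: the function name `shard_directory` and the `shard_0000`-style subdirectory names are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import shutil

# Per-directory limit quoted in the packaging note above.
MAX_FILES_PER_DIR = 10_000


def shard_directory(root: Path, shard_size: int = MAX_FILES_PER_DIR) -> list[Path]:
    """Move the files directly under `root` into numbered shard subdirectories.

    Hypothetical helper: the shard naming scheme is an assumption and does not
    describe the layout actually used in this repo.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")

    # Snapshot the file list first so newly created shard dirs are not revisited.
    files = sorted(p for p in root.iterdir() if p.is_file())
    shards: list[Path] = []
    for start in range(0, len(files), shard_size):
        shard = root / f"shard_{start // shard_size:04d}"
        shard.mkdir(exist_ok=True)
        for f in files[start:start + shard_size]:
            shutil.move(str(f), str(shard / f.name))
        shards.append(shard)
    return shards
```

Calling `shard_directory(Path("some/large/dir"))` would leave at most `10000` files in each shard. Restoring the original flat layout is the inverse move, and any loader that expects the flat layout needs that step first.
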
## Previous Repo Coverage
The earlier `lsnu/VLAarchtests` repo covered the `2026-03-25/26` work. Its README is copied verbatim at:
- `history/VLAarchtests_previous_README.md`
Previous-repo items explicitly referenced there include:
- compact, spatial, compact-phase, and spatial-phase proxy branches
- earlier RLBench direct-policy and kNN runs
- environment recreation files
- prior raw result tables
## Current Session Additions
Current-session folders added or expanded in this repo include:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
## Raw Results Snapshot
### Proxy sprint v7
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
Raw values:
- base model mean success: `0.28`
- base per-task: foliage `0.39`, bag `0.31`, cloth `0.14`
- random mean success: `0.43333333333333335`
- candidate0 mean success: `0.2`
- oracle mean success: `0.4066666666666667`
- scripted mean success: `1.0`
### Eval-time ablations
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
Raw values:
- `no_planner`: `0.2`
- `no_memory`: `0.3233333333333333`
- `no_task_conditioning`: `0.28`
- `no_geometry`: `0.27`
- `no_camera_pose`: `0.29333333333333333`
### Selector checkpoints
Sources:
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
Raw values:
- `iter6` mean success: `0.4566666666666667`
  - foliage `0.46`, bag `0.4`, cloth `0.51`
- `iter7` mean success: `0.4666666666666666`
  - foliage `0.4`, bag `0.41`, cloth `0.59`
- `iter8` bag-only fixed slice: `0.41`
- routed controller mean success: `0.48666666666666664`
  - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter7`
  - per-task: foliage `0.46`, bag `0.41`, cloth `0.59`
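
The routed-controller figure can be reproduced from the per-checkpoint values listed above by looking up each task's score under its routed checkpoint and averaging. The sketch below is a consistency check only, not the routing code in the repo; the names `CHECKPOINT_SUCCESS`, `ROUTING`, and `routed_success` are hypothetical. It routes cloth to `iter7`, whose cloth figure `0.59` matches the routed per-task value; treat that mapping as an assumption of the sketch.

```python
from statistics import mean

# Per-task success for each selector checkpoint, copied from the raw values above.
# Only the bag slice is listed for iter8.
CHECKPOINT_SUCCESS = {
    "iter6": {"foliage": 0.46, "bag": 0.40, "cloth": 0.51},
    "iter7": {"foliage": 0.40, "bag": 0.41, "cloth": 0.59},
    "iter8": {"bag": 0.41},
}

# Task -> checkpoint routing. The cloth entry is an assumption (see lead-in).
ROUTING = {"foliage": "iter6", "bag": "iter8", "cloth": "iter7"}


def routed_success(routing: dict[str, str],
                   table: dict[str, dict[str, float]]) -> dict[str, float]:
    """Return the per-task success obtained by sending each task to its checkpoint."""
    return {task: table[ckpt][task] for task, ckpt in routing.items()}


per_task = routed_success(ROUTING, CHECKPOINT_SUCCESS)
routed_mean = mean(per_task.values())
print(per_task)
print(routed_mean)
```

The same averaging applied to a single checkpoint's three per-task values gives that checkpoint's mean success.
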
### Real baseline compare on proxy suite
Source:
- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
Raw values:
- `baseline_rgbd_stage3` mean success: `0.31`
  - foliage `0.21`, bag `0.15`, cloth `0.57`
- `iter5_selector` mean success: `0.45`
  - foliage `0.44`, bag `0.4`, cloth `0.51`
### RLBench recovered push-box comparator
Sources:
- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
Raw values:
- current fair-step1 final mean success: `0.7`
- current fair-step1 final successes:
  - `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]`
- historical push-box control mean success: `0.4`
- historical push-box control successes:
  - `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]`
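
The two mean-success figures follow directly from the per-episode outcome lists. A short sketch of that arithmetic, using the lists exactly as printed above (the helper name `summarize` is hypothetical and is not taken from the repo's evaluation code):

```python
# Per-episode outcomes copied from the raw values above (1.0 = success).
FAIR_STEP1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HISTORICAL = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def summarize(successes: list[float]) -> dict[str, float]:
    """Return episode count, number of successes, and mean success."""
    episodes = len(successes)
    wins = sum(successes)
    return {"episodes": episodes, "successes": wins, "mean_success": wins / episodes}


fair_summary = summarize(FAIR_STEP1)
historical_summary = summarize(HISTORICAL)
print(fair_summary)
print(historical_summary)
```

Both runs cover 10 episodes, so each mean is simply the number of successful episodes divided by 10.
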
### Official AnyBimanual overlap branch
Sources:
- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
Raw train milestones:
- global step `300`: loss `40.91718`
- global step `400`: loss `33.26684`
- global step `500`: loss `36.07054`
- global step `600`: loss `35.32345`
- global step `700`: loss `28.50959`
- global step `800`: loss `23.60169`
- global step `900`: loss `15.28901`
- run reached `weights/1000` and training exited cleanly
Raw eval outputs:
- source log: `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- summary files:
  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
- local last complete step: `1000`
- local mean success: `0.16`
- local per-task success:
  - `coordinated_push_box`: `0.0`
  - `coordinated_lift_ball`: `0.0`
  - `dual_push_buttons`: `0.48`
- local per-task return:
  - `coordinated_push_box`: `0.0`
  - `coordinated_lift_ball`: `0.0`
  - `dual_push_buttons`: `12.0`
- public best overlap step in the local summary: `60000`
- public best mean success in the local summary: `0.6933333333333334`
### Validated general-task anchor: `dual_push_buttons`
Sources:
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
- `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv`
Raw values:
- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
- local clip backbone-only result on same task: success `0.0`, return `0.0`
- local elastic reveal proxy iter6 result on same task: success `0.0`, return `0.0`
- local RVT frozen fixed-bounds result on same task: success `0.0`, return `0.0`
### RVT overlap branch
Sources:
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
Raw values:
- frozen RVT stage1 train summary:
  - `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json`
  - final train total `0.043179353826920445`
  - final val total `0.039591669984665984`
- frozen RVT overlap eval: mean success `0.0`
- frozen fixed-bounds RVT overlap eval: mean success `0.0`
- both branch gates:
  - local AnyBimanual overlap floor `0.16`
  - stage2 run `false`
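
The gate values above pair a floor (`0.16`, the local AnyBimanual overlap mean success) with a `stage2 run` flag. One way such a gate could be expressed is sketched below. The comparison rule (strictly greater than the floor) and the name `stage2_should_run` are assumptions made for illustration; the source records only the floor, the two `0.0` eval results, and the `false` outcome.

```python
# Floor copied from the gate values above.
LOCAL_ANYBIMANUAL_OVERLAP_FLOOR = 0.16


def stage2_should_run(branch_mean_success: float, floor: float) -> bool:
    """Hypothetical gate: proceed to stage 2 only if the branch beats the floor.

    The strict '>' comparison is an assumption, not taken from the repo.
    """
    return branch_mean_success > floor


# Overlap eval results for the two RVT branches, copied from the raw values above.
branch_results = {
    "frozen RVT": 0.0,
    "frozen fixed-bounds RVT": 0.0,
}

gate_outcomes = {
    name: stage2_should_run(score, LOCAL_ANYBIMANUAL_OVERLAP_FLOOR)
    for name, score in branch_results.items()
}
print(gate_outcomes)
```

With both branches at `0.0`, the outcome is the same whether the comparison is strict or inclusive.
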
### Dual-push non-privileged retarget branch
Sources:
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
Raw values:
- demo replay through `absolute_action_from_delta`:
  - `reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json`
  - mean success `0.8`
  - mean return `0.8`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `ep1` run:
  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json`
  - mean success `1.0`
  - mean return `1.0`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `ep5` run:
  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json`
  - mean success `1.0`
  - mean return `1.0`
### Dual-push full-architecture hybrid branch
Sources:
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
Raw values:
- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
  - `1` episode
  - mean success `1.0`
  - mean return `1.0`
  - steps `94`
  - retrieved episode index `11`
  - retrieval similarity `0.9998629689216614`
- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
  - `1` episode
  - mean success `1.0`
  - mean return `1.0`
  - steps `116`
  - path recoveries `0`
  - noop fallbacks `0`
  - first selected mode `residual::maintain_opening`
  - last selected mode `residual::base_action`
## Environment Recreation
Environment files are under `environment/`, including:
- `environment/setup_same_hardware.sh`
- `environment/runtime_env_vars.sh`
- `environment/reconstruct_anybimanual_overlap_replay.sh`
- `environment/hardware_snapshot.txt`
- `environment/env_list.txt`
- `environment/base_python.txt`
- `environment/base_pip_freeze.txt`
- `environment/rlbench_python.txt`
- `environment/rlbench_pip_freeze.txt`
## Notes On Result Presentation
This repo-level README and the new root docs intentionally keep result text raw:
- file paths
- exact commands
- exact numeric outputs
- exact partial status for in-flight runs
Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.