	# VLAarchtests2

	Bundle staged from `/workspace` on `2026-03-31 UTC`.

	This repo is the follow-on organization repo to `lsnu/VLAarchtests`. It includes:

	- current code under `VLAarchtests/`
	- current third-party baseline code under `third_party/`
	- current baseline runs, replay artifacts, demo roots, and released checkpoint material under `baselines/`
	- current training outputs and checkpoints under `outputs/`
	- current logs under `reports/`
	- environment recreation files under `environment/`
	- raw results and change/test logs at the repo root
	- the previous repo README under `history/VLAarchtests_previous_README.md`
	- the active handoff file under `handoff/instructions4.md`

	## Top-Level Contents

	- `VLAarchtests/`
	  - code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current runpod workspace
	- `third_party/AnyBimanual/`
	  - local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
	- `baselines/`
	  - released AnyBimanual checkpoint material
	  - overlap replay artifacts
	  - HF export packaging note: `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories to satisfy the Hub `10000 files per directory` limit
	  - overlap run directories
	  - local subset3 demo roots used by the overlap branch
	- `outputs/`
	  - RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
	- `reports/`
	  - training and evaluation logs copied from `/workspace/reports`
	- `environment/`
	  - machine snapshot, package lists, and setup helpers
	- `history/`
	  - copied previous-repo README
	- `handoff/`
	  - active sprint instruction file
	- `RESULTS_RAW.md`
	  - raw result tables and final official overlap eval outputs
	- `CHANGE_AND_TEST_LOG.md`
	  - file-level change log and executed test commands
	- `MODEL_AND_ARTIFACT_INDEX.md`
	  - staged directory map with main artifact roots
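	The packaging note for `baselines/AnyBimanual_overlap_replay/multi/` can be illustrated with a minimal sharding sketch. The shard naming (`shard_0000`, ...), the example file names, and the function `plan_shards` are hypothetical; the actual subdirectory layout used in this repo is not described here.

```python
from collections import defaultdict

MAX_FILES_PER_DIR = 10_000  # per-directory limit quoted in the packaging note


def plan_shards(filenames, max_per_dir=MAX_FILES_PER_DIR, prefix="shard_"):
    """Assign files to numbered subdirectories of at most max_per_dir files each.

    Hypothetical helper: the naming scheme is an assumption, not the repo's layout.
    """
    if max_per_dir <= 0:
        raise ValueError("max_per_dir must be positive")
    plan = defaultdict(list)
    # Sort first so the assignment is deterministic across runs.
    for index, name in enumerate(sorted(filenames)):
        shard = f"{prefix}{index // max_per_dir:04d}"
        plan[shard].append(name)
    return dict(plan)


# Illustrative input only: 25,000 made-up file names.
example_files = [f"episode_{i:05d}.pkl" for i in range(25_000)]
example_plan = plan_shards(example_files)
```

	With 25,000 input files the plan uses three subdirectories, the last one holding the remaining 5,000 files.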

	## Previous Repo Coverage

	The earlier `lsnu/VLAarchtests` repo covered the `2026-03-25/26` work. Its README is copied verbatim at:

	- `history/VLAarchtests_previous_README.md`

	Previous-repo items explicitly referenced there include:

	- compact, spatial, compact-phase, and spatial-phase proxy branches
	- earlier RLBench direct-policy and kNN runs
	- environment recreation files
	- prior raw result tables

	## Current Session Additions

	Current-session folders added or expanded in this repo include:

	- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
	- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
	- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
	- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
	- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
	- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
	- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
	- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
	- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`

	## Raw Results Snapshot

	### Proxy sprint v7

	Source:

	- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

	Raw values:

	- base model mean success: `0.28`
	- base per-task: foliage `0.39`, bag `0.31`, cloth `0.14`
	- random mean success: `0.43333333333333335`
	- candidate0 mean success: `0.2`
	- oracle mean success: `0.4066666666666667`
	- scripted mean success: `1.0`
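	As a consistency check, the base mean can be recomputed from the per-task values listed above. This sketch assumes the mean is an unweighted average over the three tasks, which is not stated in the summary file itself.

```python
from statistics import fmean

# Per-task base success values copied from the list above.
base_per_task = {"foliage": 0.39, "bag": 0.31, "cloth": 0.14}

# ASSUMPTION: unweighted average across tasks.
base_mean = fmean(base_per_task.values())
matches_reported = abs(base_mean - 0.28) < 1e-9
```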

	### Eval-time ablations

	Source:

	- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

	Raw values:

	- `no_planner`: `0.2`
	- `no_memory`: `0.3233333333333333`
	- `no_task_conditioning`: `0.28`
	- `no_geometry`: `0.27`
	- `no_camera_pose`: `0.29333333333333333`
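	The ablation values can be read against the base model mean success of `0.28` from the same summary file. The sketch below only does the subtraction; treating `0.28` as the reference point is an assumption, and the variable names are illustrative.

```python
BASE_MEAN = 0.28  # base model mean success from the proxy sprint v7 values

# Ablation mean success values copied from the list above.
ablations = {
    "no_planner": 0.2,
    "no_memory": 0.3233333333333333,
    "no_task_conditioning": 0.28,
    "no_geometry": 0.27,
    "no_camera_pose": 0.29333333333333333,
}

# Signed difference of each ablation relative to the assumed reference.
deltas = {name: value - BASE_MEAN for name, value in ablations.items()}
largest_drop = min(deltas, key=deltas.get)
```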

	### Selector checkpoints

	Sources:

	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
	- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

	Raw values:

	- `iter6` mean success: `0.4566666666666667`
	  - per-task: foliage `0.46`, bag `0.4`, cloth `0.51`
	- `iter7` mean success: `0.4666666666666666`
	  - per-task: foliage `0.4`, bag `0.41`, cloth `0.59`
	- `iter8` bag-only fixed slice: `0.41`
	- routed controller mean success: `0.48666666666666664`
	  - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`
	  - per-task: foliage `0.46`, bag `0.41`, cloth `0.59`
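	The routing rule above amounts to a per-task lookup table. A minimal sketch, assuming the rule is applied as a static mapping from task name to checkpoint label; `select_checkpoint` is a hypothetical name, and the real controller code lives under `VLAarchtests/`.

```python
from statistics import fmean

# Routing rule exactly as listed above.
ROUTING = {"foliage": "iter6", "bag": "iter8", "cloth": "iter8"}


def select_checkpoint(task: str) -> str:
    """Return the checkpoint label routed to a task (hypothetical helper)."""
    try:
        return ROUTING[task]
    except KeyError:
        raise ValueError(f"no routing rule for task {task!r}") from None


# Routed per-task success values copied from the list above.
routed_per_task = {"foliage": 0.46, "bag": 0.41, "cloth": 0.59}

# ASSUMPTION: the reported mean is an unweighted average across tasks.
routed_mean = fmean(routed_per_task.values())
```

	Under that assumption the recomputed mean agrees with the reported `0.48666666666666664` to floating-point tolerance.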

	### Real baseline compare on proxy suite

	Source:

	- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

	Raw values:

	- `baseline_rgbd_stage3` mean success: `0.31`
	  - per-task: foliage `0.21`, bag `0.15`, cloth `0.57`
	- `iter5_selector` mean success: `0.45`
	  - per-task: foliage `0.44`, bag `0.4`, cloth `0.51`

	### RLBench recovered push-box comparator

	Sources:

	- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
	- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

	Raw values:

	- current fair-step1 final mean success: `0.7`
	- current fair-step1 final successes:
	  - `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]`
	- historical push-box control mean success: `0.4`
	- historical push-box control successes:
	  - `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]`
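	The two mean success values follow directly from the per-episode lists above. A short sketch of that arithmetic; `success_rate` is an illustrative helper, not a function from the repo.

```python
# Per-episode success flags copied from the lists above.
fair_step1_successes = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical_successes = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def success_rate(outcomes):
    """Fraction of successful episodes in a list of 0.0/1.0 flags."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


fair_rate = success_rate(fair_step1_successes)
historical_rate = success_rate(historical_successes)
```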

	### Official AnyBimanual overlap branch

	Sources:

	- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
	- `reports/anybimanual_subset3_overlap_resume1000_eval.log`

	Raw train milestones:

	- global step `300`: loss `40.91718`
	- global step `400`: loss `33.26684`
	- global step `500`: loss `36.07054`
	- global step `600`: loss `35.32345`
	- global step `700`: loss `28.50959`
	- global step `800`: loss `23.60169`
	- global step `900`: loss `15.28901`
	- run reached `weights/1000` and training exited cleanly

	Raw eval outputs:

	- source log: `reports/anybimanual_subset3_overlap_resume1000_eval.log`
	- summary files:
	  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
	  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
	- local last complete step: `1000`
	- local mean success: `0.16`
	- local per-task success:
	  - `coordinated_push_box`: `0.0`
	  - `coordinated_lift_ball`: `0.0`
	  - `dual_push_buttons`: `0.48`
	- local per-task return:
	  - `coordinated_push_box`: `0.0`
	  - `coordinated_lift_ball`: `0.0`
	  - `dual_push_buttons`: `12.0`
	- public best overlap step in the local summary: `60000`
	- public best mean success in the local summary: `0.6933333333333334`

	### Validated general-task anchor: `dual_push_buttons`

	Sources:

	- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
	- `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv`

	Raw values:

	- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
	- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
	- local clip backbone-only result on same task: success `0.0`, return `0.0`
	- local elastic reveal proxy iter6 result on same task: success `0.0`, return `0.0`
	- local RVT frozen fixed-bounds result on same task: success `0.0`, return `0.0`

	### RVT overlap branch

	Sources:

	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

	Raw values:

	- frozen RVT stage1 train summary:
	  - `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json`
	  - final train total `0.043179353826920445`
	  - final val total `0.039591669984665984`
	- frozen RVT overlap eval: mean success `0.0`
	- frozen fixed-bounds RVT overlap eval: mean success `0.0`
	- both branch gates:
	  - local AnyBimanual overlap floor `0.16`
	  - stage2 run `false`
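	The gate values above can be read as a threshold check against the local AnyBimanual overlap floor. The comparison rule below (`>=`) and the function name `stage2_should_run` are assumptions; the summaries only record the floor and the `false` outcome.

```python
OVERLAP_FLOOR = 0.16  # local AnyBimanual overlap mean success, used as the floor


def stage2_should_run(stage1_mean_success: float, floor: float = OVERLAP_FLOOR) -> bool:
    """Hypothetical gate: open stage2 only if stage1 reaches the floor.

    ASSUMPTION: the source does not state whether the comparison is >= or >.
    """
    return stage1_mean_success >= floor


# Overlap eval mean success values copied from the list above.
gates = {
    "frozen_rvt": stage2_should_run(0.0),
    "frozen_fixed_bounds_rvt": stage2_should_run(0.0),
}
```

	Both branches scored `0.0`, so either reading of the comparison gives `false`, consistent with the recorded gate outcome.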

	### Dual-push non-privileged retarget branch

	Sources:

	- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

	Raw values:

	- demo replay through `absolute_action_from_delta`:
	  - `reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json`
	  - mean success `0.8`
	  - mean return `0.8`
	- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `ep1` run:
	  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json`
	  - mean success `1.0`
	  - mean return `1.0`
	- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `ep5` run:
	  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json`
	  - mean success `1.0`
	  - mean return `1.0`

	### Dual-push full-architecture hybrid branch

	Sources:

	- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
	- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
	- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

	Raw values:

	- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
	  - `1` episode
	  - mean success `1.0`
	  - mean return `1.0`
	  - steps `94`
	  - retrieved episode index `11`
	  - retrieval similarity `0.9998629689216614`
	- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
	  - `1` episode
	  - mean success `1.0`
	  - mean return `1.0`
	  - steps `116`
	  - path recoveries `0`
	  - noop fallbacks `0`
	  - first selected mode `residual::maintain_opening`
	  - last selected mode `residual::base_action`
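	The "retrieved episode index" and "retrieval similarity" fields suggest a nearest-neighbour lookup over stored demo embeddings. The sketch below assumes cosine similarity, which the summaries do not confirm; the vectors are toy values and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def retrieve(query, bank):
    """Return (index, similarity) of the bank entry most similar to the query.

    ASSUMPTION: cosine similarity is the metric; the repo may use another one.
    """
    if not bank:
        raise ValueError("bank must not be empty")
    scores = [cosine_similarity(query, item) for item in bank]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings, not data from the repo.
toy_bank = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
toy_index, toy_similarity = retrieve([1.0, 0.0], toy_bank)
```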

	## Environment Recreation

	Environment files are under `environment/`, including:

	- `environment/setup_same_hardware.sh`
	- `environment/runtime_env_vars.sh`
	- `environment/reconstruct_anybimanual_overlap_replay.sh`
	- `environment/hardware_snapshot.txt`
	- `environment/env_list.txt`
	- `environment/base_python.txt`
	- `environment/base_pip_freeze.txt`
	- `environment/rlbench_python.txt`
	- `environment/rlbench_pip_freeze.txt`
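	Before running the setup script, a checkout can be checked for the files listed above. This is a minimal sketch: `missing_environment_files` is a hypothetical helper, and it only tests that the files exist, not that their contents are usable.

```python
from pathlib import Path

# Environment files copied from the list above, relative to the repo root.
ENV_FILES = [
    "environment/setup_same_hardware.sh",
    "environment/runtime_env_vars.sh",
    "environment/reconstruct_anybimanual_overlap_replay.sh",
    "environment/hardware_snapshot.txt",
    "environment/env_list.txt",
    "environment/base_python.txt",
    "environment/base_pip_freeze.txt",
    "environment/rlbench_python.txt",
    "environment/rlbench_pip_freeze.txt",
]


def missing_environment_files(repo_root):
    """Return the listed environment files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in ENV_FILES if not (root / rel).is_file()]
```

	Calling `missing_environment_files(".")` from the repo root should return an empty list for a complete checkout.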

	## Notes On Result Presentation

	This repo-level README and the new root docs intentionally keep result text raw:

	- file paths
	- exact commands
	- exact numeric outputs
	- exact partial status for in-flight runs

	Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.