Change And Test Log

This file records the main code changes copied into this repo and the test commands executed against them. Result statements below are raw command outcomes only.

Previous Repo Work Included Here

Copied from history/VLAarchtests_previous_README.md:

core model, memory, planner, and dataset changes under:
- VLAarchtests/code/reveal_vla_bimanual/models/
- VLAarchtests/code/reveal_vla_bimanual/train/losses.py
- VLAarchtests/code/reveal_vla_bimanual/sim_reveal/
- VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py
training and eval paths under:
- VLAarchtests/code/reveal_vla_bimanual/train/
- VLAarchtests/code/reveal_vla_bimanual/eval/
earlier test suite under:
- VLAarchtests/tests/

Current Session File Changes

Core reveal/proxy path

VLAarchtests/code/reveal_vla_bimanual/models/policy.py
VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py
VLAarchtests/code/reveal_vla_bimanual/models/backbones.py
VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py
VLAarchtests/code/reveal_vla_bimanual/train/losses.py
VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py
VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py
VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py
VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py
VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py

Training/eval wrappers and configs

VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml
VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml
environment/reconstruct_anybimanual_overlap_replay.sh

Test additions or updates

VLAarchtests/tests/test_eval_toggle_paths_work.py
VLAarchtests/tests/test_task_routed_model_eval.py
VLAarchtests/tests/test_anybimanual_resume_logic.py
VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
VLAarchtests/tests/test_candidate_ranking_loss.py
VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
VLAarchtests/tests/test_build_task_specialized_episode_specs.py
VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
VLAarchtests/tests/test_proxy_scripted_bench.py
VLAarchtests/tests/test_rvt_backbone_forward.py
VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py
VLAarchtests/tests/test_rlbench_init_checkpoint.py
VLAarchtests/tests/test_rlbench_pickle_bootstrap.py
VLAarchtests/tests/test_rlbench_task_resolver_aliases.py
VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
VLAarchtests/tests/test_dual_push_retarget_utils.py
VLAarchtests/tests/test_dual_push_full_arch_utils.py

Third-party baseline path changes

third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py
third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py
third_party/AnyBimanual/agents/peract_bc/launch_utils.py
third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py
third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py

Current Session Test Commands

Executed commands recorded in the workspace:

python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py
- result: 11 passed
pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
- result: 2 passed
pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py
- result: 4 passed
pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
- result: passed
pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
- result: 10 passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py
- result: passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py
- result: 6 passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py
- result: 9 passed
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py
- result: 4 passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
- result: passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
- result: passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
- result: passed
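
The syntax checks above could also be driven from one Python helper. The following is a minimal sketch, not a script from this repo: `check_python_sources` and `check_shell_scripts` are hypothetical names. It relies only on the standard-library `py_compile.compile(..., doraise=True)` call and on `bash -n`, the same two checks recorded above.

```python
import py_compile
import subprocess


def check_python_sources(paths):
    """Byte-compile each file; map path -> None on success, else the error text."""
    results = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError instead of printing them.
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except py_compile.PyCompileError as exc:
            results[str(path)] = exc.msg
    return results


def check_shell_scripts(paths):
    """Run `bash -n` (parse only, nothing is executed) on each script."""
    results = {}
    for path in paths:
        proc = subprocess.run(
            ["bash", "-n", str(path)], capture_output=True, text=True
        )
        results[str(path)] = None if proc.returncode == 0 else proc.stderr.strip()
    return results
```

Passing the wrapper scripts listed under "Training/eval wrappers and configs" to `check_shell_scripts` would mirror the `bash -n` lines above. A missing file raises an ordinary `OSError` rather than being recorded in the result map.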

Current Session Generated Reports

Current-session report roots staged in this repo:

VLAarchtests/artifacts/reports/sprint_v7_summary/
VLAarchtests/artifacts/reports/sprint_v7_followup/
VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/
VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/
VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/
VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/
VLAarchtests/artifacts/reports/task_routed_proxy_v1/
VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/
VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/
VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/
VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/
VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/
VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/
VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/
VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/
VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/

HF Packaging Notes

Raw packaging changes applied to the staged HF export:

baselines/AnyBimanual_overlap_replay/multi/ was reshaped from one flat directory into shard subdirectories:
- 00000-04999/
- 05000-09999/
- 10000-14999/
file count after reshape: 14034
reconstruction helper added at:
- environment/reconstruct_anybimanual_overlap_replay.sh
exact rejected Hub error before reshape:
- Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/
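
The reshape above can be illustrated with a short sketch. It assumes files are assigned to shards by their position in sorted filename order, with 5000 entries per shard. The log does not state the actual assignment rule, and `shard_name` and `shard_flat_directory` are hypothetical names, not the contents of environment/reconstruct_anybimanual_overlap_replay.sh.

```python
import shutil
from pathlib import Path

SHARD_SIZE = 5000  # assumption, inferred from the 00000-04999 style directory names


def shard_name(index, shard_size=SHARD_SIZE):
    """Return the shard directory name covering a zero-based file index."""
    start = (index // shard_size) * shard_size
    return f"{start:05d}-{start + shard_size - 1:05d}"


def shard_flat_directory(root, shard_size=SHARD_SIZE):
    """Move every file directly under `root` into its shard subdirectory."""
    root = Path(root)
    # Snapshot the sorted file list first so the moves do not disturb iteration.
    files = sorted(p for p in root.iterdir() if p.is_file())
    for index, path in enumerate(files):
        target_dir = root / shard_name(index, shard_size)
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
    return len(files)
```

Under these assumptions, 14034 files have indices 0 through 14033, so the last file lands in 10000-14999 and no shard holds more than 5000 files, which stays below the 10000-files-per-directory limit quoted in the Hub error.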

Current Session Logs

Main logs and run scripts staged in this repo:

reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log
reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log
reports/anybimanual_subset3_overlap_resume1000_eval.log
reports/anybimanual_subset3_overlap_resume1000_summary.log
reports/task_routed_proxy_v1_rerun.log
reports/run_bag_selector_iter9_prebuild.log
reports/anybimanual_release_subset3_eval_ep5.log
reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh
reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log
reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log

Official Overlap Eval Final Raw Outputs

Sources:

reports/anybimanual_subset3_overlap_resume1000_eval.log
VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json

Raw values:

step 1000
local mean success 0.16
coordinated_push_box: success 0.0, return 0.0
coordinated_lift_ball: success 0.0, return 0.0
dual_push_buttons: success 0.48, return 12.0
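
The local mean success of 0.16 is consistent with an unweighted average of the three per-task success values. The check below is a minimal sketch that assumes equal task weighting; the log does not state how summary.json aggregates the tasks.

```python
# Per-task success values copied from the raw values above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Unweighted mean over tasks (assumed aggregation rule).
mean_success = sum(per_task_success.values()) / len(per_task_success)
print(round(mean_success, 2))
```

Rounded to two decimals, (0.0 + 0.0 + 0.48) / 3 gives 0.16, matching the reported value.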

General-Task Anchor Raw Outputs

Sources:

VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json

Raw values:

public AnyBimanual release, step 60000: success 0.96, return 24.0, length 21.56
local official single-task eval, step 60000, 25 episodes: success 0.96, return 24.0, length 21.84
local CLIP backbone-only result: success 0.0, return 0.0
local elastic reveal proxy iter6 result: success 0.0, return 0.0
local RVT frozen fixed-bounds result: success 0.0, return 0.0

Dual-Push Branch Raw Outputs

Sources:

VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md
VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md

Raw values:

demo replay through absolute_action_from_delta: mean success 0.8, mean return 0.8
retargeted demo with checkpoint backbone retrieval and vision-only button localization, 5 episodes: mean success 1.0, mean return 1.0
elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, 1 episode: mean success 1.0, mean return 1.0
full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, 1 episode: mean success 1.0, mean return 1.0, steps 116, path recoveries 0, noop fallbacks 0