Change And Test Log
This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.
Previous Repo Work Included Here
Copied from history/VLAarchtests_previous_README.md:
- core model, memory, planner, and dataset changes under:
  - VLAarchtests/code/reveal_vla_bimanual/models/
  - VLAarchtests/code/reveal_vla_bimanual/train/losses.py
  - VLAarchtests/code/reveal_vla_bimanual/sim_reveal/
  - VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py
- training and eval paths under:
  - VLAarchtests/code/reveal_vla_bimanual/train/
  - VLAarchtests/code/reveal_vla_bimanual/eval/
- earlier test suite under:
  - VLAarchtests/tests/
Current Session File Changes
Core reveal/proxy path
- VLAarchtests/code/reveal_vla_bimanual/models/policy.py
- VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py
- VLAarchtests/code/reveal_vla_bimanual/models/backbones.py
- VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py
- VLAarchtests/code/reveal_vla_bimanual/train/losses.py
- VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
- VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py
- VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
- VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
- VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
- VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py
- VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py
- VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py
- VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
- VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py
Training/eval wrappers and configs
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
- VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
- VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml
- VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml
- environment/reconstruct_anybimanual_overlap_replay.sh
Test additions or updates
- VLAarchtests/tests/test_eval_toggle_paths_work.py
- VLAarchtests/tests/test_task_routed_model_eval.py
- VLAarchtests/tests/test_anybimanual_resume_logic.py
- VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
- VLAarchtests/tests/test_candidate_ranking_loss.py
- VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
- VLAarchtests/tests/test_build_task_specialized_episode_specs.py
- VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
- VLAarchtests/tests/test_proxy_scripted_bench.py
- VLAarchtests/tests/test_rvt_backbone_forward.py
- VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py
- VLAarchtests/tests/test_rlbench_init_checkpoint.py
- VLAarchtests/tests/test_rlbench_pickle_bootstrap.py
- VLAarchtests/tests/test_rlbench_task_resolver_aliases.py
- VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
- VLAarchtests/tests/test_dual_push_retarget_utils.py
- VLAarchtests/tests/test_dual_push_full_arch_utils.py
Third-party baseline path changes
- third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py
- third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py
- third_party/AnyBimanual/agents/peract_bc/launch_utils.py
- third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py
- third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py
Current Session Test Commands
Executed commands recorded in the workspace:
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py
- result: 11 passed
pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
- result: 2 passed
pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py
- result: 4 passed
pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
- result: passed
pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
- result: 10 passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py
- result: passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py
- result: 6 passed
pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py
- result: 9 passed
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py
- result: 4 passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
- result: passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
- result: passed
python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
- result: passed
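The `python -m py_compile` entries above only check that each file byte-compiles (syntax), not that it imports or runs. A minimal sketch of the same check driven from Python, using the standard-library `py_compile.compile(file, doraise=True)`; the helper name `find_compile_errors` is hypothetical and not part of the repo. Successful compiles write `.pyc` files under `__pycache__` (the default cache location), and a missing file raises an `OSError` instead of being reported in the result.

```python
import py_compile


def find_compile_errors(paths):
    """Byte-compile each path and return {path: error message} for failures.

    Hypothetical helper: like `python -m py_compile`, this checks syntax only,
    not imports or runtime behaviour.
    """
    errors = {}
    for path in paths:
        try:
            # doraise=True turns compile failures into PyCompileError
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors[str(path)] = exc.msg
    return errors
```

An empty dict means every listed file compiled, which corresponds to the "passed" results recorded above.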
Current Session Generated Reports
Current-session report roots staged in this repo:
- VLAarchtests/artifacts/reports/sprint_v7_summary/
- VLAarchtests/artifacts/reports/sprint_v7_followup/
- VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/
- VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/
- VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/
- VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/
- VLAarchtests/artifacts/reports/task_routed_proxy_v1/
- VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/
- VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/
- VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/
- VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/
- VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/
- VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/
- VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/
- VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/
- VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/
HF Packaging Notes
Raw packaging changes applied to the staged HF export:
- baselines/AnyBimanual_overlap_replay/multi/ was reshaped from one flat directory into shard subdirectories:
  00000-04999/
  05000-09999/
  10000-14999/
- file count after reshape: 14034
- reconstruction helper added at: environment/reconstruct_anybimanual_overlap_replay.sh
- exact rejected Hub error before reshape:
  Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/
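The shard names above (`00000-04999/`, `05000-09999/`, `10000-14999/`) suggest fixed 5000-file ranges, which keeps every directory well under the 10000-files-per-directory limit quoted in the Hub error. A minimal sketch of such a layout, assuming files are assigned to shards by their position in name-sorted order; `SHARD_SIZE`, `shard_dir_name`, `plan_shards`, and `files_per_shard` are hypothetical names, and this does not reproduce the logic of `environment/reconstruct_anybimanual_overlap_replay.sh`.

```python
from collections import Counter

HUB_MAX_FILES_PER_DIR = 10000  # limit quoted in the rejected-push error
SHARD_SIZE = 5000  # assumption: inferred from the 00000-04999/ naming


def shard_dir_name(index, shard_size=SHARD_SIZE):
    """Return the shard subdirectory name for the file at position `index`."""
    start = (index // shard_size) * shard_size
    return f"{start:05d}-{start + shard_size - 1:05d}"


def plan_shards(filenames, shard_size=SHARD_SIZE):
    """Map each filename to '<shard>/<filename>' using name-sorted order (assumed)."""
    return {
        name: f"{shard_dir_name(i, shard_size)}/{name}"
        for i, name in enumerate(sorted(filenames))
    }


def files_per_shard(plan):
    """Count how many planned files land in each shard directory."""
    return Counter(dest.split("/", 1)[0] for dest in plan.values())
```

With 14034 files this plan yields exactly the three shard names listed above, holding 5000, 5000, and 4034 files respectively.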
Current Session Logs
Main logs staged in this repo:
- reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log
- reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log
- reports/anybimanual_subset3_overlap_resume1000_eval.log
- reports/anybimanual_subset3_overlap_resume1000_summary.log
- reports/task_routed_proxy_v1_rerun.log
- reports/run_bag_selector_iter9_prebuild.log
- reports/anybimanual_release_subset3_eval_ep5.log
- reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh
- reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log
- reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log
Official Overlap Eval Final Raw Outputs
Sources:
- reports/anybimanual_subset3_overlap_resume1000_eval.log
- VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json
Raw values:
- step: 1000
- local mean success: 0.16
- coordinated_push_box: success 0.0, return 0.0
- coordinated_lift_ball: success 0.0, return 0.0
- dual_push_buttons: success 0.48, return 12.0
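The local mean success of 0.16 is consistent with an unweighted mean of the three per-task success rates, (0.0 + 0.0 + 0.48) / 3 = 0.16. A quick check with the values copied from the raw outputs above; the dict layout is illustrative only and is not the schema of `summary.json`.

```python
# Per-task values copied from the raw outputs above (illustrative layout,
# not the structure of summary.json).
per_task = {
    "coordinated_push_box": {"success": 0.0, "return": 0.0},
    "coordinated_lift_ball": {"success": 0.0, "return": 0.0},
    "dual_push_buttons": {"success": 0.48, "return": 12.0},
}

# Unweighted mean over tasks: each task counts once.
mean_success = sum(t["success"] for t in per_task.values()) / len(per_task)
print(f"mean success: {mean_success:.2f}")  # prints: mean success: 0.16
```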
General-Task Anchor Raw Outputs
Sources:
VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json
Raw values:
- public AnyBimanual release, step 60000: success 0.96, return 24.0, length 21.56
- local official single-task eval, step 60000, 25 episodes: success 0.96, return 24.0, length 21.84
- local clip backbone-only result: success 0.0, return 0.0
- local elastic reveal proxy iter6 result: success 0.0, return 0.0
- local RVT frozen fixed-bounds result: success 0.0, return 0.0
Dual-Push Branch Raw Outputs
Sources:
- VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md
- VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md
Raw values:
- demo replay through absolute_action_from_delta: mean success 0.8, mean return 0.8
- retargeted demo with checkpoint backbone retrieval and vision-only button localization, 5 episodes: mean success 1.0, mean return 1.0
- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, 1 episode: mean success 1.0, mean return 1.0
- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, 1 episode: mean success 1.0, mean return 1.0, steps 116, path recoveries 0, noop fallbacks 0