Change And Test Log

This file records the main code changes and the executed test commands for the work copied into this repo. Result statements below are raw command outcomes only.

Previous Repo Work Included Here

Copied from history/VLAarchtests_previous_README.md:

  • core model, memory, planner, and dataset changes under:
    • VLAarchtests/code/reveal_vla_bimanual/models/
    • VLAarchtests/code/reveal_vla_bimanual/train/losses.py
    • VLAarchtests/code/reveal_vla_bimanual/sim_reveal/
    • VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py
  • training and eval paths under:
    • VLAarchtests/code/reveal_vla_bimanual/train/
    • VLAarchtests/code/reveal_vla_bimanual/eval/
  • earlier test suite under:
    • VLAarchtests/tests/

Current Session File Changes

Core reveal/proxy path

  • VLAarchtests/code/reveal_vla_bimanual/models/policy.py
  • VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py
  • VLAarchtests/code/reveal_vla_bimanual/models/backbones.py
  • VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py
  • VLAarchtests/code/reveal_vla_bimanual/train/losses.py
  • VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py
  • VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py
  • VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py
  • VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
  • VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py

Training/eval wrappers and configs

  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
  • VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml
  • VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml
  • environment/reconstruct_anybimanual_overlap_replay.sh

Test additions or updates

  • VLAarchtests/tests/test_eval_toggle_paths_work.py
  • VLAarchtests/tests/test_task_routed_model_eval.py
  • VLAarchtests/tests/test_anybimanual_resume_logic.py
  • VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
  • VLAarchtests/tests/test_candidate_ranking_loss.py
  • VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
  • VLAarchtests/tests/test_build_task_specialized_episode_specs.py
  • VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
  • VLAarchtests/tests/test_proxy_scripted_bench.py
  • VLAarchtests/tests/test_rvt_backbone_forward.py
  • VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py
  • VLAarchtests/tests/test_rlbench_init_checkpoint.py
  • VLAarchtests/tests/test_rlbench_pickle_bootstrap.py
  • VLAarchtests/tests/test_rlbench_task_resolver_aliases.py
  • VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
  • VLAarchtests/tests/test_dual_push_retarget_utils.py
  • VLAarchtests/tests/test_dual_push_full_arch_utils.py

Third-party baseline path changes

  • third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py
  • third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py
  • third_party/AnyBimanual/agents/peract_bc/launch_utils.py
  • third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py
  • third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py

Current Session Test Commands

Executed commands recorded in the workspace:

  • python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py
  • PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py
    • result: 11 passed
  • pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py
    • result: 2 passed
  • pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py
    • result: 4 passed
  • pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py
    • result: passed
  • pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py
    • result: 10 passed
  • pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py
    • result: passed
  • pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py
    • result: 6 passed
  • pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py
    • result: 9 passed
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh
  • bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh
  • PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py
    • result: 4 passed
  • python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py
    • result: passed
  • python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py
    • result: passed
  • python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py
    • result: passed
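
The `python -m py_compile` checks above can also be run from Python when a per-file result is wanted. The following is a minimal sketch, not a script from this repo: `syntax_check` is a hypothetical helper name, and redirecting bytecode to a scratch directory is an assumption made here to keep the source tree clean.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_check(paths):
    """Return {path: error message or None} for each Python source file.

    Hypothetical helper: mirrors `python -m py_compile` but reports per file.
    """
    results = {}
    with tempfile.TemporaryDirectory() as scratch:
        for i, path in enumerate(paths):
            try:
                # cfile redirects the compiled bytecode into the scratch dir.
                py_compile.compile(
                    str(path),
                    cfile=str(Path(scratch) / f"{i}.pyc"),
                    doraise=True,
                )
                results[str(path)] = None
            except py_compile.PyCompileError as exc:
                results[str(path)] = exc.msg
    return results
```

A file maps to `None` when it compiles and to the compiler message otherwise, so a caller can treat any non-`None` value as a failed check.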

Current Session Generated Reports

Current-session report roots staged in this repo:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/
  • VLAarchtests/artifacts/reports/sprint_v7_followup/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/
  • VLAarchtests/artifacts/reports/task_routed_proxy_v1/
  • VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/
  • VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/
  • VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/
  • VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/
  • VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/

HF Packaging Notes

Raw packaging changes applied to the staged HF export:

  • baselines/AnyBimanual_overlap_replay/multi/ was reshaped from one flat directory into shard subdirectories:
    • 00000-04999/
    • 05000-09999/
    • 10000-14999/
  • file count after reshape: 14034
  • reconstruction helper added at:
    • environment/reconstruct_anybimanual_overlap_replay.sh
  • exact Hub rejection error before the reshape:
    • Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/
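
The reshape described above can be sketched as follows. This is an illustration only, not the contents of `environment/reconstruct_anybimanual_overlap_replay.sh`: the function names are hypothetical, the shard width of 5000 is inferred from the shard directory names, and assigning files to shards by sorted position (rather than by parsing file names) is an assumption.

```python
import shutil
from pathlib import Path

SHARD_WIDTH = 5000  # inferred from names such as 00000-04999; an assumption


def shard_name(index: int, width: int = SHARD_WIDTH) -> str:
    """Return the zero-padded 'start-end' directory name for a file index."""
    start = (index // width) * width
    return f"{start:05d}-{start + width - 1:05d}"


def reshape_flat_dir(root: Path, width: int = SHARD_WIDTH) -> dict:
    """Move every file directly under root into its shard subdirectory.

    Assumption: a file's shard is decided by its position in sorted order.
    Returns a mapping of shard directory name to number of files moved.
    """
    counts = {}
    # Materialize the listing first so new shard directories are not visited.
    files = sorted(p for p in root.iterdir() if p.is_file())
    for index, path in enumerate(files):
        shard = root / shard_name(index, width)
        shard.mkdir(exist_ok=True)
        shutil.move(str(path), str(shard / path.name))
        counts[shard.name] = counts.get(shard.name, 0) + 1
    return counts
```

Under these assumptions, 14034 files at width 5000 fall into three shards of 5000, 5000, and 4034 files, each below the 10000-files-per-directory limit quoted in the error.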

Current Session Logs

Main logs staged in this repo:

  • reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log
  • reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log
  • reports/anybimanual_subset3_overlap_resume1000_eval.log
  • reports/anybimanual_subset3_overlap_resume1000_summary.log
  • reports/task_routed_proxy_v1_rerun.log
  • reports/run_bag_selector_iter9_prebuild.log
  • reports/anybimanual_release_subset3_eval_ep5.log
  • reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh
  • reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log
  • reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log

Official Overlap Eval Final Raw Outputs

Sources:

  • reports/anybimanual_subset3_overlap_resume1000_eval.log
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json

Raw values:

  • step 1000
  • local mean success 0.16
  • coordinated_push_box: success 0.0, return 0.0
  • coordinated_lift_ball: success 0.0, return 0.0
  • dual_push_buttons: success 0.48, return 12.0
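
The local mean success above is consistent with an unweighted mean of the three per-task success values. The check below is a small sketch under that assumption; the log does not state how the mean was computed, and the variable names are illustrative.

```python
from statistics import mean

# Per-task success values copied from the raw values above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Assumption: every task carries equal weight in the mean.
local_mean_success = round(mean(per_task_success.values()), 2)
print(local_mean_success)  # 0.16
```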

General-Task Anchor Raw Outputs

Sources:

  • VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json

Raw values:

  • public AnyBimanual release, step 60000: success 0.96, return 24.0, length 21.56
  • local official single-task eval, step 60000, 25 episodes: success 0.96, return 24.0, length 21.84
  • local clip backbone-only result: success 0.0, return 0.0
  • local elastic reveal proxy iter6 result: success 0.0, return 0.0
  • local RVT frozen fixed-bounds result: success 0.0, return 0.0

Dual-Push Branch Raw Outputs

Sources:

  • VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md
  • VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md

Raw values:

  • demo replay through absolute_action_from_delta: mean success 0.8, mean return 0.8
  • retargeted demo with checkpoint backbone retrieval and vision-only button localization, 5 episodes: mean success 1.0, mean return 1.0
  • elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, 1 episode: mean success 1.0, mean return 1.0
  • full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, 1 episode: mean success 1.0, mean return 1.0, steps 116, path recoveries 0, noop fallbacks 0