Public Bridge Smoke Run Log

Date: 2026-04-01 UTC

Completed public proxy evidence

  • Occlusion proxy already completed earlier on PickClutterYCB-v1.

  • Best current occlusion report:

    • /workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json
    • trunk_only_ft=0.04
    • adapter_noop=0.04
    • adapter_active_ft=0.62
    • delta_active_vs_trunk=+0.58
    • delta_active_vs_trunk_ci95=[0.44, 0.72]
    • intervention_rate=1.0
    • non_base_selection_rate=1.0
  • Bag proxy completed on the public ManiSkill bridge basket scene proxy.

  • Bag report directory:

    • /workspace/workspace/reports/maniskill_bag_bridge_smoke_v1
  • Bag result summary:

    • trunk_only_ft=0.32
    • adapter_noop=0.00
    • adapter_active_ft=0.48
    • delta_active_vs_trunk=+0.16
    • delta_active_vs_trunk_ci95=[-0.04, 0.34]
    • intervention_rate=1.0
    • non_base_selection_rate=1.0
    • bag track signs_of_life=true
    • package-level headline gate remains false at this single-seed smoke scale

Cloth proxy definition

  • Public scene proxy:
    • PutSpoonOnTableClothInScene-v1
  • Fixed hidden-state initialization:
    • spoon pose [-0.235, -0.094, 0.8748]
    • cloth pose [-0.235, -0.075, 0.885]
  • Deterministic valid-seed filter:
    • accept only seeds whose initialized hidden state is below the visibility gate and solvable by scripted reveal+retrieve
  • Reveal macros corrected to push-style actions:
    • lift_edge = front push in +y
    • separate_layer = side push in +x
  • Cloth success metric corrected:
    • based on spoon displacement from its own hidden start plus visibility
    • no longer credits success merely because the cloth flies away
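The corrected success rule can be sketched as follows. This is a minimal illustration, not the runner's code: the function name, both thresholds, and reading "displacement plus visibility" as a logical AND are assumptions. Only the spoon start pose comes from the proxy definition above.

```python
import math
from typing import Sequence

# Hypothetical thresholds; the log does not state the real values.
MIN_DISPLACEMENT_M = 0.03
MIN_VISIBILITY = 0.5


def cloth_success(
    spoon_start: Sequence[float],
    spoon_end: Sequence[float],
    spoon_visibility: float,
    min_displacement: float = MIN_DISPLACEMENT_M,
    min_visibility: float = MIN_VISIBILITY,
) -> bool:
    """Success requires the spoon itself to move AND to be visible.

    Cloth motion is deliberately not an input, so a cloth that flies away
    while the spoon stays at its hidden start earns no credit.
    """
    displacement = math.dist(spoon_start, spoon_end)
    return displacement >= min_displacement and spoon_visibility >= min_visibility


start = (-0.235, -0.094, 0.8748)  # spoon pose from the fixed initialization

# Cloth blown away, spoon untouched: fully visible but zero displacement.
cloth_blown_away = cloth_success(start, start, spoon_visibility=1.0)

# Spoon moved about 10 cm in +y and clearly visible.
retrieved = cloth_success(start, (-0.235, 0.006, 0.8748), spoon_visibility=0.9)
```

Under these assumed thresholds the first case is rejected and the second accepted, which is the behavior the correction describes.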

Important runner fixes already landed

  • File:
    • /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
  • Fixed:
    • cloth hidden-state initialization
    • cloth seed filtering and split reuse via episode_splits.json
    • post_bundle missing in cloth collect success check
    • bridge smoke loss weights aligned to current LossWeights
    • adapter trainable parameter prefixes aligned to working pickclutter runner
    • zero-depth layout changed to channel-first
    • cached dataset normalizer added for old channel-last depth tensors
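The depth-layout normalizer in the last two fixes could look roughly like the sketch below. It assumes single-channel depth stored either as `(..., H, W, 1)` (old, channel-last) or `(..., 1, H, W)` (current, channel-first). The function name is hypothetical, and NumPy stands in for whatever tensor library the runner actually uses.

```python
import numpy as np


def to_channel_first_depth(depth: np.ndarray) -> np.ndarray:
    """Return depth laid out as (..., 1, H, W).

    Hypothetical helper: assumes exactly one depth channel, so the layout
    can be inferred from which of the last three axes has size 1.
    """
    if depth.ndim < 3:
        raise ValueError(f"expected at least 3 dims, got shape {depth.shape}")
    if depth.shape[-3] == 1:
        return depth  # already channel-first
    if depth.shape[-1] == 1:
        # Old cached layout: move the trailing channel axis in front of H, W.
        return np.moveaxis(depth, -1, -3)
    raise ValueError(f"cannot infer depth layout from shape {depth.shape}")


old_cached = np.zeros((2, 3, 4, 5, 1), dtype=np.float32)  # channel-last
current = np.zeros((2, 3, 1, 4, 5), dtype=np.float32)     # channel-first

normalized_old = to_channel_first_depth(old_cached)
normalized_current = to_channel_first_depth(current)
```

Normalizing at load time lets datasets cached before the layout change be reused without recollection.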

Live status when this note was written

  • Bag process is complete.
  • Cloth process is still collecting the train split in the original long-running session.
  • The long-running cloth process was started before the later loss-weight and depth-layout fixes, so it is expected to finish collection and then crash at training start.
  • After it writes train.pt and val.pt, restart cloth with:
python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection
  • If trunk checkpoint already exists by that point and only adapter needs rerun:
python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection --reuse-checkpoints
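The restart decision above can be expressed as a small helper that only emits a command once both dataset files exist. This is a sketch: the function name is hypothetical, and the dataset directory and checkpoint path are placeholders, since the log does not say where the runner writes them. The script path and flags are copied from the commands above.

```python
from pathlib import Path
from typing import List, Optional

RUNNER = (
    "/workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/"
    "VLAarchtests/code/reveal_vla_bimanual/eval/"
    "run_maniskill_bridge_retrieval_smoke.py"
)


def cloth_restart_command(
    dataset_dir: Path, trunk_checkpoint: Optional[Path]
) -> Optional[List[str]]:
    """Build the cloth restart command, or None if collection is unfinished.

    dataset_dir and trunk_checkpoint are hypothetical locations supplied by
    the caller; the log does not record the real paths.
    """
    if not all((dataset_dir / name).is_file() for name in ("train.pt", "val.pt")):
        return None  # train.pt / val.pt not written yet; do not restart
    cmd = ["python", RUNNER, "--task", "cloth", "--skip-collection"]
    if trunk_checkpoint is not None and trunk_checkpoint.is_file():
        # Trunk already trained: only the adapter needs to be rerun.
        cmd.append("--reuse-checkpoints")
    return cmd
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with the long script path.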

Cloth restart correction

  • The corrected cloth restart reached adapter training, then failed in rollout supervision: the cached cloth public proxy authored 7 candidate targets, while the decoder always allocates 8 proposal slots.
  • Fix landed in:
    • /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
  • Correction:
    • cached bridge samples still have their channel-last depth tensors normalized, as before
    • cached candidate-aligned tensors are now also padded from 7 to 8 slots before loading
    • padding cycles the non-base candidates first, which preserves the collected cloth episodes and avoids recollection
  • Verified locally before restart:
    • normalized cloth candidate_action_chunks is (8, 8, 14)
    • normalized cloth candidate_rollout_support_mode is (8, 5)
    • one real adapter_active_ft training step and one real validation loss pass both completed without the previous shape error
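The 7 -> 8 padding can be sketched as below. This is an illustration under stated assumptions, not the landed fix: it assumes the candidate axis is the leading axis, that the base candidate sits at index 0, and that "cycles the non-base candidates first" means filler slots are copied from non-base candidates in order before the base candidate is reused. The function name is hypothetical, and NumPy stands in for the runner's tensor library.

```python
import numpy as np

NUM_PROPOSAL_SLOTS = 8  # decoder slot count stated in the log


def pad_candidates(
    tensor: np.ndarray, num_slots: int = NUM_PROPOSAL_SLOTS, base_index: int = 0
) -> np.ndarray:
    """Pad the leading (candidate) axis up to num_slots by repeating entries.

    Filler slots are drawn from non-base candidates first, then the base
    candidate, cycling if more filler is needed (assumed ordering).
    """
    count = tensor.shape[0]
    if count == 0 or count > num_slots:
        raise ValueError(f"cannot pad {count} candidates to {num_slots} slots")
    if count == num_slots:
        return tensor
    order = [i for i in range(count) if i != base_index] + [base_index]
    filler = [order[k % len(order)] for k in range(num_slots - count)]
    return np.concatenate([tensor, tensor[filler]], axis=0)


# Cached cloth shapes before padding: 7 candidates instead of 8.
action_chunks = np.arange(7 * 8 * 14, dtype=np.float32).reshape(7, 8, 14)
support_mode = np.zeros((7, 5), dtype=np.float32)

padded_chunks = pad_candidates(action_chunks)
padded_support = pad_candidates(support_mode)
```

With these inputs the padded shapes are `(8, 8, 14)` and `(8, 5)`, matching the shapes verified before the restart. Because the filler duplicates an existing candidate, no new targets are invented and the collected episodes stay intact.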

Cloth result

  • Report directory:
    • /workspace/workspace/reports/maniskill_cloth_bridge_smoke_v1
  • Final cloth smoke summary:
    • trunk_only_ft = 0.04
    • adapter_noop = 0.04
    • adapter_active_ft = 0.10
    • delta_active_vs_trunk = +0.06
    • delta_active_vs_trunk_ci95 = [-0.04, 0.16]
    • intervention_rate = 0.3369
    • non_base_selection_rate = 0.2674
  • Interpretation:
    • cloth proxy is positive and adapter-specific in this single-seed smoke because adapter_noop stayed flat while adapter_active_ft improved
    • effect size is modest (+0.06) and the 95% CI [-0.04, 0.16] spans zero, so the result is not yet statistically clean in this smoke protocol

Combined three-track proxy suite

  • Combined report:
    • /workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.json
    • /workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.md
  • Current three-track smoke evidence:
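    • occlusion proxy positive and adapter-specific (delta +0.58, 95% CI excludes zero)
    • bag proxy positive and adapter-specific (delta +0.16, 95% CI spans zero)
    • cloth proxy positive and adapter-specific (delta +0.06, 95% CI spans zero)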
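
The three-track reading can be cross-checked from the numbers reported in this log. The sketch below is illustrative only. The class name and properties are hypothetical, "adapter-specific" is assumed to mean that adapter_noop does not beat trunk_only_ft while adapter_active_ft does, and the occlusion 95% CI is assumed to be the interval on delta_active_vs_trunk.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TrackResult:
    trunk_only_ft: float
    adapter_noop: float
    adapter_active_ft: float
    ci95: Tuple[float, float]  # interval on delta_active_vs_trunk

    @property
    def delta_active_vs_trunk(self) -> float:
        return round(self.adapter_active_ft - self.trunk_only_ft, 4)

    @property
    def positive(self) -> bool:
        return self.delta_active_vs_trunk > 0

    @property
    def adapter_specific(self) -> bool:
        # Assumed criterion: the no-op adapter does not help, the active one does.
        return self.adapter_noop <= self.trunk_only_ft < self.adapter_active_ft

    @property
    def ci_excludes_zero(self) -> bool:
        low, high = self.ci95
        return low > 0 or high < 0


# Values copied from the per-track summaries in this log.
TRACKS: Dict[str, TrackResult] = {
    "occlusion": TrackResult(0.04, 0.04, 0.62, (0.44, 0.72)),
    "bag": TrackResult(0.32, 0.00, 0.48, (-0.04, 0.34)),
    "cloth": TrackResult(0.04, 0.04, 0.10, (-0.04, 0.16)),
}

for name, track in TRACKS.items():
    print(
        f"{name}: delta={track.delta_active_vs_trunk:+.2f} "
        f"positive={track.positive} adapter_specific={track.adapter_specific} "
        f"ci_excludes_zero={track.ci_excludes_zero}"
    )
```

Under these assumptions all three tracks are positive and adapter-specific, but only the occlusion interval excludes zero, which is consistent with the bag and cloth results being described as smoke-scale evidence.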