Public Bridge Smoke Run Log

Date: 2026-04-01 UTC

Completed public proxy evidence

Occlusion proxy was completed earlier on PickClutterYCB-v1.
Best current occlusion report:
- /workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json
- trunk_only_ft=0.04
- adapter_noop=0.04
- adapter_active_ft=0.62
- delta_active_vs_trunk=+0.58
- delta_active_vs_trunk_ci95=[0.44, 0.72]
- intervention_rate=1.0
- non_base_selection_rate=1.0
Bag proxy completed on the public ManiSkill bridge basket scene proxy.
Bag report directory:
- /workspace/workspace/reports/maniskill_bag_bridge_smoke_v1
Bag result summary:
- trunk_only_ft=0.32
- adapter_noop=0.00
- adapter_active_ft=0.48
- delta_active_vs_trunk=+0.16
- delta_active_vs_trunk_ci95=[-0.04, 0.34]
- intervention_rate=1.0
- non_base_selection_rate=1.0
- bag track signs_of_life=true
- package-level headline gate remains false at this single-seed smoke scale

Cloth proxy definition

Public scene proxy:
- PutSpoonOnTableClothInScene-v1
Fixed hidden-state initialization:
- spoon pose [-0.235, -0.094, 0.8748]
- cloth pose [-0.235, -0.075, 0.885]
Deterministic valid-seed filter:
- accept only seeds whose initialized hidden state is below the visibility gate and solvable by scripted reveal+retrieve
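
A minimal sketch of such a deterministic filter, assuming the two checks are available as callables. The names filter_valid_seeds, visibility_at_init and scripted_solves are hypothetical stand-ins and are not taken from the runner; the visibility gate value is also left as a parameter because the log does not state it.

```python
from typing import Callable, Iterable, List


def filter_valid_seeds(
    seeds: Iterable[int],
    visibility_at_init: Callable[[int], float],  # hypothetical: visibility right after reset
    scripted_solves: Callable[[int], bool],      # hypothetical: scripted reveal+retrieve check
    visibility_gate: float,                      # assumption: gate value is not given in the log
    needed: int,
) -> List[int]:
    """Scan seeds in a fixed order and keep those that pass both checks."""
    accepted: List[int] = []
    for seed in seeds:
        # Reject seeds where the object is not actually hidden at initialization.
        if visibility_at_init(seed) >= visibility_gate:
            continue
        # Reject seeds the scripted policy cannot solve.
        if not scripted_solves(seed):
            continue
        accepted.append(seed)
        if len(accepted) == needed:
            break
    return accepted
```

Because the scan order is fixed and both checks are assumed to be deterministic per seed, rerunning the filter would yield the same accepted list, which is what makes split reuse possible.
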
Reveal macros corrected to push-style actions:
- lift_edge = front push in +y
- separate_layer = side push in +x
Cloth success metric corrected:
- based on spoon displacement from its own hidden start plus visibility
- no longer credits success merely because the cloth flies away
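
The corrected metric can be sketched roughly as follows. This is an illustration under stated assumptions, not the runner's implementation: the function name and both thresholds (min_displacement, visibility_gate) are hypothetical, and only the spoon start pose comes from this log.

```python
import math
from typing import Sequence

# Spoon hidden-state start position, as recorded in this log.
SPOON_HIDDEN_START = (-0.235, -0.094, 0.8748)


def cloth_retrieval_success(
    spoon_pos: Sequence[float],
    spoon_visibility: float,
    start: Sequence[float] = SPOON_HIDDEN_START,
    min_displacement: float = 0.05,  # hypothetical threshold (metres)
    visibility_gate: float = 0.5,    # hypothetical threshold
) -> bool:
    """Success requires the spoon itself to have moved AND to be visible."""
    displacement = math.dist(spoon_pos, start)
    return displacement >= min_displacement and spoon_visibility >= visibility_gate
```

The cloth pose is deliberately not an input: if the cloth flies away and the spoon stays at its start pose, visibility rises but displacement stays near zero, so the episode is not counted as a success.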

Important runner fixes already landed

File:
- /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
Fixed:
- cloth hidden-state initialization
- cloth seed filtering and split reuse via episode_splits.json
- post_bundle missing in cloth collect success check
- bridge smoke loss weights aligned to current LossWeights
- adapter trainable parameter prefixes aligned to working pickclutter runner
- zero-depth layout changed to channel-first
- cached dataset normalizer added for old channel-last depth tensors
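
As an illustration of the depth-layout normalization, the sketch below computes the axis permutation that would turn a channel-last depth shape into channel-first. It assumes single-channel depth and detects the old layout from a trailing dimension of size 1; the function name and the detection heuristic are hypothetical and may differ from what the runner does.

```python
from typing import Sequence, Tuple


def channel_first_permutation(shape: Sequence[int], channels: int = 1) -> Tuple[int, ...]:
    """Return the axis order mapping (..., H, W, C) to (..., C, H, W).

    Shapes that do not look channel-last get the identity permutation.
    """
    ndim = len(shape)
    identity = tuple(range(ndim))
    if ndim < 3:
        return identity
    # Heuristic (assumption): channel-last if the final axis has the channel
    # count and the axis three from the end does not.
    is_channel_last = shape[-1] == channels and shape[-3] != channels
    if not is_channel_last:
        return identity
    # Leading axes are kept; the trailing channel axis moves in front of H and W.
    return identity[:-3] + (ndim - 1, ndim - 3, ndim - 2)
```

With a tensor library the result would typically be applied through an axis-permute call, for example depth.permute(*perm) on a PyTorch tensor, so old channel-last caches and new channel-first samples end up in the same layout.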

Live status when this note was written

Bag process is complete.
Cloth process is still collecting the train split in the original long-running session.
The long-running cloth process was started before the later loss-weight and depth-layout fixes, so it is expected to finish collection and then crash at training start.
After it writes train.pt and val.pt, restart cloth with:

python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection

If trunk checkpoint already exists by that point and only adapter needs rerun:

python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection --reuse-checkpoints
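
The restart decision above can be sketched as a small helper that builds the command only once both dataset files exist. The helper name, the dataset_dir argument and the trunk_checkpoint argument are hypothetical, since this log does not record where train.pt, val.pt or the trunk checkpoint are written; the runner path and flags are the ones shown above.

```python
from pathlib import Path
from typing import List, Optional

RUNNER = (
    "/workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/"
    "VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py"
)


def cloth_restart_command(
    dataset_dir: Path,                        # assumption: directory holding train.pt / val.pt
    trunk_checkpoint: Optional[Path] = None,  # assumption: location is not given in the log
) -> List[str]:
    """Build the cloth restart command, refusing to run before collection finishes."""
    missing = [n for n in ("train.pt", "val.pt") if not (dataset_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"collection not finished, missing: {missing}")
    cmd = ["python", RUNNER, "--task", "cloth", "--skip-collection"]
    # Only reuse checkpoints when the trunk checkpoint is already on disk.
    if trunk_checkpoint is not None and trunk_checkpoint.is_file():
        cmd.append("--reuse-checkpoints")
    return cmd
```

The returned list could be passed to subprocess.run; keeping it as a list avoids shell quoting issues with the long runner path.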

Cloth restart correction

The corrected cloth restart reached adapter training, then failed in rollout supervision: the cached cloth public proxy authored 7 candidate targets, while the decoder always allocates 8 proposal slots.
Fix landed in:
- /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
Correction:
- channel-last depth tensors in cached bridge samples are still normalized, as before
- cached candidate-aligned tensors now also pad from 7 -> 8 slots before loading
- padding cycles the non-base candidates first, which preserves the collected cloth episodes and avoids recollection
Verified locally before restart:
- normalized cloth candidate_action_chunks is (8, 8, 14)
- normalized cloth candidate_rollout_support_mode is (8, 5)
- one real adapter_active_ft training step and one real validation loss pass both completed without the previous shape error
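
One way to picture the 7 -> 8 padding is as a list of source indices along the candidate axis. The sketch below is a guess at the scheme, not the landed code: it assumes the base candidate sits at index 0 and that padding slots draw only from non-base candidates, and the function name is hypothetical.

```python
from typing import List


def padded_candidate_indices(
    num_candidates: int,
    num_slots: int,
    base_index: int = 0,  # assumption: the base candidate is stored first
) -> List[int]:
    """Return source indices that fill num_slots by cycling non-base candidates."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    if num_candidates > num_slots:
        raise ValueError("more candidates than proposal slots")
    non_base = [i for i in range(num_candidates) if i != base_index]
    # Fall back to the base candidate only if nothing else exists.
    fill_source = non_base or [base_index]
    indices = list(range(num_candidates))
    for k in range(num_slots - num_candidates):
        indices.append(fill_source[k % len(fill_source)])
    return indices
```

For the cloth cache this gives [0, 1, 2, 3, 4, 5, 6, 1]. Indexing a candidate-aligned tensor along its first axis with that list would turn a (7, 8, 14) candidate_action_chunks tensor into (8, 8, 14) and a (7, 5) candidate_rollout_support_mode tensor into (8, 5), matching the shapes verified above.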

Cloth result

Report directory:
- /workspace/workspace/reports/maniskill_cloth_bridge_smoke_v1
Final cloth smoke summary:
- trunk_only_ft = 0.04
- adapter_noop = 0.04
- adapter_active_ft = 0.10
- delta_active_vs_trunk = +0.06
- delta_active_vs_trunk_ci95 = [-0.04, 0.16]
- intervention_rate = 0.3369
- non_base_selection_rate = 0.2674
Interpretation:
- cloth proxy is positive and adapter-specific in this single-seed smoke: adapter_noop stayed at the trunk_only_ft level (0.04) while adapter_active_ft rose to 0.10
- effect size is modest (+0.06) and the 95% CI [-0.04, 0.16] includes zero, so the result is not yet statistically clean in this smoke protocol

Combined three-track proxy suite

Combined report:
- /workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.json
- /workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.md
Current three-track smoke evidence:
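- occlusion proxy positive and adapter-specific (delta +0.58, 95% CI [0.44, 0.72])
- bag proxy positive and adapter-specific (delta +0.16, 95% CI [-0.04, 0.34] includes zero)
- cloth proxy positive and adapter-specific (delta +0.06, 95% CI [-0.04, 0.16] includes zero)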
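
For a quick cross-check of the three tracks, the sketch below recomputes each delta from the success rates recorded in this log and flags whether the reported 95% CI excludes zero. The dictionary layout and the summarize helper are illustrative only; they are not the schema of combined_summary.json, and the zero-exclusion test is not claimed to be the package-level headline gate.

```python
from typing import Dict, Tuple, Union

Track = Dict[str, Union[float, Tuple[float, float]]]

# Values copied from the per-track summaries in this log.
TRACKS: Dict[str, Track] = {
    "occlusion": {"trunk_only_ft": 0.04, "adapter_active_ft": 0.62, "ci95": (0.44, 0.72)},
    "bag": {"trunk_only_ft": 0.32, "adapter_active_ft": 0.48, "ci95": (-0.04, 0.34)},
    "cloth": {"trunk_only_ft": 0.04, "adapter_active_ft": 0.10, "ci95": (-0.04, 0.16)},
}


def summarize(track: Track) -> Dict[str, Union[float, bool]]:
    """Recompute delta_active_vs_trunk and check the CI against zero."""
    delta = round(track["adapter_active_ft"] - track["trunk_only_ft"], 2)
    low, high = track["ci95"]
    return {"delta": delta, "ci_excludes_zero": low > 0 or high < 0}


if __name__ == "__main__":
    for name, track in TRACKS.items():
        print(name, summarize(track))
```

On these numbers all three deltas are positive, but only the occlusion interval excludes zero.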