Public Bridge Smoke Run Log
Date: 2026-04-01 UTC
Completed public proxy evidence
Occlusion proxy already completed earlier on PickClutterYCB-v1.
Best current occlusion report:
/workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json
trunk_only_ft=0.04
adapter_noop=0.04
adapter_active_ft=0.62
delta_active_vs_trunk=+0.58
95% CI [0.44, 0.72]
intervention_rate=1.0
non_base_selection_rate=1.0
Bag proxy completed on the public ManiSkill bridge basket scene proxy.
Bag report directory:
/workspace/workspace/reports/maniskill_bag_bridge_smoke_v1
Bag result summary:
trunk_only_ft=0.32
adapter_noop=0.00
adapter_active_ft=0.48
delta_active_vs_trunk=+0.16
delta_active_vs_trunk_ci95=[-0.04, 0.34]
intervention_rate=1.0
non_base_selection_rate=1.0
- bag track signs_of_life=true
- package-level headline gate remains false at this single-seed smoke scale
Cloth proxy definition
- Public scene proxy: PutSpoonOnTableClothInScene-v1
- Fixed hidden-state initialization:
- spoon pose [-0.235, -0.094, 0.8748]
- cloth pose [-0.235, -0.075, 0.885]
- Deterministic valid-seed filter:
- accept only seeds whose initialized hidden state is below the visibility gate and solvable by scripted reveal+retrieve
- Reveal macros corrected to push-style actions:
- lift_edge = front push in +y
- separate_layer = side push in +x
- Cloth success metric corrected:
- based on spoon displacement from its own hidden start plus visibility
- no longer credits success merely because the cloth flies away
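A minimal Python sketch of the corrected cloth success check and the deterministic valid-seed filter. It assumes each episode exposes the spoon's hidden start position, its final position, and a visibility fraction; `SpoonState`, `cloth_success`, `filter_valid_seeds`, `MIN_DISPLACEMENT_M`, and `VISIBILITY_GATE` (including their values) are hypothetical and are not taken from the runner.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class SpoonState:
    start_pos: Sequence[float]    # hidden start position (x, y, z)
    current_pos: Sequence[float]  # position at episode end
    visibility: float             # fraction of the spoon visible, 0..1


# Hypothetical thresholds; the runner's actual values are not recorded in this log.
MIN_DISPLACEMENT_M = 0.05
VISIBILITY_GATE = 0.2


def cloth_success(state: SpoonState) -> bool:
    """Credit success only if the spoon itself moved away from its hidden start
    and is now visible; moving the cloth alone never counts."""
    displacement = math.dist(state.start_pos, state.current_pos)
    return displacement >= MIN_DISPLACEMENT_M and state.visibility >= VISIBILITY_GATE


def filter_valid_seeds(
    seeds: Iterable[int],
    initial_visibility: Callable[[int], float],
    scripted_reveal_retrieve_ok: Callable[[int], bool],
    n_needed: int,
) -> List[int]:
    """Scan seeds in a fixed order and keep those whose initialized spoon is
    below the visibility gate and solvable by a scripted reveal+retrieve."""
    accepted: List[int] = []
    for seed in seeds:
        if initial_visibility(seed) < VISIBILITY_GATE and scripted_reveal_retrieve_ok(seed):
            accepted.append(seed)
            if len(accepted) == n_needed:
                break
    return accepted
```

Because seeds are scanned in a fixed order and the callables are assumed deterministic, the same seed list is produced on every run, which is what allows the splits to be reused.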
Important runner fixes already landed
- File:
/workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
- Fixed:
- cloth hidden-state initialization
- cloth seed filtering and split reuse via episode_splits.json
- missing post_bundle in the cloth collect success check
- bridge smoke loss weights aligned to the current LossWeights
- adapter trainable parameter prefixes aligned to the working pickclutter runner
- zero-depth layout changed to channel-first
- cached dataset normalizer added for old channel-last depth tensors
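The depth-layout fixes can be sketched as follows, assuming depth is held as a NumPy array whose old channel-last form ends in a singleton channel axis, `(..., H, W, 1)`, and whose channel-first form is `(..., 1, H, W)`. `normalize_depth_layout` and `zero_depth` are hypothetical names; the runner may store torch tensors rather than NumPy arrays.

```python
import numpy as np


def normalize_depth_layout(depth: np.ndarray) -> np.ndarray:
    """Return depth in channel-first layout (..., 1, H, W).

    Old cached samples stored depth channel-last (..., H, W, 1); those have the
    trailing channel axis moved in front of (H, W). Arrays that are already
    channel-first are returned unchanged.
    """
    if depth.ndim >= 3 and depth.shape[-1] == 1 and depth.shape[-3] != 1:
        return np.moveaxis(depth, -1, -3)
    return depth


def zero_depth(height: int, width: int, leading: tuple = ()) -> np.ndarray:
    """Zero placeholder depth, built directly in channel-first layout."""
    return np.zeros((*leading, 1, height, width), dtype=np.float32)
```

The shape heuristic is only a sketch: it cannot disambiguate a degenerate image with `H == 1`, so a real normalizer would more likely key off a stored layout flag.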
Live status when this note was written
- Bag process is complete.
- Cloth process is still collecting the train split in the original long-running session.
- The long-running cloth process was started before the later loss-weight and depth-layout fixes, so it is expected to finish collection and then crash at training start.
- After it writes train.pt and val.pt, restart cloth with:
python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection
- If the trunk checkpoint already exists by that point and only the adapter needs a rerun:
python /workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py --task cloth --skip-collection --reuse-checkpoints
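The restart choice above can be expressed as a small Python helper. It only assembles the flags documented in this log (`--task cloth`, `--skip-collection`, `--reuse-checkpoints`); `splits_ready`, `cloth_restart_cmd`, and the split directory and trunk checkpoint paths passed to them are hypothetical, because the log does not say where `train.pt`, `val.pt`, or the trunk checkpoint are written.

```python
from pathlib import Path
from typing import List

RUNNER = (
    "/workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/"
    "VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py"
)


def splits_ready(split_dir: Path) -> bool:
    """True once collection has written both train.pt and val.pt."""
    return (split_dir / "train.pt").is_file() and (split_dir / "val.pt").is_file()


def cloth_restart_cmd(trunk_ckpt: Path) -> List[str]:
    """Build the cloth restart argv; add --reuse-checkpoints only if the trunk checkpoint exists."""
    cmd = ["python", RUNNER, "--task", "cloth", "--skip-collection"]
    if trunk_ckpt.is_file():
        cmd.append("--reuse-checkpoints")
    return cmd
```

Once `splits_ready(...)` is true, the returned list could be passed to `subprocess.run(..., check=True)`.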
Cloth restart correction
- The corrected cloth restart reached adapter training and failed in rollout supervision because the cached cloth public proxy authored 7 candidate targets while the decoder always allocates 8 proposal slots.
- Fix landed in:
/workspace/workspace/VLAarchtests3_export/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_maniskill_bridge_retrieval_smoke.py
- Correction:
- cached bridge samples now normalize channel-last depth tensors as before
- cached candidate-aligned tensors now also pad from 7 -> 8 slots before loading
- padding cycles the non-base candidates first, which preserves the collected cloth episodes and avoids recollection
- Verified locally before restart:
- normalized cloth candidate_action_chunks has shape (8, 8, 14)
- normalized cloth candidate_rollout_support_mode has shape (8, 5)
- one real adapter_active_ft training step and one real validation loss pass both completed without the previous shape error
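A minimal sketch of the 7 -> 8 slot padding, assuming candidate-aligned arrays keep the candidate axis first and that slot 0 holds the base candidate; both are assumptions, since the log does not state the layout. `pad_candidate_slots` is a hypothetical name. The checks use pre-padding shapes inferred from the verified (8, 8, 14) and (8, 5) results by removing one slot.

```python
import itertools

import numpy as np


def pad_candidate_slots(x: np.ndarray, num_slots: int = 8, base_index: int = 0) -> np.ndarray:
    """Pad the leading candidate axis of x up to num_slots.

    Extra slots are filled by cycling through the non-base candidates first,
    reusing already-collected data instead of recollecting. If only the base
    candidate exists, it is repeated.
    """
    n = x.shape[0]
    if n > num_slots:
        raise ValueError(f"{n} candidates exceed {num_slots} slots")
    if n == num_slots:
        return x
    non_base = [i for i in range(n) if i != base_index] or [base_index]
    fill = list(itertools.islice(itertools.cycle(non_base), num_slots - n))
    return np.concatenate([x, x[fill]], axis=0)
```

For the cloth case only one slot is missing, so the eighth slot becomes a copy of the first non-base candidate.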
Cloth result
- Report directory:
/workspace/workspace/reports/maniskill_cloth_bridge_smoke_v1
- Final cloth smoke summary:
trunk_only_ft=0.04
adapter_noop=0.04
adapter_active_ft=0.10
delta_active_vs_trunk=+0.06
delta_active_vs_trunk_ci95=[-0.04, 0.16]
intervention_rate=0.3369
non_base_selection_rate=0.2674
- Interpretation:
- cloth proxy is positive and adapter-specific in this single-seed smoke because adapter_noop stayed flat (0.04, equal to trunk_only_ft) while adapter_active_ft improved to 0.10
- effect size is modest and not yet statistically clean in this smoke protocol: the 95% CI for delta_active_vs_trunk, [-0.04, 0.16], still includes zero
Combined three-track proxy suite
- Combined report:
/workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.json
/workspace/workspace/reports/public_proxy_suite_smoke_v1/combined_summary.md
- Current three-track smoke evidence:
- occlusion proxy positive and adapter-specific
- bag proxy positive and adapter-specific
- cloth proxy positive and adapter-specific
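The three smoke summaries can be cross-checked with a short Python sketch. The numbers are copied from the reports in this log; the `adapter_specific` test (adapter_active_ft above trunk_only_ft while adapter_noop is not) and the `ci_excludes_zero` test are illustrative readings of this log, not the package's headline gate, whose definition is not recorded here.

```python
from typing import Dict, Tuple

# (trunk_only_ft, adapter_noop, adapter_active_ft, delta 95% CI) per track, from this log.
TRACKS: Dict[str, Tuple[float, float, float, Tuple[float, float]]] = {
    "occlusion": (0.04, 0.04, 0.62, (0.44, 0.72)),
    "bag": (0.32, 0.00, 0.48, (-0.04, 0.34)),
    "cloth": (0.04, 0.04, 0.10, (-0.04, 0.16)),
}


def summarize(track: str) -> Dict[str, object]:
    """Recompute the delta and apply the two illustrative checks for one track."""
    trunk, noop, active, (lo, hi) = TRACKS[track]
    return {
        "delta_active_vs_trunk": round(active - trunk, 2),
        "adapter_specific": active > trunk and noop <= trunk,
        "ci_excludes_zero": lo > 0.0 or hi < 0.0,
    }


for name in TRACKS:
    print(name, summarize(name))
```

Under this reading all three tracks pass the adapter-specific check, but only the occlusion interval excludes zero; the bag and cloth intervals both include it.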