# Elastic Occlusion Iteration Report

Date: 2026-03-31 UTC

## Scope

This iteration focused on the `trunk + adapter` path in:

- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual`

The target was to verify whether the adapter could show a light novelty signal on the proxy benchmark without breaking the no-op-safe trunk path.

## What Was Fixed

### 1. Proposal-target alignment bug

The original fast adapter runs were training against teacher shortlist labels, not the adapter's own proposal set.

Observed failure:

- `candidate_utility` in the fast proxy dataset always had oracle argmax at slot `0`
- adapter training therefore learned to prefer `base_action`

Fixes:

- `train/run_experiment.py`
  - now rebuilds adapter datasets when proposal-aligned targets are missing
- `train/build_aligned_proposal_dataset.py`
  - now supports adapter-wrapped models
- `tests/test_adapter_dataset_alignment.py`
  - added regression tests for missing aligned targets

Result:

- rebuilt aligned train dataset no longer collapses to slot `0`
- aligned oracle winners are non-base proposals across tasks

### 2. Proposal-rollout alignment for transition training

The lightweight transition path originally had no aligned rollout supervision for the adapter's own proposal candidates.

Fixes:

- `train/build_aligned_proposal_dataset.py`
  - now saves `proposal_target_rollout_*` tensors
- `sim_reveal/dataset.py`
  - now loads proposal rollout targets
- `train/losses.py`
  - transition loss now prefers proposal-aligned rollout targets when present
- `tests/test_transition_alignment_targets.py`
  - verifies proposal rollout targets are selected over teacher candidate rollouts

### 3. Lightweight transition model bugs

While enabling rollout training, multiple contract bugs surfaced and were fixed:

- bad `clearance_field` broadcast in `models/world_model.py`
- bad hidden-state expansion across proposal candidates in `models/world_model.py`
- unsafe `.view()` on non-contiguous `proposal_mode_ids`
- rollout loss did not resize corridor / spatial rollout targets to lightweight field resolution

Tests added:

- `tests/test_lightweight_transition_contract.py`
- `tests/test_transition_rollout_loss_resizing.py`

## Guardrail Test Status

Latest regression slice:

- `14 passed, 1 warning`

This included:

- no-op equivalence
- adapter gate behavior
- task-specific loss masking
- cloth metric selection
- eval protocol identity
- checkpoint remap
- dataset alignment
- transition alignment
- lightweight transition contract
- rollout target resizing

## Proxy Benchmark Results

Benchmark setup:

- benchmark mode: `sprint`
- episodes per proxy: `8`
- total episodes: `24`
- proxies: `foliage_proxy`, `bag_proxy`, `cloth_proxy`

### Rank-only adapter on aligned proposal targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15931496916649243`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6779018906719011`
  - premature retrieve rate: `0.8270833333333334`
  - planner regret: `0.0006857388885691762`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: about `44.6%` of steps
- families selected: `lift_edge`, `pin_left_rim`, `sweep_left`

Conclusion:

- selection collapse was fixed
- planner regret improved sharply
- reveal metrics did not improve

### Base-fast adapter on aligned proposal targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15862687141634524`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6857880518323441`
  - premature retrieve rate: `0.7984375`
  - planner regret: `0.0015697095737171672`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: `100%` of steps
- per-task collapse:
  - foliage -> `sweep_left`
  - bag -> `pin_left_rim`
  - cloth -> `lift_edge`

Conclusion:

- proposal set changed aggressively
- premature retrieve improved
- visibility did not improve
- disturbance worsened

### Transition-fast adapter on aligned proposal + rollout targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15848870722887418`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6893061758801274`
  - premature retrieve rate: `0.8203125`
  - planner regret: `0.0012374107202049345`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: about `33.3%` of steps
- dominant non-base family: `lift_edge`

Conclusion:

- rollout alignment and transition training now work end-to-end
- they still do not produce a reveal-quality gain on this proxy slice

## Main Conclusion

The current adapter stack is now much better instrumented and several silent training/evaluation bugs were removed. That work was necessary.

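
The active-versus-noop comparison behind the per-variant conclusions can be recomputed directly from the numbers reported above. The following is a minimal sketch: the metric values are copied from this report, while the dictionary keys, the `deltas` and `improved` helpers, and the tolerance are illustrative assumptions rather than identifiers from the repo.

```python
# Noop baseline, copied from the proxy benchmark results in this report.
NOOP = {
    "visibility_integral": 0.159542116879796,
    "corridor_availability": 0.0015432098880410194,
    "disturbance_cost": 0.6762562873351642,
    "premature_retrieve_rate": 0.8354166666666667,
    "planner_regret": 0.046383516304194926,
}

# Active-adapter results per variant, copied from this report.
ACTIVE = {
    "rank_only": {
        "visibility_integral": 0.15931496916649243,
        "corridor_availability": 0.0015432098880410194,
        "disturbance_cost": 0.6779018906719011,
        "premature_retrieve_rate": 0.8270833333333334,
        "planner_regret": 0.0006857388885691762,
    },
    "base_fast": {
        "visibility_integral": 0.15862687141634524,
        "corridor_availability": 0.0015432098880410194,
        "disturbance_cost": 0.6857880518323441,
        "premature_retrieve_rate": 0.7984375,
        "planner_regret": 0.0015697095737171672,
    },
    "transition_fast": {
        "visibility_integral": 0.15848870722887418,
        "corridor_availability": 0.0015432098880410194,
        "disturbance_cost": 0.6893061758801274,
        "premature_retrieve_rate": 0.8203125,
        "planner_regret": 0.0012374107202049345,
    },
}

# Assumption: these two metrics are "higher is better"; the rest are costs.
HIGHER_IS_BETTER = {"visibility_integral", "corridor_availability"}


def deltas(active, noop):
    """Return active minus noop for every metric in the noop baseline."""
    return {name: active[name] - noop[name] for name in noop}


def improved(metric, delta, tol=1e-12):
    """True if the signed delta moves the metric in its good direction."""
    return delta > tol if metric in HIGHER_IS_BETTER else delta < -tol


if __name__ == "__main__":
    for variant, metrics in ACTIVE.items():
        print(variant)
        for metric, delta in deltas(metrics, NOOP).items():
            tag = "better" if improved(metric, delta) else "not better"
            print(f"  {metric}: {delta:+.6f} ({tag})")
```

Under these assumptions, every variant shows a lower planner regret and premature retrieve rate than noop, while visibility, corridor availability, and disturbance cost do not move in the good direction.
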
However, after fixing:

- proposal-target alignment,
- proposal-rollout alignment,
- transition-model contract bugs,
- rollout-loss resizing bugs,

the proxy benchmark still does **not** clear the intended criterion:

- no measurable success gain
- no visibility or corridor gain over noop
- only modest reduction in premature retrieve rate
- planner regret improves, but execution quality does not

So the current answer is:

- the no-op-safe adapter path is now valid software
- the current light adapter variants still do **not** show a convincing novelty win on the proxy benchmark
- the likely next research move is not another small tuning pass, but a change in what is being optimized or proposed

## RLBench Status

I did **not** claim live RLBench parity from this machine.

Current blockers on this machine:

- RLBench / PyRep / Coppelia environment is not installed
- the local subset3 demo roots are not present
- earlier repo notes already showed most old RLBench tasks were faulty on the prior setup except `dual_push_buttons`

So the general-task no-regression story remains:

- code-level no-op parity tests are passing
- historical `dual_push_buttons` anchor evidence exists in repo artifacts
- a fresh live pushbuttons rerun was not possible in this environment

## Recommended Next Move

If continuing from here, the next useful step is:

1. keep the current bug fixes
2. stop spending time on more short proxy tuning of this exact stack
3. either:
   - redesign proposal generation so oracle-good reveal candidates are easier to separate early, or
   - shift to a stronger trunk / task-routed adapter variant and re-run the same aligned proxy protocol

The current iteration establishes a clean negative result on the present fast adapter variants, which is still valuable.
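
Because the recommended path keeps the current fixes and re-runs the same aligned proxy protocol, a cheap pre-flight check on any rebuilt dataset can guard against the slot-`0` collapse described under the proposal-target alignment bug. The sketch below is hypothetical: it assumes `candidate_utility` can be read as one list of per-slot utilities per sample with slot `0` being the base action, and the function names and the `max_base_fraction` threshold are illustrative, not part of the repo.

```python
from collections import Counter
from typing import Sequence


def oracle_slot_histogram(candidate_utility: Sequence[Sequence[float]]) -> Counter:
    """Count how often each proposal slot is the per-sample oracle argmax."""
    hist = Counter()
    for row in candidate_utility:
        # Ties resolve to the lowest slot index, so they count toward slot 0.
        best_slot = max(range(len(row)), key=row.__getitem__)
        hist[best_slot] += 1
    return hist


def is_collapsed_to_base(
    candidate_utility: Sequence[Sequence[float]],
    base_slot: int = 0,
    max_base_fraction: float = 0.95,  # hypothetical threshold
) -> bool:
    """Flag a dataset whose oracle winner is almost always the base slot."""
    hist = oracle_slot_histogram(candidate_utility)
    total = sum(hist.values())
    return total > 0 and hist[base_slot] / total >= max_base_fraction


if __name__ == "__main__":
    # Toy data only; not taken from the proxy datasets.
    collapsed = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1]]
    aligned = [[0.1, 0.7, 0.2], [0.2, 0.1, 0.9], [0.6, 0.3, 0.1]]
    print(is_collapsed_to_base(collapsed))
    print(is_collapsed_to_base(aligned))
```

A check of this kind would sit naturally next to the existing regression tests for missing aligned targets, failing fast before any adapter training is launched on a collapsed dataset.
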