# Elastic Occlusion Iteration Report

Date: 2026-03-31 UTC

## Scope

This iteration focused on the `trunk + adapter` path in:

- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual`

The target was to verify whether the adapter could show a light novelty signal on the proxy benchmark without breaking the no-op-safe trunk path.

## What Was Fixed

### 1. Proposal-target alignment bug

The original fast adapter runs trained against teacher shortlist labels rather than against the adapter's own proposal set.

Observed failure:

- `candidate_utility` in the fast proxy dataset always had oracle argmax at slot `0`
- adapter training therefore learned to prefer `base_action`
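
A collapse like this can be detected with a simple histogram over oracle winners. The following is a minimal sketch, not the repo's implementation: it assumes `candidate_utility` can be viewed as one row of utilities per sample, and the helper names and toy values are hypothetical.

```python
from collections import Counter


def oracle_slot_histogram(candidate_utility):
    """Count which proposal slot wins the oracle argmax for each sample."""
    winners = [max(range(len(row)), key=row.__getitem__) for row in candidate_utility]
    return Counter(winners)


def collapsed_to_slot(candidate_utility, slot=0):
    """True when every sample's oracle argmax lands on the given slot."""
    return set(oracle_slot_histogram(candidate_utility)) == {slot}


# Toy values (hypothetical): teacher-labeled data where slot 0 always wins.
teacher_labeled = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.4], [0.7, 0.1, 0.6]]
# Toy values (hypothetical): utilities re-scored on the adapter's own proposals.
proposal_aligned = [[0.2, 0.9, 0.1], [0.3, 0.1, 0.8], [0.5, 0.4, 0.1]]

print(collapsed_to_slot(teacher_labeled))
print(collapsed_to_slot(proposal_aligned))
```

Running a check of this kind before training would flag a dataset in which slot `0` is the only possible label.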

Fixes:

- `train/run_experiment.py`
  - now rebuilds adapter datasets when proposal-aligned targets are missing
- `train/build_aligned_proposal_dataset.py`
  - now supports adapter-wrapped models
- `tests/test_adapter_dataset_alignment.py`
  - added regression tests for missing aligned targets

Result:

- rebuilt aligned train dataset no longer collapses to slot `0`
- aligned oracle winners are non-base proposals across tasks

### 2. Proposal-rollout alignment for transition training

The lightweight transition path originally had no aligned rollout supervision for the adapter's own proposal candidates.

Fixes:

- `train/build_aligned_proposal_dataset.py`
  - now saves `proposal_target_rollout_*` tensors
- `sim_reveal/dataset.py`
  - now loads proposal rollout targets
- `train/losses.py`
  - transition loss now prefers proposal-aligned rollout targets when present
- `tests/test_transition_alignment_targets.py`
  - verifies proposal rollout targets are selected over teacher candidate rollouts
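
The "prefer proposal-aligned targets when present" rule can be sketched as a key-prefix lookup with a fallback. This is an illustration under assumptions: only the `proposal_target_rollout_` prefix comes from the fixes above; the teacher prefix `candidate_rollout_`, the field names, and the function name are hypothetical.

```python
def select_rollout_targets(
    batch,
    proposal_prefix="proposal_target_rollout_",
    teacher_prefix="candidate_rollout_",  # hypothetical teacher key prefix
):
    """Return (source, targets), preferring proposal-aligned rollout targets."""
    proposal = {
        key[len(proposal_prefix):]: value
        for key, value in batch.items()
        if key.startswith(proposal_prefix)
    }
    if proposal:
        return "proposal", proposal
    # Fall back to teacher candidate rollouts only when nothing aligned exists.
    teacher = {
        key[len(teacher_prefix):]: value
        for key, value in batch.items()
        if key.startswith(teacher_prefix)
    }
    return "teacher", teacher


# Toy batches with hypothetical field names and placeholder values.
aligned_batch = {
    "proposal_target_rollout_visibility": [0.1, 0.2],
    "candidate_rollout_visibility": [0.5, 0.6],
}
legacy_batch = {"candidate_rollout_visibility": [0.5, 0.6]}

print(select_rollout_targets(aligned_batch))
print(select_rollout_targets(legacy_batch))
```

Returning the source label alongside the targets makes it easy for a test to assert which supervision path was actually taken.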

### 3. Lightweight transition model bugs

While enabling rollout training, multiple contract bugs surfaced and were fixed:

- bad `clearance_field` broadcast in `models/world_model.py`
- bad hidden-state expansion across proposal candidates in `models/world_model.py`
- unsafe `.view()` on non-contiguous `proposal_mode_ids`
- rollout loss did not resize corridor / spatial rollout targets to lightweight field resolution
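
The resizing fix in the last item amounts to bringing each target field to the prediction's resolution before the loss is computed. The sketch below uses nearest-neighbor sampling on nested lists purely for illustration; the report does not say which interpolation the repo uses, and in a tensor codebase this step would normally be a library call such as `torch.nn.functional.interpolate`. The function name is hypothetical.

```python
def resize_nearest(field, out_h, out_w):
    """Resize a 2D field (list of rows) to (out_h, out_w) by nearest-neighbor sampling."""
    in_h, in_w = len(field), len(field[0])
    return [
        [field[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


# A 4x4 target field resized to a 2x2 lightweight resolution.
target = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
resized = resize_nearest(target, 2, 2)
print(resized)  # [[0, 2], [8, 10]]
```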

Tests added:

- `tests/test_lightweight_transition_contract.py`
- `tests/test_transition_rollout_loss_resizing.py`

## Guardrail Test Status

Latest regression slice:

- `14 passed, 1 warning`

This included:

- no-op equivalence
- adapter gate behavior
- task-specific loss masking
- cloth metric selection
- eval protocol identity
- checkpoint remap
- dataset alignment
- transition alignment
- lightweight transition contract
- rollout target resizing
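
To illustrate what the no-op equivalence check guards, here is a toy sketch. It assumes a residual gated form (trunk output plus a gated correction); the report does not describe how the adapter gate is actually implemented, so the function and values below are hypothetical.

```python
def adapter_forward(trunk_action, adapter_delta, gate):
    """Assumed residual form: trunk output plus a gated adapter correction."""
    return [t + gate * d for t, d in zip(trunk_action, adapter_delta)]


trunk_action = [0.25, -0.5, 1.0]
adapter_delta = [0.1, 0.3, -0.2]

# With the gate closed, the output must match the trunk exactly.
noop_action = adapter_forward(trunk_action, adapter_delta, gate=0.0)
# With the gate open, the output is allowed to differ.
active_action = adapter_forward(trunk_action, adapter_delta, gate=1.0)

print(noop_action == trunk_action)
print(active_action == trunk_action)
```

Under this assumed form, exact equality holds at `gate=0.0` for finite corrections, so a test can compare outputs without a tolerance.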

## Proxy Benchmark Results

Benchmark setup:

- benchmark mode: `sprint`
- episodes per proxy: `8`
- total episodes: `24`
- proxies: `foliage_proxy`, `bag_proxy`, `cloth_proxy`

### Rank-only adapter on aligned proposal targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15931496916649243`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6779018906719011`
  - premature retrieve rate: `0.8270833333333334`
  - planner regret: `0.0006857388885691762`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: about `44.6%` of steps
- families selected: `lift_edge`, `pin_left_rim`, `sweep_left`

Conclusion:

- selection collapse was fixed
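- planner regret dropped sharply, from about `0.046` (noop) to about `0.0007` (active)
- reveal metrics did not improve: visibility integral is marginally below noop and corridor availability is identical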

### Base-fast adapter on aligned proposal targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15862687141634524`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6857880518323441`
  - premature retrieve rate: `0.7984375`
  - planner regret: `0.0015697095737171672`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: `100%` of steps
- per-task collapse:
  - foliage -> `sweep_left`
  - bag -> `pin_left_rim`
  - cloth -> `lift_edge`

Conclusion:

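- proposal selection changed aggressively: every step used a non-base proposal
- premature retrieve rate improved (about `0.798` vs `0.835` for noop)
- visibility did not improve (about `0.1586` vs `0.1595` for noop)
- disturbance cost worsened (about `0.686` vs `0.676` for noop)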

### Transition-fast adapter on aligned proposal + rollout targets

- active:
  - mean success: `0.0`
  - visibility integral: `0.15848870722887418`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6893061758801274`
  - premature retrieve rate: `0.8203125`
  - planner regret: `0.0012374107202049345`
- noop:
  - mean success: `0.0`
  - visibility integral: `0.159542116879796`
  - corridor availability: `0.0015432098880410194`
  - disturbance cost: `0.6762562873351642`
  - premature retrieve rate: `0.8354166666666667`
  - planner regret: `0.046383516304194926`

Behavior:

- non-base proposal usage: about `33.3%` of steps
- dominant non-base family: `lift_edge`

Conclusion:

- rollout alignment and transition training now work end-to-end
- they still do not produce a reveal-quality gain on this proxy slice

## Main Conclusion

The current adapter stack is now much better instrumented, and several silent training/evaluation bugs have been removed. That work was necessary.

However, after fixing:

- proposal-target alignment,
- proposal-rollout alignment,
- transition-model contract bugs,
- rollout-loss resizing bugs,

the proxy benchmark still does **not** clear the intended criterion:

- no success gain: mean success is `0.0` for every variant and for noop
- no visibility or corridor gain over noop
- only a modest reduction in premature retrieve rate
- planner regret improves, but execution quality does not

So the current answer is:

- the no-op-safe adapter path is now valid software
- the current light adapter variants still do **not** show a convincing novelty win on the proxy benchmark
- the likely next research move is not another small tuning pass, but a change in what is being optimized or proposed
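
The comparison against noop can be reproduced directly from the numbers reported in the proxy benchmark section. The sketch below copies those values verbatim and computes active-minus-noop deltas; the dictionary keys and the helper name are shorthand introduced here, not identifiers from the repo.

```python
# Noop metrics (identical across the three reported runs).
NOOP = {
    "visibility": 0.159542116879796,
    "corridor": 0.0015432098880410194,
    "disturbance": 0.6762562873351642,
    "premature_retrieve": 0.8354166666666667,
    "planner_regret": 0.046383516304194926,
}

# Active metrics per adapter variant, copied from the report.
ACTIVE = {
    "rank_only": {
        "visibility": 0.15931496916649243,
        "corridor": 0.0015432098880410194,
        "disturbance": 0.6779018906719011,
        "premature_retrieve": 0.8270833333333334,
        "planner_regret": 0.0006857388885691762,
    },
    "base_fast": {
        "visibility": 0.15862687141634524,
        "corridor": 0.0015432098880410194,
        "disturbance": 0.6857880518323441,
        "premature_retrieve": 0.7984375,
        "planner_regret": 0.0015697095737171672,
    },
    "transition_fast": {
        "visibility": 0.15848870722887418,
        "corridor": 0.0015432098880410194,
        "disturbance": 0.6893061758801274,
        "premature_retrieve": 0.8203125,
        "planner_regret": 0.0012374107202049345,
    },
}


def deltas(active, noop):
    """Active minus noop for every metric; the sign shows the direction of change."""
    return {name: active[name] - noop[name] for name in noop}


DELTAS = {variant: deltas(metrics, NOOP) for variant, metrics in ACTIVE.items()}

for variant, delta in DELTAS.items():
    print(variant, {name: round(value, 4) for name, value in delta.items()})
```

For every variant the visibility delta is negative, the corridor delta is zero, and the disturbance delta is positive, while premature retrieve rate and planner regret both fall.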

## RLBench Status

I do **not** claim live RLBench parity from this machine.

Current blockers on this machine:

- RLBench / PyRep / Coppelia environment is not installed
- the local subset3 demo roots are not present
- earlier repo notes already showed that most old RLBench tasks were faulty on the prior setup, with `dual_push_buttons` as the exception

So the general-task no-regression story remains:

- code-level no-op parity tests are passing
- historical `dual_push_buttons` anchor evidence exists in repo artifacts
- a fresh live pushbuttons rerun was not possible in this environment

## Recommended Next Move

If continuing from here, the next useful steps are:

1. keep the current bug fixes
2. stop spending time on more short proxy tuning of this exact stack
3. either:
   - redesign proposal generation so oracle-good reveal candidates are easier to separate early, or
   - shift to a stronger trunk / task-routed adapter variant and re-run the same aligned proxy protocol

The current iteration establishes a clean negative result on the present fast adapter variants, which is still valuable.