lsnu committed 19 days ago

Commit

aa584de

verified

1 Parent(s): af6f91c

Add files using upload-large-folder tool

Files changed (50)

README.md +240 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/config_resolved.yaml +173 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/metrics.json +140 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/summary.json +0 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/config_resolved.yaml +174 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/metrics.json +278 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/summary.json +0 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/config_resolved.yaml +170 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/metrics.json +71 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/summary.json +0 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/config_resolved.yaml +170 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/metrics.json +278 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/summary.json +0 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/config_resolved.yaml +174 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/metrics.json +140 -0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/summary.json +0 -0
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.json +280 -0
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.md +14 -0
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.partial.json +280 -0
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/command.txt +1 -0
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stderr.txt +1 -0
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stdout.txt +38 -0
artifacts/reports/proxy_base_reuse128_smoke/scripted/reveal_benchmark.md +17 -0
artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.json +0 -0
artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.md +17 -0
artifacts/reports/proxy_semantic_heuristic_quick12/candidate0/reveal_benchmark.md +17 -0
artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.json +0 -0
artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.md +17 -0
artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.json +0 -0
artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.md +17 -0
artifacts/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.md +17 -0
artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.md +14 -0
artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.partial.json +280 -0
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.json +29 -0
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.md +14 -0
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.partial.json +29 -0
docs/CHANGE_AND_TEST_LOG.md +221 -0
docs/MODEL_AND_ARTIFACT_INDEX.md +59 -0
docs/RESULTS_RAW.md +178 -0
docs/VLAarchtests2_code_README.md +301 -0
docs/elastic_occlusion_handoff_completion_2026-03-31.md +184 -0
docs/elastic_occlusion_iteration_2026-03-31.md +232 -0
docs/elastic_occlusion_repo_audit_2026-03-31.md +400 -0
docs/instructions.md +1030 -0
legacy/general_task_anchor_20260330_dual_push_buttons/summary.json +37 -0
setup/ENVIRONMENT.md +55 -0
setup/bootstrap_same_hardware.sh +42 -0
setup/env_vars.sh +15 -0
setup/requirements_core.txt +22 -0
setup/rlbench_pip_freeze.txt +181 -0

README.md ADDED

	@@ -0,0 +1,240 @@

+# VLAarchtests3
+`VLAarchtests3` is the organized export of the elastic-occlusion bimanual VLA handoff completed on a 1x L40S RunPod machine.
+It is a successor snapshot to the earlier `VLAarchtests` and `VLAarchtests2` work:
+- `VLAarchtests`: earlier architecture-search and benchmark-debugging work.
+- `VLAarchtests2`: larger exploratory branch with frequent model changes, mixed benchmark artifacts, and several legacy results that needed manual reinterpretation.
+- `VLAarchtests3`: cleaned export focused on the final handoff state, the adapter refactor, the validated tests, the current checkpoints, and the reports needed to continue from here.
+## What Was Done
+The main engineering outcome was a refactor from a monolithic elastic policy into a cleaner `trunk + structured adapter + no-op fallback` stack.
+The final exported code contains:
+- a clean wrapped-policy interface with `trunk_only`, `adapter_noop`, and `adapter_active` modes,
+- a structured elastic-occlusion adapter with:
+  - reveal-state prediction,
+  - task-routed reveal/retrieve proposal families,
+  - retrieve-feasibility gating,
+  - a lightweight reveal-state transition model,
+- explicit tests that protect:
+  - no-op equivalence,
+  - generic-task fallback,
+  - benchmark protocol identity,
+  - unsafe retrieve blocking,
+  - cloth-specific selection behavior.
+The most important debugging pass was in the planner/gating logic. The original active path could reveal forever or retrieve too early. The final planner fixes made it:
+- summarize readiness at the scene level rather than at the worst-candidate level,
+- hard-mask unsafe retrieve candidates,
+- switch from reveal to retrieve once feasibility is met,
+- use task-specific bag and cloth readiness criteria,
+- prefer reveal macros early and retrieve later.
+## What Was Actually Evaluated
+Two different kinds of evidence are included.
+### 1. Trusted General-Task Anchor
+This was kept narrow on purpose because only `dual_push_buttons` was trusted on this setup.
+Trusted anchor evidence:
+- official AnyBimanual local anchor summary on `dual_push_buttons`:
+  - `25` episodes
+  - success `0.96`
+- live rerun on this RunPod:
+  - `5` episodes
+  - scores `[0, 100, 100, 0, 0]`
+  - mean score `40.0`
+Interpretation:
+- the official trunk path is real and non-trivial on the one stable anchor task,
+- this does **not** mean the local custom CLIP trunk was competitive broadly,
+- this does **not** validate the other unstable RLBench target-like tasks.
+### 2. Reveal/Retrieve Proxy Benchmark
+This benchmark is useful for mechanism debugging, but it is **not** a real robot/physics benchmark.
+The final reported held-out smoke benchmark used:
+- `12` foliage episodes,
+- `12` bag episodes,
+- `12` cloth episodes,
+- `36` total episodes,
+- separate held-out procedural seeds from the adapter train/val splits.
+Results:
+- non-intervention / matched no-op:
+  - mean success `0.000`
+  - foliage `0.000`
+  - bag `0.000`
+  - cloth `0.000`
+  - visibility integral `2.275`
+  - corridor availability `0.0312`
+  - disturbance cost `0.7433`
+- intervention / adapter active:
+  - mean success `0.6667`
+  - foliage `0.6667`
+  - bag `0.7500`
+  - cloth `0.5833`
+  - visibility integral `19.9503`
+  - corridor availability `0.7974`
+  - disturbance cost `0.2835`
+  - reocclusion rate `0.00278`
+  - planner regret `0.1586`
+The active policy genuinely intervened on these tasks rather than silently falling back to the trunk:
+- all recorded selections on the final held-out smoke run were non-base candidates,
+- typical successful pattern:
+  - foliage: reveal (`pin_canopy`) then `retrieve`,
+  - bag: reveal (`widen_mouth`) then `retrieve`,
+  - cloth: reveal (`separate_layer`) then `retrieve`.
+## Important Limitation
+The reveal/retrieve proxy is a procedural synthetic environment, not a contact-rich robot simulator.
+It has:
+- synthetic RGB-D renders,
+- internal latent state,
+- hand-coded transition rules,
+- scripted teacher/oracle supervision.
+It does **not** have:
+- rigid-body or deformable physics,
+- actual robot kinematics,
+- true contact/grasp simulation,
+- a fair end-to-end manipulation distribution for a pretrained trunk.
+Therefore:
+- the proxy result is useful for validating adapter logic,
+- the proxy result is **not** sufficient evidence that the trunk or the full system would outperform real baselines on RLBench or on the future custom benchmark.
+## What Was Learned
+The work supports the following conclusions:
+- the structured adapter idea remains viable,
+- the explicit reveal-state variables are worth keeping,
+- task-routed reveal macros matter,
+- retrieve-feasibility gating matters,
+- the no-op fallback path for general tasks is sound,
+- the strongest evidence does not come from the older heavy memory/world-model design.
+The work does **not** yet justify:
+- a claim of broad general-task superiority,
+- a claim that the current proxy benchmark is a fair end-to-end benchmark,
+- a claim that the architecture is validated on realistic target-like sim tasks.
+## Was The Adapter Trained?
+Yes.
+The final proxy adapter checkpoint was trained with:
+- frozen trunk,
+- adapter-only updates,
+- trained components:
+  - reveal/state head,
+  - proposal prior,
+  - transition model,
+  - planner/reranker.
+Proxy training data:
+- train: `128` episodes per proxy family,
+- val: `32` episodes per proxy family,
+- proxy families:
+  - foliage,
+  - bag,
+  - cloth.
+The final headline smoke benchmark was not run on those train/val episodes. It used separate held-out seeds.
+## Was This A Perfect Fairness Story?
+No.
+What is fair in the current export:
+- matched active vs no-op comparisons on the same wrapped checkpoint,
+- held-out procedural seeds for the final proxy benchmark,
+- exact no-op and generic-task fallback tests.
+What is still missing for a stronger paper-quality comparison:
+1. same-initialization `trunk_only` fine-tuned on the same proxy data,
+2. same-initialization `trunk + adapter` fine-tuned on the same proxy data,
+3. comparison on held-out proxy seeds,
+4. comparison on stable real-sim tasks.
+## What Is Left To Do
+The main remaining work is on real sim benchmarks, not more abstract proxy optimization.
+Priority list:
+1. Train a fair control:
+   - same initialization,
+   - `trunk_only` fine-tuned on the same reveal/retrieve proxy data,
+   - compare against `trunk + adapter`.
+2. Attach the adapter directly to a strong public trunk:
+   - official AnyBimanual,
+   - official PerAct2 / RVT,
+   - or 3D FlowMatch Actor if practical.
+3. Validate on stable real-sim tasks:
+   - do not trust unstable RLBench tasks with infeasible waypoints,
+   - rebuild a trustworthy target-like evaluation subset,
+   - keep `dual_push_buttons` as a regression anchor only.
+4. Add a deformable / garment benchmark:
+   - this is the most relevant public step toward the future suitcase/clothes benchmark.
+5. Only after that:
+   - revisit larger RLBench sweeps,
+   - or collect custom teleop data.
+## Repository Layout
+- `code/`
+  - cleaned code snapshot used for the handoff
+- `artifacts/outputs/`
+  - current adapter checkpoints and training outputs
+- `artifacts/reports/`
+  - evaluation and debugging reports
+- `artifacts/data/reveal_proxy/`
+  - proxy train/val datasets used by this stage
+- `legacy/`
+  - exact older checkpoints and summaries that the current work depends on
+- `docs/`
+  - audit, iteration, and completion reports from this handoff
+- `setup/`
+  - same-machine environment notes and helper scripts
+## Recommended Use Of This Repo
+Use this repo as:
+- the archival handoff state,
+- the codebase to continue adapter work from,
+- the source of the current checkpoints and benchmark reports,
+- the baseline package before moving to real sim validation.
+Do **not** use it as evidence that the architecture is already validated on realistic manipulation benchmarks. That validation is what should happen next.
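As a quick consistency check on the README's reported numbers, the per-family success rates can be pooled back into episode counts. The sketch below uses only figures quoted in the README (12 episodes per family, per-family success rates, and the five live rerun scores). The helper name `pooled_success` is illustrative and not part of the repository.

```python
from statistics import mean

# Values quoted in the README's "Reveal/Retrieve Proxy Benchmark" section.
EPISODES_PER_FAMILY = 12
active_success = {"foliage": 0.6667, "bag": 0.7500, "cloth": 0.5833}


def pooled_success(per_family, episodes_per_family):
    """Convert per-family rates to success counts and pool them."""
    successes = sum(round(rate * episodes_per_family) for rate in per_family.values())
    total = episodes_per_family * len(per_family)
    return successes, total, successes / total


successes, total, rate = pooled_success(active_success, EPISODES_PER_FAMILY)
print(f"adapter active: {successes}/{total} = {rate:.4f}")

# Live rerun scores on dual_push_buttons, as listed in the README.
anchor_scores = [0, 100, 100, 0, 0]
anchor_mean = mean(anchor_scores)
print(f"anchor rerun mean score: {anchor_mean}")
```

The pooled rate works out to 24 of 36 episodes, which agrees with the reported mean success of `0.6667`, and the rerun scores average to the reported `40.0`.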

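The planner fixes listed under "What Was Done" in the README (hard-masking unsafe retrieve candidates, and switching from reveal to retrieve once feasibility is met) can be illustrated with a minimal sketch. Everything here is hypothetical: the `Candidate` fields, the `feasibility_threshold` default, and `select_candidate` are illustrative names and values, not the repository's planner API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    kind: str       # "reveal" or "retrieve" (assumed two-way split)
    utility: float  # assumed planner score; higher is better
    safe: bool = True


def select_candidate(
    candidates: List[Candidate],
    scene_feasibility: float,
    feasibility_threshold: float = 0.5,  # illustrative value only
) -> Candidate:
    # Hard mask: an unsafe retrieve is never eligible, whatever its utility.
    eligible = [c for c in candidates if not (c.kind == "retrieve" and not c.safe)]
    if not eligible:
        raise ValueError("no eligible candidates after masking")
    # Prefer reveal until the scene-level feasibility estimate clears the
    # threshold, then prefer retrieve.
    wanted = "retrieve" if scene_feasibility >= feasibility_threshold else "reveal"
    preferred = [c for c in eligible if c.kind == wanted]
    pool = preferred or eligible  # fall back if the preferred kind is absent
    return max(pool, key=lambda c: c.utility)


candidates = [
    Candidate("pin_canopy", "reveal", utility=0.4),
    Candidate("retrieve", "retrieve", utility=0.6),
    Candidate("retrieve_unsafe", "retrieve", utility=0.9, safe=False),
]
early = select_candidate(candidates, scene_feasibility=0.2)
late = select_candidate(candidates, scene_feasibility=0.8)
print(early.name, late.name)
```

In this toy setup the unsafe retrieve is never chosen even though it has the highest utility, and the selection moves from the reveal macro to `retrieve` once the feasibility estimate crosses the threshold.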
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/config_resolved.yaml ADDED

	@@ -0,0 +1,173 @@

+experiment_name: proxy_adapter_wrapped_clip_base_fast_seed17
+output_dir: /workspace/workspace/outputs/adapter_proxy
+device: cuda
+seed: 17
+init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
+init_strict: false
+data:
+  proxies:
+  - foliage_proxy
+  - bag_proxy
+  - cloth_proxy
+  resolution: 224
+  dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast
+  train_episodes_per_proxy: 12
+  val_episodes_per_proxy: 4
+  train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast.pt
+  val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast.pt
+  rebuild_dataset: false
+  chunk_horizon: 8
+  rollout_horizon: 5
+  history_steps: 6
+  planner_candidates: 8
+  seed: 17
+optim:
+  epochs: 2
+  batch_size: 4
+  num_workers: 8
+  lr: 0.0001
+  weight_decay: 0.0001
+trainer:
+  policy_type: adapter_wrapped
+  training_regime: adapter_train_frozen_trunk
+  eval_mode: adapter_active
+  adapter_mode: adapter_active
+  adapter_use_transition_model: false
+  adapter_use_task_conditioning: true
+  use_bf16: true
+  grad_clip_norm: 1.0
+  freeze_backbone: true
+  gradient_checkpointing: false
+  plan_during_train: false
+  plan_during_eval: false
+  support_mode_conditioning: true
+  planner_mode: false
+  use_depth: true
+  use_world_model: false
+  use_role_tokens: true
+  compute_equivariance_probe: false
+  trainable_parameter_prefixes:
+  - adapter.state_head
+  - adapter.proposal_prior
+  - adapter.planner
+policy:
+  backbone:
+    model_name: openai/clip-vit-base-patch32
+    hidden_dim: 512
+    max_text_tokens: 32
+    freeze_backbone: true
+    gradient_checkpointing: false
+    use_dummy_backbone: false
+  fusion:
+    hidden_dim: 512
+    num_cameras: 3
+    num_layers: 4
+    num_heads: 8
+    ff_dim: 2048
+    dropout: 0.1
+    proprio_dim: 32
+    proprio_tokens: 1
+  memory:
+    hidden_dim: 512
+    action_dim: 14
+    history_steps: 6
+    scene_history_steps: 3
+    belief_history_steps: 8
+    num_layers: 2
+    dropout: 0.1
+    memory_bank_size: 4
+    scene_bank_size: 2
+    belief_bank_size: 2
+    num_heads: 8
+    max_history_steps: 8
+    reveal_cache_steps: 4
+    reveal_cache_decay: 0.7
+  decoder:
+    hidden_dim: 512
+    num_heads: 8
+    num_layers: 4
+    ff_dim: 2048
+    dropout: 0.1
+    chunk_size: 8
+    action_dim: 14
+    arm_action_dim: 7
+    num_candidates: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_proposal_modes: 7
+    planner_top_k: 4
+    proposal_delta_scale: 0.2
+    proposal_slot_scale: 0.05
+  reveal_head:
+    hidden_dim: 512
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    belief_map_size: 32
+    field_size: 16
+    num_heads: 8
+    predict_belief_map: true
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    num_tasks: 4
+  world_model:
+    hidden_dim: 512
+    action_dim: 14
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    field_size: 16
+    num_heads: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    belief_map_size: 32
+    predict_belief_map: true
+    scene_bank_size: 2
+    belief_bank_size: 2
+    rollout_mode: compact_rollout
+    num_tasks: 4
+    lightweight_field_size: 4
+  planner:
+    hidden_dim: 512
+    num_candidates: 8
+    action_dim: 14
+    num_support_modes: 3
+    utility_margin: 0.1
+    num_heads: 8
+    num_layers: 2
+    num_phases: 5
+    num_arm_roles: 4
+    top_k: 4
+    adapter_confidence_threshold: 0.45
+loss_weights:
+  action: 1.0
+  phase: 0.08
+  arm_role: 0.08
+  support_mode: 0.08
+  corridor: 0.12
+  persistence: 0.06
+  disturbance: 0.06
+  world_model: 0.0
+  transition: 0.0
+  belief: 0.05
+  visibility: 0.05
+  clearance: 0.06
+  support_stability: 0.06
+  reocclusion: 0.06
+  occluder_contact: 0.05
+  grasp_affordance: 0.05
+  planner_success: 0.15
+  planner_risk: 0.08
+  planner_ranking: 0.15
+  proposal_reconstruction: 0.08
+  proposal_success: 0.1
+  proposal_ranking: 0.12
+  proposal_mode: 0.08
+  proposal_diversity: 0.05
+  role_swap_consistency: 0.0
+  task_metrics: 0.06
+  gate: 0.05
+  distillation: 0.05
+  calibration: 0.02
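The `trainable_parameter_prefixes` list in this config, together with `training_regime: adapter_train_frozen_trunk`, suggests that parameters are selected for training by name prefix. A minimal sketch of that kind of selection follows. The matching rule (exact match or match up to a `.` boundary) and the example parameter names are assumptions for illustration, since the trainer code is not shown in this diff.

```python
from typing import Dict, Iterable

# Prefixes copied from the base_fast config above.
TRAINABLE_PREFIXES = (
    "adapter.state_head",
    "adapter.proposal_prior",
    "adapter.planner",
)


def is_trainable(param_name: str, prefixes: Iterable[str]) -> bool:
    """True if the name equals a prefix or extends it at a '.' boundary."""
    return any(param_name == p or param_name.startswith(p + ".") for p in prefixes)


def trainable_mask(param_names: Iterable[str], prefixes: Iterable[str]) -> Dict[str, bool]:
    prefixes = tuple(prefixes)
    return {name: is_trainable(name, prefixes) for name in param_names}


# Hypothetical parameter names, for illustration only.
example_names = [
    "trunk.backbone.encoder.weight",
    "adapter.state_head.proj.weight",
    "adapter.planner.score.bias",
    "adapter.transition_model.cell.weight",
]
mask = trainable_mask(example_names, TRAINABLE_PREFIXES)
for name, flag in mask.items():
    print(f"{name}: {'train' if flag else 'frozen'}")
```

Under this rule the transition model stays frozen for the `base_fast` run, which is consistent with `adapter_use_transition_model: false` and a `transition` loss weight of `0.0` in the same config.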

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/metrics.json ADDED

	@@ -0,0 +1,140 @@

+[
+  {
+    "epoch": 0,
+    "train": {
+      "action": 1.1780137238295183,
+      "arm_role": 0.000544056080402895,
+      "belief": 0.10274084074341733,
+      "calibration": 0.0,
+      "clearance": 0.08112246429790622,
+      "corridor": 0.21243907782532598,
+      "distillation": 0.0036539296447501883,
+      "disturbance": 0.0010930091615908009,
+      "gate": 0.0,
+      "grasp_affordance": 0.011060374242294094,
+      "occluder_contact": 0.19354943348013837,
+      "persistence": 0.29602919886415097,
+      "phase": 0.1456924275211666,
+      "planner_ranking": 1.1046701566032742,
+      "planner_risk": 0.03252584584381269,
+      "planner_success": 0.5002943964108176,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.9053098727827487,
+      "proposal_ranking": 0.7633599224297897,
+      "proposal_reconstruction": 1.1813416908616605,
+      "proposal_success": 0.5018493273983831,
+      "reocclusion": 0.1370238650428212,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0010332910049170175,
+      "support_stability": 0.13264792088581168,
+      "task_metrics": 0.07693366929078879,
+      "total": 1.8312026676924333,
+      "transition": 0.0,
+      "uncertainty": 1.4312560102039045e-05,
+      "visibility": 0.096126823645571,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.146972581744194,
+      "arm_role": 2.7849786739864157e-05,
+      "belief": 0.09928969945758581,
+      "calibration": 0.0,
+      "clearance": 0.07546275667846203,
+      "corridor": 0.18693614657968283,
+      "distillation": 0.005982774979202077,
+      "disturbance": 0.0012652746545427362,
+      "gate": 0.0,
+      "grasp_affordance": 0.009092151012737304,
+      "occluder_contact": 0.19199086539447308,
+      "persistence": 0.4173499735770747,
+      "phase": 0.20510842488147318,
+      "planner_ranking": 1.0746948570013046,
+      "planner_risk": 0.03205434698611498,
+      "planner_success": 0.3765582703053951,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.5553285405039787,
+      "proposal_ranking": 0.6613346468657255,
+      "proposal_reconstruction": 1.1140409670770168,
+      "proposal_success": 0.32496484369039536,
+      "reocclusion": 0.2021030569449067,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.00011286496555840131,
+      "support_stability": 0.13265474420040846,
+      "task_metrics": 0.06524855340830982,
+      "total": 1.7250810116529465,
+      "transition": 0.0,
+      "uncertainty": 8.913456255754681e-06,
+      "visibility": 0.09269411116838455,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 1,
+    "train": {
+      "action": 1.1840074995289678,
+      "arm_role": 1.7842088946844857e-05,
+      "belief": 0.10108890773161598,
+      "calibration": 0.0,
+      "clearance": 0.08066983359015506,
+      "corridor": 0.20431885726587928,
+      "distillation": 0.005328163808292668,
+      "disturbance": 0.000988402207440231,
+      "gate": 0.0,
+      "grasp_affordance": 0.010460576832132496,
+      "occluder_contact": 0.19120351322319196,
+      "persistence": 0.20984708754669712,
+      "phase": 0.1270662468412648,
+      "planner_ranking": 1.051699793857077,
+      "planner_risk": 0.03183994928131933,
+      "planner_success": 0.37528212303700653,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.541168266016504,
+      "proposal_ranking": 0.7413897125617318,
+      "proposal_reconstruction": 1.1529877976230953,
+      "proposal_success": 0.273181245378826,
+      "reocclusion": 0.11955958685797194,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.00014792317929475203,
+      "support_stability": 0.1314481108084969,
+      "task_metrics": 0.07543641668946846,
+      "total": 1.744326695151951,
+      "transition": 0.0,
+      "uncertainty": 7.94198708297739e-06,
+      "visibility": 0.09458825672450273,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1787440478801727,
+      "arm_role": 1.3783465302452669e-05,
+      "belief": 0.0974554605782032,
+      "calibration": 0.0,
+      "clearance": 0.0746708307415247,
+      "corridor": 0.18591812625527382,
+      "distillation": 0.0038922334788367152,
+      "disturbance": 0.0005819438138132682,
+      "gate": 0.0,
+      "grasp_affordance": 0.008575586834922433,
+      "occluder_contact": 0.19005733728408813,
+      "persistence": 0.4048172008187976,
+      "phase": 0.24421580568014178,
+      "planner_ranking": 1.0271672308444977,
+      "planner_risk": 0.03108011605218053,
+      "planner_success": 0.3713325075805187,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.46797188371419907,
+      "proposal_ranking": 0.6800601556897163,
+      "proposal_reconstruction": 1.0902876928448677,
+      "proposal_success": 0.25984624214470387,
+      "reocclusion": 0.19258547481149435,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.00014510085156871355,
+      "support_stability": 0.13228781055659056,
+      "task_metrics": 0.06339579145424068,
+      "total": 1.7367750853300095,
+      "transition": 0.0,
+      "uncertainty": 6.649694360483238e-06,
+      "visibility": 0.09114759508520365,
+      "world_model": 0.0
+    }
+  }
+]
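One way to read these metrics alongside `loss_weights` in the matching `config_resolved.yaml` is to recompute a weighted sum of the per-term losses and compare it with the logged `total`. The sketch below does this for the epoch 0 training entry. Whether the trainer actually computes `total` this way is an assumption: the diff does not say so, and `uncertainty` has no entry in `loss_weights`, so it is left out here.

```python
# Non-zero loss weights from the base_fast config. Terms whose weight or
# logged value is 0.0 (world_model, transition, gate, calibration,
# proposal_diversity, role_swap_consistency) contribute nothing and are omitted.
loss_weights = {
    "action": 1.0, "phase": 0.08, "arm_role": 0.08, "support_mode": 0.08,
    "corridor": 0.12, "persistence": 0.06, "disturbance": 0.06,
    "belief": 0.05, "visibility": 0.05, "clearance": 0.06,
    "support_stability": 0.06, "reocclusion": 0.06, "occluder_contact": 0.05,
    "grasp_affordance": 0.05, "planner_success": 0.15, "planner_risk": 0.08,
    "planner_ranking": 0.15, "proposal_reconstruction": 0.08,
    "proposal_success": 0.1, "proposal_ranking": 0.12, "proposal_mode": 0.08,
    "task_metrics": 0.06, "distillation": 0.05,
}

# Epoch 0 "train" values copied from metrics.json above.
epoch0_train = {
    "action": 1.1780137238295183, "phase": 0.1456924275211666,
    "arm_role": 0.000544056080402895, "support_mode": 0.0010332910049170175,
    "corridor": 0.21243907782532598, "persistence": 0.29602919886415097,
    "disturbance": 0.0010930091615908009, "belief": 0.10274084074341733,
    "visibility": 0.096126823645571, "clearance": 0.08112246429790622,
    "support_stability": 0.13264792088581168, "reocclusion": 0.1370238650428212,
    "occluder_contact": 0.19354943348013837,
    "grasp_affordance": 0.011060374242294094,
    "planner_success": 0.5002943964108176, "planner_risk": 0.03252584584381269,
    "planner_ranking": 1.1046701566032742,
    "proposal_reconstruction": 1.1813416908616605,
    "proposal_success": 0.5018493273983831,
    "proposal_ranking": 0.7633599224297897, "proposal_mode": 0.9053098727827487,
    "task_metrics": 0.07693366929078879, "distillation": 0.0036539296447501883,
}
logged_total = 1.8312026676924333

weighted_total = sum(w * epoch0_train[name] for name, w in loss_weights.items())
gap = abs(weighted_total - logged_total)
print(f"weighted sum = {weighted_total:.6f}, logged total = {logged_total:.6f}")
```

For this entry the recomputed sum and the logged `total` agree closely, so the weighted-sum reading appears consistent with the data. On that reading, `action` alone accounts for roughly 1.18 of the 1.83 total, which is worth keeping in mind when comparing `total` across runs.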

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/summary.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/config_resolved.yaml ADDED

	@@ -0,0 +1,174 @@

+experiment_name: proxy_adapter_wrapped_clip_base_reuse128_seed17
+output_dir: /workspace/workspace/outputs/adapter_proxy
+device: cuda
+seed: 17
+init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
+init_strict: false
+data:
+  proxies:
+  - foliage_proxy
+  - bag_proxy
+  - cloth_proxy
+  resolution: 224
+  dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase
+  train_episodes_per_proxy: 128
+  val_episodes_per_proxy: 32
+  train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
+  val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
+  rebuild_dataset: false
+  chunk_horizon: 8
+  rollout_horizon: 5
+  history_steps: 6
+  planner_candidates: 8
+  seed: 17
+optim:
+  epochs: 4
+  batch_size: 8
+  num_workers: 32
+  lr: 0.0001
+  weight_decay: 0.0001
+trainer:
+  policy_type: adapter_wrapped
+  training_regime: adapter_train_frozen_trunk
+  eval_mode: adapter_active
+  adapter_mode: adapter_active
+  adapter_use_transition_model: true
+  adapter_use_task_conditioning: true
+  use_bf16: true
+  grad_clip_norm: 1.0
+  freeze_backbone: true
+  gradient_checkpointing: false
+  plan_during_train: false
+  plan_during_eval: false
+  support_mode_conditioning: true
+  planner_mode: false
+  use_depth: true
+  use_world_model: false
+  use_role_tokens: true
+  compute_equivariance_probe: false
+  trainable_parameter_prefixes:
+  - adapter.state_head
+  - adapter.proposal_prior
+  - adapter.transition_model
+  - adapter.planner
+policy:
+  backbone:
+    model_name: openai/clip-vit-base-patch32
+    hidden_dim: 512
+    max_text_tokens: 32
+    freeze_backbone: true
+    gradient_checkpointing: false
+    use_dummy_backbone: false
+  fusion:
+    hidden_dim: 512
+    num_cameras: 3
+    num_layers: 4
+    num_heads: 8
+    ff_dim: 2048
+    dropout: 0.1
+    proprio_dim: 32
+    proprio_tokens: 1
+  memory:
+    hidden_dim: 512
+    action_dim: 14
+    history_steps: 6
+    scene_history_steps: 3
+    belief_history_steps: 8
+    num_layers: 2
+    dropout: 0.1
+    memory_bank_size: 4
+    scene_bank_size: 2
+    belief_bank_size: 2
+    num_heads: 8
+    max_history_steps: 8
+    reveal_cache_steps: 4
+    reveal_cache_decay: 0.7
+  decoder:
+    hidden_dim: 512
+    num_heads: 8
+    num_layers: 4
+    ff_dim: 2048
+    dropout: 0.1
+    chunk_size: 8
+    action_dim: 14
+    arm_action_dim: 7
+    num_candidates: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_proposal_modes: 7
+    planner_top_k: 4
+    proposal_delta_scale: 0.2
+    proposal_slot_scale: 0.05
+  reveal_head:
+    hidden_dim: 512
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    belief_map_size: 32
+    field_size: 16
+    num_heads: 8
+    predict_belief_map: true
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    num_tasks: 4
+  world_model:
+    hidden_dim: 512
+    action_dim: 14
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    field_size: 16
+    num_heads: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    belief_map_size: 32
+    predict_belief_map: true
+    scene_bank_size: 2
+    belief_bank_size: 2
+    rollout_mode: compact_rollout
+    num_tasks: 4
+    lightweight_field_size: 4
+  planner:
+    hidden_dim: 512
+    num_candidates: 8
+    action_dim: 14
+    num_support_modes: 3
+    utility_margin: 0.1
+    num_heads: 8
+    num_layers: 2
+    num_phases: 5
+    num_arm_roles: 4
+    top_k: 4
+    adapter_confidence_threshold: 0.55
+loss_weights:
+  action: 1.0
+  phase: 0.08
+  arm_role: 0.08
+  support_mode: 0.08
+  corridor: 0.12
+  persistence: 0.06
+  disturbance: 0.06
+  world_model: 0.0
+  transition: 0.2
+  belief: 0.05
+  visibility: 0.05
+  clearance: 0.06
+  support_stability: 0.06
+  reocclusion: 0.06
+  occluder_contact: 0.05
+  grasp_affordance: 0.05
+  planner_success: 0.15
+  planner_risk: 0.08
+  planner_ranking: 0.15
+  proposal_reconstruction: 0.08
+  proposal_success: 0.1
+  proposal_ranking: 0.12
+  proposal_mode: 0.08
+  proposal_diversity: 0.05
+  role_swap_consistency: 0.0
+  task_metrics: 0.06
+  gate: 0.05
+  distillation: 0.05
+  calibration: 0.02
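The `base_fast` and `base_reuse128` configs are nearly identical, so a flattened key-by-key diff is a compact way to see what separates the two runs. The sketch below hard-codes a subset of fields copied from the two YAML files above. The helpers `flatten` and `diff_configs` are illustrative and not part of the repository, and list-valued or path fields (for example `trainable_parameter_prefixes` and the dataset paths) are left out for brevity.

```python
def flatten(d, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} form."""
    out = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def diff_configs(a, b):
    """Return {path: (value_in_a, value_in_b)} for every differing leaf."""
    fa, fb = flatten(a), flatten(b)
    keys = sorted(set(fa) | set(fb))
    return {k: (fa.get(k), fb.get(k)) for k in keys if fa.get(k) != fb.get(k)}


# Subset of fields copied from the two config_resolved.yaml files.
fast = {
    "data": {"train_episodes_per_proxy": 12, "val_episodes_per_proxy": 4},
    "optim": {"epochs": 2, "batch_size": 4, "num_workers": 8, "lr": 0.0001},
    "trainer": {"adapter_use_transition_model": False},
    "policy": {"planner": {"adapter_confidence_threshold": 0.45}},
    "loss_weights": {"transition": 0.0},
}
reuse128 = {
    "data": {"train_episodes_per_proxy": 128, "val_episodes_per_proxy": 32},
    "optim": {"epochs": 4, "batch_size": 8, "num_workers": 32, "lr": 0.0001},
    "trainer": {"adapter_use_transition_model": True},
    "policy": {"planner": {"adapter_confidence_threshold": 0.55}},
    "loss_weights": {"transition": 0.2},
}

changes = diff_configs(fast, reuse128)
for path, (old, new) in changes.items():
    print(f"{path}: {old} -> {new}")
```

On this subset the learning rate is unchanged, while dataset size, epochs, batch size, worker count, the transition model switch, the transition loss weight, and the adapter confidence threshold all differ. That matters when comparing the two `metrics.json` files, because the `reuse128` totals include a weighted `transition` term that the `fast` run does not.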

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/metrics.json ADDED

	@@ -0,0 +1,278 @@

+[
+  {
+    "epoch": 0,
+    "train": {
+      "action": 1.1828932802216345,
+      "arm_role": 0.00244398444339226,
+      "belief": 0.10072019552232839,
+      "calibration": 0.0,
+      "clearance": 0.07946077994063121,
+      "corridor": 0.21543118382702356,
+      "distillation": 0.00042247207064432005,
+      "disturbance": 0.0009066167868626844,
+      "gate": 0.0,
+      "grasp_affordance": 0.011442071496031615,
+      "occluder_contact": 0.19184747789086415,
+      "persistence": 0.5456274578801724,
+      "phase": 0.1889389944928033,
+      "planner_ranking": 0.8968874569199666,
+      "planner_risk": 0.03290799349358603,
+      "planner_success": 0.35506935793311656,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.7599493966383093,
+      "proposal_ranking": 1.4915186276956767,
+      "proposal_reconstruction": 1.0803285907296574,
+      "proposal_success": 0.3194384900461726,
+      "reocclusion": 0.1872198152817598,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.4244060135689102,
+      "support_stability": 0.13155287654459977,
+      "task_metrics": 0.07493724777292804,
+      "total": 2.751452175509028,
+      "transition": 4.318220460114359,
+      "uncertainty": 1.531094441807496e-05,
+      "visibility": 0.09642757938689545,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0025612511759391054,
+      "belief": 0.09879593178629875,
+      "calibration": 0.0,
+      "clearance": 0.07741740134855112,
+      "corridor": 0.20817755659421286,
+      "distillation": 0.0,
+      "disturbance": 0.0007382428300237128,
+      "gate": 0.0,
+      "grasp_affordance": 0.010511041525751353,
+      "occluder_contact": 0.19018630186716715,
+      "persistence": 0.4509886346757412,
+      "phase": 0.1597365932694326,
+      "planner_ranking": 0.22907628491520882,
+      "planner_risk": 0.02909238338470459,
+      "planner_success": 0.18200772007306418,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.71118057568868,
+      "proposal_ranking": 1.4729209462801616,
+      "proposal_reconstruction": 1.015290528535843,
+      "proposal_success": 0.2791739940643311,
+      "reocclusion": 0.16477556849519412,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.5340653051932652,
+      "support_stability": 0.12872510105371476,
+      "task_metrics": 0.06174707182993491,
+      "total": 2.407643111546834,
+      "transition": 3.39704422156016,
+      "uncertainty": 7.099100287177862e-06,
+      "visibility": 0.09383414511879286,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 1,
+    "train": {
+      "action": 1.187044749740793,
+      "arm_role": 0.001233981896833587,
+      "belief": 0.09885497215916128,
+      "calibration": 0.0,
+      "clearance": 0.07787450506281451,
+      "corridor": 0.21069503738349224,
+      "distillation": 0.0,
+      "disturbance": 0.0007993320816102586,
+      "gate": 0.0,
+      "grasp_affordance": 0.0100274878874922,
+      "occluder_contact": 0.19033558541486242,
+      "persistence": 0.508021433908148,
+      "phase": 0.19023076729739413,
+      "planner_ranking": 0.058458461105322636,
+      "planner_risk": 0.03440776518976488,
+      "planner_success": 0.1257152666627359,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.7171601638072679,
+      "proposal_ranking": 1.499033512187605,
+      "proposal_reconstruction": 1.066634831809196,
+      "proposal_success": 0.3018947724534684,
+      "reocclusion": 0.16926059677821248,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.4455214215426886,
+      "support_stability": 0.13059799138362668,
+      "task_metrics": 0.07159904390573502,
+      "total": 2.4211200485710336,
+      "transition": 3.487839874099283,
+      "uncertainty": 3.770016950513401e-06,
+      "visibility": 0.09318254963189614,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.001657356577925384,
+      "belief": 0.09766801769534747,
+      "calibration": 0.0,
+      "clearance": 0.07670599135259787,
+      "corridor": 0.20785387406746547,
+      "distillation": 0.0,
+      "disturbance": 0.0007254338066559285,
+      "gate": 0.0,
+      "grasp_affordance": 0.009808245363334816,
+      "occluder_contact": 0.18903621584177016,
+      "persistence": 0.43403610289096833,
+      "phase": 0.17749264603480697,
+      "planner_ranking": 0.00962653555907309,
+      "planner_risk": 0.02840747827043136,
+      "planner_success": 0.0469651294251283,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.5958098510901133,
+      "proposal_ranking": 1.567319353421529,
+      "proposal_reconstruction": 1.0027365585168202,
+      "proposal_success": 0.3119396299123764,
+      "reocclusion": 0.14939573630690575,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.38477273682753244,
+      "support_stability": 0.12813995343943438,
+      "task_metrics": 0.05784295691798131,
+      "total": 2.3466440041859946,
+      "transition": 3.402106682459513,
+      "uncertainty": 3.2218885041383296e-06,
+      "visibility": 0.09148541142543157,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 2,
+    "train": {
+      "action": 1.187824563819821,
+      "arm_role": 0.0017524876263963075,
+      "belief": 0.09850409833573494,
+      "calibration": 0.0,
+      "clearance": 0.07750590865602013,
+      "corridor": 0.21022135673576042,
+      "distillation": 0.0,
+      "disturbance": 0.0008020720826393432,
+      "gate": 0.0,
+      "grasp_affordance": 0.009951516582841883,
+      "occluder_contact": 0.190022504630209,
+      "persistence": 0.5073582559448331,
+      "phase": 0.17974354623339506,
+      "planner_ranking": 0.009596662447169549,
+      "planner_risk": 0.03246875642603185,
+      "planner_success": 0.06673186843698266,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.7036348676481167,
+      "proposal_ranking": 1.4990194234527459,
+      "proposal_reconstruction": 1.0593123075340976,
+      "proposal_success": 0.30170050113141034,
+      "reocclusion": 0.1706294410807245,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.4435207678490326,
+      "support_stability": 0.12954452590030782,
+      "task_metrics": 0.07019141574679803,
+      "total": 2.3952997061384824,
+      "transition": 3.4510987426052573,
+      "uncertainty": 2.649417712834203e-06,
+      "visibility": 0.09213429119657068,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0005777989087315897,
+      "belief": 0.09620878870288531,
+      "calibration": 0.0,
+      "clearance": 0.07562205567955971,
+      "corridor": 0.2099471464753151,
+      "distillation": 0.0,
+      "disturbance": 0.0008037402614718304,
+      "gate": 0.0,
+      "grasp_affordance": 0.009381201630458236,
+      "occluder_contact": 0.18789172718922298,
+      "persistence": 0.44771519377827645,
+      "phase": 0.15351878677805264,
+      "planner_ranking": 0.005908836016897112,
+      "planner_risk": 0.029111843556165695,
+      "planner_success": 0.030371779979517063,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.6608088513215383,
+      "proposal_ranking": 1.519856317838033,
+      "proposal_reconstruction": 0.9984971513350804,
+      "proposal_success": 0.2899133563041687,
+      "reocclusion": 0.15338999405503273,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.4591325432062149,
+      "support_stability": 0.12738436510165532,
+      "task_metrics": 0.05577167191853126,
+      "total": 2.3411471287409467,
+      "transition": 3.3808055957158407,
+      "uncertainty": 1.560352771671584e-06,
+      "visibility": 0.08981477295358976,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 3,
+    "train": {
+      "action": 1.1873075451169695,
+      "arm_role": 0.0010167556069400005,
+      "belief": 0.09699463875604276,
+      "calibration": 0.0,
+      "clearance": 0.0765939431280649,
+      "corridor": 0.21000426350271,
+      "distillation": 0.0,
+      "disturbance": 0.0008205439020564561,
+      "gate": 0.0,
+      "grasp_affordance": 0.009616962144886996,
+      "occluder_contact": 0.1890684860844572,
+      "persistence": 0.5268036977802756,
+      "phase": 0.18212753434141143,
+      "planner_ranking": 0.007861482998857102,
+      "planner_risk": 0.0305439497837249,
+      "planner_success": 0.0545816100632944,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.7096028443144149,
+      "proposal_ranking": 1.49962230790563,
+      "proposal_reconstruction": 1.0570516235688154,
+      "proposal_success": 0.3012468101096754,
+      "reocclusion": 0.16893144916085637,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.43846767214166016,
+      "support_stability": 0.12901192851865492,
+      "task_metrics": 0.0706772211500827,
+      "total": 2.383075835324135,
+      "transition": 3.399705786664947,
+      "uncertainty": 1.833678168140796e-06,
+      "visibility": 0.09043271063255663,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0008160963848543664,
+      "belief": 0.09533951580524444,
+      "calibration": 0.0,
+      "clearance": 0.07521944617231686,
+      "corridor": 0.2074363355835279,
+      "distillation": 0.0,
+      "disturbance": 0.0007471947777958121,
+      "gate": 0.0,
+      "grasp_affordance": 0.009425108910848697,
+      "occluder_contact": 0.187281297147274,
+      "persistence": 0.42866156020512186,
+      "phase": 0.13389708844115375,
+      "planner_ranking": 0.007386005097456897,
+      "planner_risk": 0.03013829297075669,
+      "planner_success": 0.027494619445254404,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.7145659645398458,
+      "proposal_ranking": 1.4651208639144897,
+      "proposal_reconstruction": 0.99560972849528,
+      "proposal_success": 0.29622272253036497,
+      "reocclusion": 0.15021706620852152,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.3665752013524373,
+      "support_stability": 0.12691180408000946,
+      "task_metrics": 0.056707360843817396,
+      "total": 2.3298022985458373,
+      "transition": 3.3876041332880655,
+      "uncertainty": 1.581879031557302e-06,
+      "visibility": 0.08887151132027309,
+      "world_model": 0.0
+    }
+  }
+]

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/summary.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/config_resolved.yaml ADDED

	@@ -0,0 +1,170 @@

+experiment_name: proxy_adapter_wrapped_clip_rank_only_fast_seed17
+output_dir: /workspace/workspace/outputs/adapter_proxy
+device: cuda
+seed: 17
+init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
+init_strict: false
+data:
+  proxies:
+  - foliage_proxy
+  - bag_proxy
+  - cloth_proxy
+  resolution: 224
+  dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast
+  train_episodes_per_proxy: 12
+  val_episodes_per_proxy: 4
+  train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast.pt
+  val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast.pt
+  rebuild_dataset: false
+  chunk_horizon: 8
+  rollout_horizon: 5
+  history_steps: 6
+  planner_candidates: 8
+  seed: 17
+optim:
+  epochs: 1
+  batch_size: 4
+  num_workers: 8
+  lr: 5.0e-05
+  weight_decay: 0.0001
+trainer:
+  policy_type: adapter_wrapped
+  training_regime: proxy_rank_only
+  eval_mode: adapter_active
+  adapter_mode: adapter_active
+  adapter_use_transition_model: false
+  adapter_use_task_conditioning: true
+  use_bf16: true
+  grad_clip_norm: 1.0
+  freeze_backbone: true
+  gradient_checkpointing: false
+  plan_during_train: false
+  plan_during_eval: false
+  support_mode_conditioning: true
+  planner_mode: false
+  use_depth: true
+  use_world_model: false
+  use_role_tokens: true
+  compute_equivariance_probe: false
+  trainable_parameter_prefixes:
+  - adapter.proposal_prior
+  - adapter.planner
+policy:
+  backbone:
+    model_name: openai/clip-vit-base-patch32
+    hidden_dim: 512
+    max_text_tokens: 32
+    freeze_backbone: true
+    gradient_checkpointing: false
+    use_dummy_backbone: false
+  fusion:
+    hidden_dim: 512
+    num_cameras: 3
+    num_layers: 4
+    num_heads: 8
+    ff_dim: 2048
+    dropout: 0.1
+    proprio_dim: 32
+    proprio_tokens: 1
+  memory:
+    hidden_dim: 512
+    action_dim: 14
+    history_steps: 6
+    scene_history_steps: 3
+    belief_history_steps: 8
+    num_layers: 2
+    dropout: 0.1
+    memory_bank_size: 4
+    scene_bank_size: 2
+    belief_bank_size: 2
+    num_heads: 8
+    max_history_steps: 8
+  decoder:
+    hidden_dim: 512
+    num_heads: 8
+    num_layers: 4
+    ff_dim: 2048
+    dropout: 0.1
+    chunk_size: 8
+    action_dim: 14
+    arm_action_dim: 7
+    num_candidates: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_proposal_modes: 7
+    planner_top_k: 4
+    proposal_delta_scale: 0.2
+    proposal_slot_scale: 0.05
+  reveal_head:
+    hidden_dim: 512
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    belief_map_size: 32
+    field_size: 16
+    num_heads: 8
+    predict_belief_map: true
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    num_tasks: 4
+  world_model:
+    hidden_dim: 512
+    action_dim: 14
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    field_size: 16
+    num_heads: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    belief_map_size: 32
+    predict_belief_map: true
+    scene_bank_size: 2
+    belief_bank_size: 2
+    rollout_mode: compact_rollout
+    num_tasks: 4
+    lightweight_field_size: 4
+  planner:
+    hidden_dim: 512
+    num_candidates: 8
+    action_dim: 14
+    num_support_modes: 3
+    utility_margin: 0.1
+    num_heads: 8
+    num_layers: 2
+    num_phases: 5
+    num_arm_roles: 4
+    top_k: 4
+    adapter_confidence_threshold: 0.55
+loss_weights:
+  action: 0.5
+  phase: 0.0
+  arm_role: 0.0
+  support_mode: 0.0
+  corridor: 0.0
+  persistence: 0.0
+  disturbance: 0.0
+  world_model: 0.0
+  transition: 0.0
+  belief: 0.0
+  visibility: 0.0
+  clearance: 0.0
+  support_stability: 0.0
+  reocclusion: 0.0
+  occluder_contact: 0.0
+  grasp_affordance: 0.0
+  planner_success: 0.0
+  planner_risk: 0.0
+  planner_ranking: 0.2
+  proposal_reconstruction: 0.0
+  proposal_success: 0.1
+  proposal_ranking: 0.2
+  proposal_mode: 0.1
+  proposal_diversity: 0.02
+  role_swap_consistency: 0.0
+  task_metrics: 0.0
+  gate: 0.0
+  distillation: 0.05
+  calibration: 0.0
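Note on `trainable_parameter_prefixes`: the config above lists `adapter.proposal_prior` and `adapter.planner` alongside `freeze_backbone: true`. A minimal sketch of how such a prefix list could select parameters by name is given below. The trainer code is not part of this diff, so the matching rule (a plain `startswith` check) and every parameter name in `example_names` are assumptions made for illustration only.

```python
# Hypothetical sketch: select trainable parameters by name prefix.
# The prefixes come from config_resolved.yaml; the matching rule is assumed.
TRAINABLE_PREFIXES = ("adapter.proposal_prior", "adapter.planner")


def is_trainable(param_name, prefixes=TRAINABLE_PREFIXES):
    """Return True when the parameter name starts with any listed prefix."""
    return param_name.startswith(tuple(prefixes))


# Made-up parameter names, not taken from the repository.
example_names = [
    "backbone.vision.layer0.weight",
    "adapter.proposal_prior.mlp.weight",
    "adapter.planner.head.bias",
    "adapter.state_head.proj.weight",
]

trainable = [name for name in example_names if is_trainable(name)]
frozen = [name for name in example_names if not is_trainable(name)]
```

A plain prefix test would also match a sibling module whose name merely begins with `adapter.planner`, so a stricter rule (prefix followed by a dot) may be preferable if such modules exist.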

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/metrics.json ADDED

	@@ -0,0 +1,71 @@

+[
+  {
+    "epoch": 0,
+    "train": {
+      "action": 1.197754372721133,
+      "arm_role": 0.002544189276902572,
+      "belief": 0.10325424243574557,
+      "calibration": 0.0,
+      "clearance": 0.08140122955260069,
+      "corridor": 0.21582962238513256,
+      "distillation": 0.0017091589068750973,
+      "disturbance": 0.0018385711983959798,
+      "gate": 0.0,
+      "grasp_affordance": 0.012481509039745382,
+      "occluder_contact": 0.194344752508661,
+      "persistence": 0.7591703522130442,
+      "phase": 0.11467522253160892,
+      "planner_ranking": 1.1083470168321028,
+      "planner_risk": 0.03255904554996802,
+      "planner_success": 0.8582628343416296,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 1.1811642983685369,
+      "proposal_ranking": 0.7893771244132001,
+      "proposal_reconstruction": 1.2029107290765513,
+      "proposal_success": 0.6142160711081132,
+      "reocclusion": 0.25430014456176886,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.004081056595010602,
+      "support_stability": 0.13368942070266474,
+      "task_metrics": 0.0832325461442056,
+      "total": 1.158045794652856,
+      "transition": 0.0,
+      "uncertainty": 2.6861929209980705e-05,
+      "visibility": 0.09703111033076825,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.140361487865448,
+      "arm_role": 0.0023584125883644447,
+      "belief": 0.10074711591005325,
+      "calibration": 0.0,
+      "clearance": 0.0765643808990717,
+      "corridor": 0.1961718276143074,
+      "distillation": 0.003883325931383297,
+      "disturbance": 0.0014785153052798705,
+      "gate": 0.0,
+      "grasp_affordance": 0.010992531199008226,
+      "occluder_contact": 0.1946533638983965,
+      "persistence": 0.5068328934721649,
+      "phase": 0.16515514547063503,
+      "planner_ranking": 1.0683312863111496,
+      "planner_risk": 0.03190935752354562,
+      "planner_success": 0.8590418174862862,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8785674348473549,
+      "proposal_ranking": 0.5385221149772406,
+      "proposal_reconstruction": 1.144310723990202,
+      "proposal_success": 0.5420645326375961,
+      "reocclusion": 0.21981605514883995,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.003642815601779148,
+      "support_stability": 0.1338925575837493,
+      "task_metrics": 0.06881333282217383,
+      "total": 1.0338090658187866,
+      "transition": 0.0,
+      "uncertainty": 2.9527319277633524e-05,
+      "visibility": 0.0945812463760376,
+      "world_model": 0.0
+    }
+  }
+]
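Note on `total`: the reported `total` for this run appears to be the sum of the per-term losses scaled by the `loss_weights` block of the matching `config_resolved.yaml`. The sketch below recomputes it for the epoch 0 train entry, using only the terms with nonzero weight. The function name `weighted_total` is hypothetical, and the weighted-sum rule itself is an assumption inferred from the numbers, not something the diff states.

```python
# Nonzero loss weights copied from the rank_only_fast config_resolved.yaml.
loss_weights = {
    "action": 0.5,
    "planner_ranking": 0.2,
    "proposal_success": 0.1,
    "proposal_ranking": 0.2,
    "proposal_mode": 0.1,
    "proposal_diversity": 0.02,
    "distillation": 0.05,
}

# Epoch 0 train values copied from the rank_only_fast metrics.json.
train_epoch0 = {
    "action": 1.197754372721133,
    "planner_ranking": 1.1083470168321028,
    "proposal_success": 0.6142160711081132,
    "proposal_ranking": 0.7893771244132001,
    "proposal_mode": 1.1811642983685369,
    "proposal_diversity": 0.0,
    "distillation": 0.0017091589068750973,
    "total": 1.158045794652856,
}


def weighted_total(metrics, weights):
    """Sum each logged loss term scaled by its configured weight."""
    return sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())


recomputed = weighted_total(train_epoch0, loss_weights)
gap = abs(recomputed - train_epoch0["total"])
```

The recomputed value agrees closely with the logged `total`. A small residual is expected if the trainer averages per-batch totals rather than combining epoch-level means.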

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/summary.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/config_resolved.yaml ADDED

	@@ -0,0 +1,170 @@

+experiment_name: proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17
+output_dir: /workspace/workspace/outputs/adapter_proxy
+device: cuda
+seed: 17
+init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
+init_strict: false
+data:
+  proxies:
+  - foliage_proxy
+  - bag_proxy
+  - cloth_proxy
+  resolution: 224
+  dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase
+  train_episodes_per_proxy: 128
+  val_episodes_per_proxy: 32
+  train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
+  val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
+  rebuild_dataset: true
+  chunk_horizon: 8
+  rollout_horizon: 5
+  history_steps: 6
+  planner_candidates: 8
+  seed: 17
+optim:
+  epochs: 4
+  batch_size: 8
+  num_workers: 32
+  lr: 5.0e-05
+  weight_decay: 0.0001
+trainer:
+  policy_type: adapter_wrapped
+  training_regime: proxy_rank_only
+  eval_mode: adapter_active
+  adapter_mode: adapter_active
+  adapter_use_transition_model: false
+  adapter_use_task_conditioning: true
+  use_bf16: true
+  grad_clip_norm: 1.0
+  freeze_backbone: true
+  gradient_checkpointing: false
+  plan_during_train: false
+  plan_during_eval: false
+  support_mode_conditioning: true
+  planner_mode: false
+  use_depth: true
+  use_world_model: false
+  use_role_tokens: true
+  compute_equivariance_probe: false
+  trainable_parameter_prefixes:
+  - adapter.proposal_prior
+  - adapter.planner
+policy:
+  backbone:
+    model_name: openai/clip-vit-base-patch32
+    hidden_dim: 512
+    max_text_tokens: 32
+    freeze_backbone: true
+    gradient_checkpointing: false
+    use_dummy_backbone: false
+  fusion:
+    hidden_dim: 512
+    num_cameras: 3
+    num_layers: 4
+    num_heads: 8
+    ff_dim: 2048
+    dropout: 0.1
+    proprio_dim: 32
+    proprio_tokens: 1
+  memory:
+    hidden_dim: 512
+    action_dim: 14
+    history_steps: 6
+    scene_history_steps: 3
+    belief_history_steps: 8
+    num_layers: 2
+    dropout: 0.1
+    memory_bank_size: 4
+    scene_bank_size: 2
+    belief_bank_size: 2
+    num_heads: 8
+    max_history_steps: 8
+  decoder:
+    hidden_dim: 512
+    num_heads: 8
+    num_layers: 4
+    ff_dim: 2048
+    dropout: 0.1
+    chunk_size: 8
+    action_dim: 14
+    arm_action_dim: 7
+    num_candidates: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_proposal_modes: 7
+    planner_top_k: 4
+    proposal_delta_scale: 0.2
+    proposal_slot_scale: 0.05
+  reveal_head:
+    hidden_dim: 512
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    belief_map_size: 32
+    field_size: 16
+    num_heads: 8
+    predict_belief_map: true
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    num_tasks: 4
+  world_model:
+    hidden_dim: 512
+    action_dim: 14
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    field_size: 16
+    num_heads: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    belief_map_size: 32
+    predict_belief_map: true
+    scene_bank_size: 2
+    belief_bank_size: 2
+    rollout_mode: compact_rollout
+    num_tasks: 4
+    lightweight_field_size: 4
+  planner:
+    hidden_dim: 512
+    num_candidates: 8
+    action_dim: 14
+    num_support_modes: 3
+    utility_margin: 0.1
+    num_heads: 8
+    num_layers: 2
+    num_phases: 5
+    num_arm_roles: 4
+    top_k: 4
+    adapter_confidence_threshold: 0.55
+loss_weights:
+  action: 0.5
+  phase: 0.0
+  arm_role: 0.0
+  support_mode: 0.0
+  corridor: 0.0
+  persistence: 0.0
+  disturbance: 0.0
+  world_model: 0.0
+  transition: 0.0
+  belief: 0.0
+  visibility: 0.0
+  clearance: 0.0
+  support_stability: 0.0
+  reocclusion: 0.0
+  occluder_contact: 0.0
+  grasp_affordance: 0.0
+  planner_success: 0.0
+  planner_risk: 0.0
+  planner_ranking: 0.2
+  proposal_reconstruction: 0.0
+  proposal_success: 0.1
+  proposal_ranking: 0.2
+  proposal_mode: 0.1
+  proposal_diversity: 0.02
+  role_swap_consistency: 0.0
+  task_metrics: 0.0
+  gate: 0.0
+  distillation: 0.05
+  calibration: 0.0
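Note on the two `rank_only` configs: the `fast` and `rebuild128` variants share the same policy and loss weights and differ mainly in the `data` and `optim` blocks. A small sketch of a nested-config diff is shown below. It uses hand-copied excerpts instead of parsing the YAML files, and the helper names `flatten` and `diff_configs` are hypothetical. `experiment_name`, `dataset_version` and the dataset paths also differ but are left out of the excerpt for brevity.

```python
def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing leaf."""
    flat_left, flat_right = flatten(left), flatten(right)
    paths = sorted(set(flat_left) | set(flat_right))
    return {
        path: (flat_left.get(path), flat_right.get(path))
        for path in paths
        if flat_left.get(path) != flat_right.get(path)
    }


# Excerpts copied from the two config_resolved.yaml files above.
fast = {
    "data": {"train_episodes_per_proxy": 12, "val_episodes_per_proxy": 4,
             "rebuild_dataset": False},
    "optim": {"epochs": 1, "batch_size": 4, "num_workers": 8, "lr": 5.0e-05},
}
rebuild128 = {
    "data": {"train_episodes_per_proxy": 128, "val_episodes_per_proxy": 32,
             "rebuild_dataset": True},
    "optim": {"epochs": 4, "batch_size": 8, "num_workers": 32, "lr": 5.0e-05},
}

changed = diff_configs(fast, rebuild128)
for path, (old, new) in changed.items():
    print(f"{path}: {old} -> {new}")
```

On these excerpts the learning rate is the only listed setting that does not change between the two runs.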

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/metrics.json ADDED

	@@ -0,0 +1,278 @@

+[
+  {
+    "epoch": 0,
+    "train": {
+      "action": 1.1852011870937187,
+      "arm_role": 0.002373194619387138,
+      "belief": 0.10289109610960263,
+      "calibration": 0.0,
+      "clearance": 0.08050862655920141,
+      "corridor": 0.21972917464851333,
+      "distillation": 0.00017329733114407843,
+      "disturbance": 0.0017395270088327531,
+      "gate": 0.0,
+      "grasp_affordance": 0.011768270616552659,
+      "occluder_contact": 0.19525797589987265,
+      "persistence": 0.9892086396072092,
+      "phase": 0.18924372737147227,
+      "planner_ranking": 0.8172678849777254,
+      "planner_risk": 0.05744413993939632,
+      "planner_success": 0.7064468672796458,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.9432200854565916,
+      "proposal_ranking": 1.336866896693446,
+      "proposal_reconstruction": 1.1112627968066882,
+      "proposal_success": 0.4027111357500573,
+      "reocclusion": 0.24888639283530853,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.00371579679723109,
+      "support_stability": 0.13197413343591852,
+      "task_metrics": 0.08024843531746824,
+      "total": 1.1580296083658683,
+      "transition": 0.0,
+      "uncertainty": 2.5850595745526205e-05,
+      "visibility": 0.09642420045467985,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0023813226105024415,
+      "belief": 0.10179599796732267,
+      "calibration": 0.0,
+      "clearance": 0.07945799206693967,
+      "corridor": 0.2141698474685351,
+      "distillation": 0.0,
+      "disturbance": 0.0019217911574135845,
+      "gate": 0.0,
+      "grasp_affordance": 0.011626164180537064,
+      "occluder_contact": 0.19411553194125494,
+      "persistence": 0.8884257813294728,
+      "phase": 0.1341669425445919,
+      "planner_ranking": 0.27815661728382113,
+      "planner_risk": 0.09556023739278316,
+      "planner_success": 0.4189198156197866,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8612981418768565,
+      "proposal_ranking": 1.3541318853696187,
+      "proposal_reconstruction": 1.0655952940384548,
+      "proposal_success": 0.34761282006899513,
+      "reocclusion": 0.23091794028878213,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0032604910316877066,
+      "support_stability": 0.1301962616542975,
+      "task_metrics": 0.06759862539668877,
+      "total": 1.0313682556152344,
+      "transition": 0.0,
+      "uncertainty": 2.724018773581823e-05,
+      "visibility": 0.09551568776369095,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 1,
+    "train": {
+      "action": 1.1870849994050354,
+      "arm_role": 0.0023704882018549854,
+      "belief": 0.10286598608774297,
+      "calibration": 0.0,
+      "clearance": 0.08047503743226789,
+      "corridor": 0.21940489163418778,
+      "distillation": 0.0,
+      "disturbance": 0.0017350247245234978,
+      "gate": 0.0,
+      "grasp_affordance": 0.011760568257984744,
+      "occluder_contact": 0.19528898884769247,
+      "persistence": 0.9879098851625033,
+      "phase": 0.18875574952914936,
+      "planner_ranking": 0.08558745583628907,
+      "planner_risk": 0.1399850454651007,
+      "planner_success": 0.3386907313300782,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8769457270117367,
+      "proposal_ranking": 1.3593093036603527,
+      "proposal_reconstruction": 1.1160700326206303,
+      "proposal_success": 0.36580811954346026,
+      "reocclusion": 0.2486385852098465,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.003717367388621098,
+      "support_stability": 0.13195458464637524,
+      "task_metrics": 0.08063916548961352,
+      "total": 1.0067975045252247,
+      "transition": 0.0,
+      "uncertainty": 2.580285645843319e-05,
+      "visibility": 0.09639987023938604,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0023813226105024415,
+      "belief": 0.10179599796732267,
+      "calibration": 0.0,
+      "clearance": 0.07945799206693967,
+      "corridor": 0.2141698474685351,
+      "distillation": 0.0,
+      "disturbance": 0.0019217911574135845,
+      "gate": 0.0,
+      "grasp_affordance": 0.011626164180537064,
+      "occluder_contact": 0.19411553194125494,
+      "persistence": 0.8884257813294728,
+      "phase": 0.1341669425445919,
+      "planner_ranking": 0.020432091876864435,
+      "planner_risk": 0.16417022446791332,
+      "planner_success": 0.20522922178109487,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8313319782416025,
+      "proposal_ranking": 1.355160697301229,
+      "proposal_reconstruction": 1.065085584918658,
+      "proposal_success": 0.37029117544492085,
+      "reocclusion": 0.23091794028878213,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0032604910316877066,
+      "support_stability": 0.1301962616542975,
+      "task_metrics": 0.06759862539668877,
+      "total": 0.979300343990326,
+      "transition": 0.0,
+      "uncertainty": 2.724018773581823e-05,
+      "visibility": 0.09551568776369095,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 2,
+    "train": {
+      "action": 1.1864268629490828,
+      "arm_role": 0.002376417737031559,
+      "belief": 0.10281138397565409,
+      "calibration": 0.0,
+      "clearance": 0.08041088464630752,
+      "corridor": 0.21921880461839066,
+      "distillation": 0.0,
+      "disturbance": 0.0017383864957510548,
+      "gate": 0.0,
+      "grasp_affordance": 0.011750116954095849,
+      "occluder_contact": 0.19525049964920813,
+      "persistence": 0.9866341657686133,
+      "phase": 0.18828046964598866,
+      "planner_ranking": 0.01506587937317726,
+      "planner_risk": 0.17819794167240127,
+      "planner_success": 0.27137053726601,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.871496272187273,
+      "proposal_ranking": 1.3522406766394608,
+      "proposal_reconstruction": 1.114444579396929,
+      "proposal_success": 0.36960093138598593,
+      "reocclusion": 0.24837740529485108,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.003723324929997951,
+      "support_stability": 0.1318679920890752,
+      "task_metrics": 0.08124440529641985,
+      "total": 0.9907847267239034,
+      "transition": 0.0,
+      "uncertainty": 2.5764442244401245e-05,
+      "visibility": 0.09634732048050697,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0023813226105024415,
+      "belief": 0.10179599796732267,
+      "calibration": 0.0,
+      "clearance": 0.07945799206693967,
+      "corridor": 0.2141698474685351,
+      "distillation": 0.0,
+      "disturbance": 0.0019217911574135845,
+      "gate": 0.0,
+      "grasp_affordance": 0.011626164180537064,
+      "occluder_contact": 0.19411553194125494,
+      "persistence": 0.8884257813294728,
+      "phase": 0.1341669425445919,
+      "planner_ranking": 0.008497202799965938,
+      "planner_risk": 0.1943199912707011,
+      "planner_success": 0.16650028626124064,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.784325490395228,
+      "proposal_ranking": 1.3774529258410135,
+      "proposal_reconstruction": 1.0638848970333734,
+      "proposal_success": 0.3639564683039983,
+      "reocclusion": 0.23091794028878213,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0032604910316877066,
+      "support_stability": 0.1301962616542975,
+      "task_metrics": 0.06759862539668877,
+      "total": 0.9760376830895742,
+      "transition": 0.0,
+      "uncertainty": 2.724018773581823e-05,
+      "visibility": 0.09551568776369095,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 3,
+    "train": {
+      "action": 1.1860493772170122,
+      "arm_role": 0.002373194953958903,
+      "belief": 0.10285232979960802,
+      "calibration": 0.0,
+      "clearance": 0.08046898640253965,
+      "corridor": 0.21937422429313178,
+      "distillation": 0.0,
+      "disturbance": 0.001741885332568713,
+      "gate": 0.0,
+      "grasp_affordance": 0.011761440472880831,
+      "occluder_contact": 0.19526721023711838,
+      "persistence": 0.9867200422562471,
+      "phase": 0.18844207436895044,
+      "planner_ranking": 0.008475025738159022,
+      "planner_risk": 0.20258555417301274,
+      "planner_success": 0.24018349805298975,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8707189029004393,
+      "proposal_ranking": 1.3544838268215917,
+      "proposal_reconstruction": 1.113739546857962,
+      "proposal_success": 0.36756599099696186,
+      "reocclusion": 0.24844886725690185,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0037099133850224003,
+      "support_stability": 0.13174430547016008,
+      "task_metrics": 0.0815500450328368,
+      "total": 0.9894452145119675,
+      "transition": 0.0,
+      "uncertainty": 2.5792860569021012e-05,
+      "visibility": 0.09639218781425171,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1680383563041687,
+      "arm_role": 0.0023813226105024415,
+      "belief": 0.10179599796732267,
+      "calibration": 0.0,
+      "clearance": 0.07945799206693967,
+      "corridor": 0.2141698474685351,
+      "distillation": 0.0,
+      "disturbance": 0.0019217911574135845,
+      "gate": 0.0,
+      "grasp_affordance": 0.011626164180537064,
+      "occluder_contact": 0.19411553194125494,
+      "persistence": 0.8884257813294728,
+      "phase": 0.1341669425445919,
+      "planner_ranking": 0.006291244722281893,
+      "planner_risk": 0.22365033129851022,
+      "planner_success": 0.1353773462275664,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.8833410640557607,
+      "proposal_ranking": 1.3212236324946085,
+      "proposal_reconstruction": 1.0634535759687425,
+      "proposal_success": 0.36492464542388914,
+      "reocclusion": 0.23091794028878213,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.0032604910316877066,
+      "support_stability": 0.1301962616542975,
+      "task_metrics": 0.06759862539668877,
+      "total": 0.9743489940961202,
+      "transition": 0.0,
+      "uncertainty": 2.724018773581823e-05,
+      "visibility": 0.09551568776369095,
+      "world_model": 0.0
+    }
+  }
+]
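Note on the `val` entries: in this run many validation terms (for example `action` and `persistence`) are identical across epochs 0 to 3, while the `planner_*` and `proposal_*` terms move. That pattern is consistent with, though not proof of, only the `adapter.proposal_prior` and `adapter.planner` parameters being updated. The sketch below flags which terms change between epochs on a hand-copied subset of the values; `changing_keys` is a hypothetical helper name.

```python
def changing_keys(history, split="val", tol=0.0):
    """Return the sorted metric names whose value differs from epoch 0."""
    first = history[0][split]
    return sorted(
        name
        for name in first
        if any(abs(entry[split][name] - first[name]) > tol for entry in history[1:])
    )


# Subset of the rebuild128 metrics.json val entries, copied from the diff above.
history = [
    {"epoch": 0, "val": {"action": 1.1680383563041687,
                         "persistence": 0.8884257813294728,
                         "planner_ranking": 0.27815661728382113,
                         "planner_success": 0.4189198156197866}},
    {"epoch": 1, "val": {"action": 1.1680383563041687,
                         "persistence": 0.8884257813294728,
                         "planner_ranking": 0.020432091876864435,
                         "planner_success": 0.20522922178109487}},
    {"epoch": 2, "val": {"action": 1.1680383563041687,
                         "persistence": 0.8884257813294728,
                         "planner_ranking": 0.008497202799965938,
                         "planner_success": 0.16650028626124064}},
    {"epoch": 3, "val": {"action": 1.1680383563041687,
                         "persistence": 0.8884257813294728,
                         "planner_ranking": 0.006291244722281893,
                         "planner_success": 0.1353773462275664}},
]

moving = changing_keys(history)
static = sorted(set(history[0]["val"]) - set(moving))
```

Running the same helper over the full `metrics.json` would list every term that moves, which is a quick way to check that a frozen-trunk run is only touching the heads it is meant to train.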

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/summary.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/config_resolved.yaml ADDED

	@@ -0,0 +1,174 @@

+experiment_name: proxy_adapter_wrapped_clip_transition_fast_seed17
+output_dir: /workspace/workspace/outputs/adapter_proxy
+device: cuda
+seed: 17
+init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
+init_strict: false
+data:
+  proxies:
+  - foliage_proxy
+  - bag_proxy
+  - cloth_proxy
+  resolution: 224
+  dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast_transition
+  train_episodes_per_proxy: 12
+  val_episodes_per_proxy: 4
+  train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast_transition.pt
+  val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast_transition.pt
+  rebuild_dataset: false
+  chunk_horizon: 8
+  rollout_horizon: 5
+  history_steps: 6
+  planner_candidates: 8
+  seed: 17
+optim:
+  epochs: 2
+  batch_size: 4
+  num_workers: 8
+  lr: 0.0001
+  weight_decay: 0.0001
+trainer:
+  policy_type: adapter_wrapped
+  training_regime: adapter_train_frozen_trunk
+  eval_mode: adapter_active
+  adapter_mode: adapter_active
+  adapter_use_transition_model: true
+  adapter_use_task_conditioning: true
+  use_bf16: true
+  grad_clip_norm: 1.0
+  freeze_backbone: true
+  gradient_checkpointing: false
+  plan_during_train: false
+  plan_during_eval: false
+  support_mode_conditioning: true
+  planner_mode: false
+  use_depth: true
+  use_world_model: false
+  use_role_tokens: true
+  compute_equivariance_probe: false
+  trainable_parameter_prefixes:
+  - adapter.state_head
+  - adapter.proposal_prior
+  - adapter.transition_model
+  - adapter.planner
+policy:
+  backbone:
+    model_name: openai/clip-vit-base-patch32
+    hidden_dim: 512
+    max_text_tokens: 32
+    freeze_backbone: true
+    gradient_checkpointing: false
+    use_dummy_backbone: false
+  fusion:
+    hidden_dim: 512
+    num_cameras: 3
+    num_layers: 4
+    num_heads: 8
+    ff_dim: 2048
+    dropout: 0.1
+    proprio_dim: 32
+    proprio_tokens: 1
+  memory:
+    hidden_dim: 512
+    action_dim: 14
+    history_steps: 6
+    scene_history_steps: 3
+    belief_history_steps: 8
+    num_layers: 2
+    dropout: 0.1
+    memory_bank_size: 4
+    scene_bank_size: 2
+    belief_bank_size: 2
+    num_heads: 8
+    max_history_steps: 8
+    reveal_cache_steps: 4
+    reveal_cache_decay: 0.7
+  decoder:
+    hidden_dim: 512
+    num_heads: 8
+    num_layers: 4
+    ff_dim: 2048
+    dropout: 0.1
+    chunk_size: 8
+    action_dim: 14
+    arm_action_dim: 7
+    num_candidates: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_proposal_modes: 7
+    planner_top_k: 4
+    proposal_delta_scale: 0.2
+    proposal_slot_scale: 0.05
+  reveal_head:
+    hidden_dim: 512
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    belief_map_size: 32
+    field_size: 16
+    num_heads: 8
+    predict_belief_map: true
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    num_tasks: 4
+  world_model:
+    hidden_dim: 512
+    action_dim: 14
+    num_support_modes: 3
+    num_approach_templates: 32
+    rollout_horizon: 5
+    field_size: 16
+    num_heads: 8
+    num_phases: 5
+    num_arm_roles: 4
+    num_interaction_tokens: 8
+    belief_map_size: 32
+    predict_belief_map: true
+    scene_bank_size: 2
+    belief_bank_size: 2
+    rollout_mode: compact_rollout
+    num_tasks: 4
+    lightweight_field_size: 4
+  planner:
+    hidden_dim: 512
+    num_candidates: 8
+    action_dim: 14
+    num_support_modes: 3
+    utility_margin: 0.1
+    num_heads: 8
+    num_layers: 2
+    num_phases: 5
+    num_arm_roles: 4
+    top_k: 4
+    adapter_confidence_threshold: 0.45
+loss_weights:
+  action: 1.0
+  phase: 0.08
+  arm_role: 0.08
+  support_mode: 0.08
+  corridor: 0.12
+  persistence: 0.06
+  disturbance: 0.06
+  world_model: 0.0
+  transition: 0.15
+  belief: 0.05
+  visibility: 0.05
+  clearance: 0.06
+  support_stability: 0.06
+  reocclusion: 0.06
+  occluder_contact: 0.05
+  grasp_affordance: 0.05
+  planner_success: 0.15
+  planner_risk: 0.08
+  planner_ranking: 0.15
+  proposal_reconstruction: 0.08
+  proposal_success: 0.1
+  proposal_ranking: 0.12
+  proposal_mode: 0.08
+  proposal_diversity: 0.05
+  role_swap_consistency: 0.0
+  task_metrics: 0.06
+  gate: 0.05
+  distillation: 0.05
+  calibration: 0.02

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/metrics.json ADDED

	@@ -0,0 +1,140 @@

+[
+  {
+    "epoch": 0,
+    "train": {
+      "action": 1.2014537013095359,
+      "arm_role": 0.004142667506011608,
+      "belief": 0.10642868701530539,
+      "calibration": 0.0,
+      "clearance": 0.08262280795885169,
+      "corridor": 0.22370571363717318,
+      "distillation": 0.0018765011868115676,
+      "disturbance": 0.0011591566448180895,
+      "gate": 0.0,
+      "grasp_affordance": 0.012573797620185043,
+      "occluder_contact": 0.1948690563440323,
+      "persistence": 0.5442525049894238,
+      "phase": 0.14094198657118756,
+      "planner_ranking": 1.1814680177232493,
+      "planner_risk": 0.03286057249035524,
+      "planner_success": 0.49930323725161346,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.9191397609918014,
+      "proposal_ranking": 0.7756888011227483,
+      "proposal_reconstruction": 1.1855679594952127,
+      "proposal_success": 0.5070859141971754,
+      "reocclusion": 0.2707118239739667,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.81035483142604,
+      "support_stability": 0.135533408464297,
+      "task_metrics": 0.07755828170996645,
+      "total": 2.5070675456005596,
+      "transition": 3.653185836646868,
+      "uncertainty": 5.752725617284064e-05,
+      "visibility": 0.0989064211430757,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.1732755303382874,
+      "arm_role": 0.001568492029036861,
+      "belief": 0.09933605697005987,
+      "calibration": 0.0,
+      "clearance": 0.07699812250211835,
+      "corridor": 0.1967080980539322,
+      "distillation": 0.002813707455061376,
+      "disturbance": 0.0013425838133116486,
+      "gate": 0.0,
+      "grasp_affordance": 0.010458780219778419,
+      "occluder_contact": 0.19887321814894676,
+      "persistence": 0.3571807991247624,
+      "phase": 0.23128701612586156,
+      "planner_ranking": 0.9876129180192947,
+      "planner_risk": 0.032078082440420985,
+      "planner_success": 0.3786630928516388,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.5780632123351097,
+      "proposal_ranking": 0.6625044224783778,
+      "proposal_reconstruction": 1.1224287077784538,
+      "proposal_success": 0.32306262850761414,
+      "reocclusion": 0.21124972961843014,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.6393884606659412,
+      "support_stability": 0.13537815306335688,
+      "task_metrics": 0.06436877744272351,
+      "total": 2.0122548937797546,
+      "transition": 1.4614488631486893,
+      "uncertainty": 4.029161317120611e-05,
+      "visibility": 0.09316261485219002,
+      "world_model": 0.0
+    }
+  },
+  {
+    "epoch": 1,
+    "train": {
+      "action": 1.2011131566503774,
+      "arm_role": 0.0004577429398246433,
+      "belief": 0.10254939839891765,
+      "calibration": 0.0,
+      "clearance": 0.08239235486025395,
+      "corridor": 0.209725521646602,
+      "distillation": 0.0029014512876291637,
+      "disturbance": 0.001299830724272634,
+      "gate": 0.0,
+      "grasp_affordance": 0.011238907848525307,
+      "occluder_contact": 0.19421758470327957,
+      "persistence": 0.2043300135941852,
+      "phase": 0.16561541823751252,
+      "planner_ranking": 0.9580214386400969,
+      "planner_risk": 0.03229632252908271,
+      "planner_success": 0.36985718167346454,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.5392822428889896,
+      "proposal_ranking": 0.7421491457068402,
+      "proposal_reconstruction": 1.15594565997953,
+      "proposal_success": 0.27282858737137006,
+      "reocclusion": 0.13705282172431116,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.6220477378886679,
+      "support_stability": 0.13319886832133584,
+      "task_metrics": 0.07506552370993988,
+      "total": 2.026416410570559,
+      "transition": 1.500129469062971,
+      "uncertainty": 1.367867451214668e-05,
+      "visibility": 0.09593491644962975,
+      "world_model": 0.0
+    },
+    "val": {
+      "action": 1.2080544531345367,
+      "arm_role": 0.0001097214071705821,
+      "belief": 0.09881071373820305,
+      "calibration": 0.0,
+      "clearance": 0.07554284203797579,
+      "corridor": 0.19048454985022545,
+      "distillation": 0.0,
+      "disturbance": 0.00071150396252051,
+      "gate": 0.0,
+      "grasp_affordance": 0.009015273419208825,
+      "occluder_contact": 0.1911622602492571,
+      "persistence": 0.4154473473317921,
+      "phase": 0.22401500784326345,
+      "planner_ranking": 0.9130920022726059,
+      "planner_risk": 0.03172952332533896,
+      "planner_success": 0.36061106994748116,
+      "proposal_diversity": 0.0,
+      "proposal_mode": 0.46144857816398144,
+      "proposal_ranking": 0.6975354589521885,
+      "proposal_reconstruction": 1.0902796238660812,
+      "proposal_success": 0.2553649302572012,
+      "reocclusion": 0.18199651315808296,
+      "role_swap_consistency": 0.0,
+      "support_mode": 0.7191376462578773,
+      "support_stability": 0.13278500083833933,
+      "task_metrics": 0.06281590019352734,
+      "total": 1.9747500270605087,
+      "transition": 1.1311135664582253,
+      "uncertainty": 7.986968377338144e-06,
+      "visibility": 0.09258495084941387,
+      "world_model": 0.0
+    }
+  }
+]
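The commit does not say how the `total` entry is computed. A minimal sketch for checking one plausible reading, assuming `total` is the sum of each per-term loss multiplied by its `loss_weights` entry from the config above. The function name `weighted_total` and the constant names are hypothetical; the numbers are copied from the epoch 0 `train` block.

```python
# Loss weights copied from the loss_weights block of config_resolved.yaml.
LOSS_WEIGHTS = {
    "action": 1.0, "phase": 0.08, "arm_role": 0.08, "support_mode": 0.08,
    "corridor": 0.12, "persistence": 0.06, "disturbance": 0.06,
    "world_model": 0.0, "transition": 0.15, "belief": 0.05,
    "visibility": 0.05, "clearance": 0.06, "support_stability": 0.06,
    "reocclusion": 0.06, "occluder_contact": 0.05, "grasp_affordance": 0.05,
    "planner_success": 0.15, "planner_risk": 0.08, "planner_ranking": 0.15,
    "proposal_reconstruction": 0.08, "proposal_success": 0.1,
    "proposal_ranking": 0.12, "proposal_mode": 0.08,
    "proposal_diversity": 0.05, "role_swap_consistency": 0.0,
    "task_metrics": 0.06, "gate": 0.05, "distillation": 0.05,
    "calibration": 0.02,
}

# Epoch 0 "train" entries copied from metrics.json.
EPOCH0_TRAIN = {
    "action": 1.2014537013095359, "arm_role": 0.004142667506011608,
    "belief": 0.10642868701530539, "calibration": 0.0,
    "clearance": 0.08262280795885169, "corridor": 0.22370571363717318,
    "distillation": 0.0018765011868115676,
    "disturbance": 0.0011591566448180895, "gate": 0.0,
    "grasp_affordance": 0.012573797620185043,
    "occluder_contact": 0.1948690563440323,
    "persistence": 0.5442525049894238, "phase": 0.14094198657118756,
    "planner_ranking": 1.1814680177232493,
    "planner_risk": 0.03286057249035524,
    "planner_success": 0.49930323725161346, "proposal_diversity": 0.0,
    "proposal_mode": 0.9191397609918014,
    "proposal_ranking": 0.7756888011227483,
    "proposal_reconstruction": 1.1855679594952127,
    "proposal_success": 0.5070859141971754,
    "reocclusion": 0.2707118239739667, "role_swap_consistency": 0.0,
    "support_mode": 0.81035483142604,
    "support_stability": 0.135533408464297,
    "task_metrics": 0.07755828170996645, "total": 2.5070675456005596,
    "transition": 3.653185836646868,
    "uncertainty": 5.752725617284064e-05,
    "visibility": 0.0989064211430757, "world_model": 0.0,
}


def weighted_total(weights, losses):
    """Sum weight * loss over the terms that have a configured weight.

    Entries without a weight ("total", "uncertainty") are ignored.
    """
    return sum(w * losses.get(name, 0.0) for name, w in weights.items())


recomputed = weighted_total(LOSS_WEIGHTS, EPOCH0_TRAIN)
print(f"recomputed={recomputed:.6f} logged={EPOCH0_TRAIN['total']:.6f}")
```

For these epoch 0 values the recomputed sum lands within about 1e-4 of the logged `total`, which is consistent with the assumption but does not confirm how the training code builds it. The same check can be repeated on the `val` block or on epoch 1 by swapping in those values.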

artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/summary.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.json ADDED

	@@ -0,0 +1,280 @@

+{
+  "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
+  "plan_requested": false,
+  "plan_applied": false,
+  "planner_mode": "trainable",
+  "support_mode_conditioning": true,
+  "task_conditioning": true,
+  "geometry_enabled": true,
+  "world_model_mode": "checkpoint_default",
+  "episodes_per_task": 1,
+  "episode_length": 25,
+  "resolution": 256,
+  "reset_retries": 20,
+  "arm_mode": "planning",
+  "delta_scale": 1.0,
+  "cameras": [
+    "front",
+    "wrist_left",
+    "wrist_right"
+  ],
+  "tasks": {
+    "bimanual_dual_push_buttons": {
+      "task_class": "BimanualDualPushButtons",
+      "successes": [
+        0.0
+      ],
+      "returns": [
+        0.0
+      ],
+      "path_recoveries": [
+        0
+      ],
+      "noop_fallbacks": [
+        0
+      ],
+      "reset_retries": [
+        0
+      ],
+      "episode_traces": [
+        {
+          "language_goal": "push the olive and the orange buttons",
+          "steps": [
+            {
+              "timestep": 0,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 1,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 2,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 3,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 4,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 5,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 6,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 7,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 8,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 9,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 10,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 11,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 12,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 13,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 14,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 15,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 16,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 17,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 18,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 19,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 20,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 21,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 22,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 23,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 24,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            }
+          ],
+          "success": 0.0,
+          "return": 0.0,
+          "path_recoveries": 0,
+          "noop_fallbacks": 0
+        }
+      ],
+      "mean_success": 0.0,
+      "mean_return": 0.0
+    }
+  },
+  "mean_success": 0.0
+}
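Every step in the trace above carries `null` planner fields while `plan_requested` is `false`. A minimal sketch for summarizing a report of this shape, assuming only the keys visible in the JSON above; the function name `summarize_rollout` and the `SAMPLE` dict are hypothetical, and `SAMPLE` is a shortened stand-in for the real file.

```python
def summarize_rollout(report):
    """Return one summary row per task in a rollout_eval-style report."""
    rows = []
    for name, task in report["tasks"].items():
        # Flatten the per-episode step lists into one list of steps.
        steps = [s for ep in task["episode_traces"] for s in ep["steps"]]
        successes = task["successes"]
        rows.append({
            "task": name,
            "episodes": len(successes),
            "mean_success": sum(successes) / len(successes),
            "steps": len(steps),
            # Steps where the planner produced no scores at all.
            "steps_without_planner_scores": sum(
                s["planner_scores"] is None for s in steps
            ),
        })
    return rows


# Shortened stand-in with the same keys as the report above (2 steps, not 25).
SAMPLE = {
    "plan_requested": False,
    "tasks": {
        "bimanual_dual_push_buttons": {
            "successes": [0.0],
            "episode_traces": [
                {
                    "steps": [
                        {"timestep": 0, "planner_scores": None},
                        {"timestep": 1, "planner_scores": None},
                    ]
                }
            ],
        }
    },
}

for row in summarize_rollout(SAMPLE):
    print(row)
```

To run it on the real artifact, load the file with `json.load` and pass the resulting dict in place of `SAMPLE`.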

artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.md ADDED

	@@ -0,0 +1,14 @@

+# RLBench Rollout Eval
+- Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
+- Plan requested: `False`
+- Plan applied: `False`
+- Support-mode conditioning: `True`
+- Task conditioning: `True`
+- Geometry enabled: `True`
+- World-model mode: `checkpoint_default`
+- Mean success: `0.000`
+## Per-task
+- `bimanual_dual_push_buttons`: mean_success=0.000, returns=[0.0]

artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.partial.json ADDED

	@@ -0,0 +1,280 @@

+{
+  "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
+  "plan_requested": false,
+  "plan_applied": false,
+  "planner_mode": "trainable",
+  "support_mode_conditioning": true,
+  "task_conditioning": true,
+  "geometry_enabled": true,
+  "world_model_mode": "checkpoint_default",
+  "episodes_per_task": 1,
+  "episode_length": 25,
+  "resolution": 256,
+  "reset_retries": 20,
+  "arm_mode": "planning",
+  "delta_scale": 1.0,
+  "cameras": [
+    "front",
+    "wrist_left",
+    "wrist_right"
+  ],
+  "tasks": {
+    "bimanual_dual_push_buttons": {
+      "task_class": "BimanualDualPushButtons",
+      "successes": [
+        0.0
+      ],
+      "returns": [
+        0.0
+      ],
+      "path_recoveries": [
+        0
+      ],
+      "noop_fallbacks": [
+        0
+      ],
+      "reset_retries": [
+        0
+      ],
+      "episode_traces": [
+        {
+          "language_goal": "push the olive and the orange buttons",
+          "steps": [
+            {
+              "timestep": 0,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 1,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 2,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 3,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 4,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 5,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 6,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 7,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 8,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 9,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 10,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 11,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 12,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 13,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 14,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 15,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 16,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 17,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 18,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 19,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 20,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 21,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 22,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 23,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 24,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            }
+          ],
+          "success": 0.0,
+          "return": 0.0,
+          "path_recoveries": 0,
+          "noop_fallbacks": 0
+        }
+      ],
+      "mean_success": 0.0,
+      "mean_return": 0.0
+    }
+  },
+  "mean_success": 0.0
+}

artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/command.txt ADDED

	@@ -0,0 +1 @@


+/workspace/envs/rlbench/bin/python -m sim_rlbench.launch_smoke --task bimanual_push_box --resolution 224 --headless

artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stderr.txt ADDED

	@@ -0,0 +1 @@


+WARNING:root:not sure how _robot_shapes are used is used.

artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stdout.txt ADDED

	@@ -0,0 +1,38 @@

+{
+  "display": ":99",
+  "headless": true,
+  "task": "BimanualPushBox",
+  "description": "push the box to the red area",
+  "front_rgb_shape": [
+    224,
+    224,
+    3
+  ],
+  "wrist_left_rgb_shape": [
+    224,
+    224,
+    3
+  ],
+  "wrist_right_rgb_shape": [
+    224,
+    224,
+    3
+  ],
+  "right_pose_shape": [
+    7
+  ],
+  "left_pose_shape": [
+    7
+  ],
+  "stepped_mode": "bimanual_noop",
+  "action_finite": true,
+  "action_dim": 18,
+  "reward": 0.0,
+  "done": false,
+  "front_rgb_shape_after_step": [
+    224,
+    224,
+    3
+  ]
+}
+[CoppeliaSim:loadinfo]   done.

artifacts/reports/proxy_base_reuse128_smoke/scripted/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## scripted
+- controller: scripted
+- checkpoint: none
+- episodes: 72.000
+- mean_success: 1.000
+- visibility_integral: 1.691
+- corridor_availability: 0.706
+- reocclusion_rate: 0.000
+- disturbance_cost: 0.123
+- premature_retrieve_rate: 0.000
+- reocclusion_after_reveal_rate: 0.000
+- planner_regret: 0.000
+- foliage_success: 1.000
+- bag_success: 1.000
+- cloth_success: 1.000

artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## adapter
+- controller: model
+- checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
+- episodes: 12.000
+- mean_success: 0.000
+- visibility_integral: 21.449
+- corridor_availability: 0.331
+- reocclusion_rate: 0.001
+- disturbance_cost: 0.397
+- premature_retrieve_rate: 0.000
+- reocclusion_after_reveal_rate: 0.000
+- planner_regret: 0.233
+- foliage_success: 0.000
+- bag_success: 0.000
+- cloth_success: 0.000

artifacts/reports/proxy_semantic_heuristic_quick12/candidate0/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## adapter
+- controller: candidate0
+- checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
+- episodes: 12.000
+- mean_success: 0.000
+- visibility_integral: 2.261
+- corridor_availability: 0.027
+- reocclusion_rate: 0.017
+- disturbance_cost: 0.747
+- premature_retrieve_rate: 0.367
+- reocclusion_after_reveal_rate: 0.167
+- planner_regret: 0.019
+- foliage_success: 0.000
+- bag_success: 0.000
+- cloth_success: 0.000

artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## adapter
+- controller: model
+- checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
+- episodes: 12.000
+- mean_success: 0.000
+- visibility_integral: 2.261
+- corridor_availability: 0.027
+- reocclusion_rate: 0.017
+- disturbance_cost: 0.747
+- premature_retrieve_rate: 0.367
+- reocclusion_after_reveal_rate: 0.167
+- planner_regret: 0.019
+- foliage_success: 0.000
+- bag_success: 0.000
+- cloth_success: 0.000

artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.json ADDED

The diff for this file is too large to render. See raw diff

artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## adapter
+- controller: oracle
+- checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
+- episodes: 12.000
+- mean_success: 0.000
+- visibility_integral: 3.338
+- corridor_availability: 0.062
+- reocclusion_rate: 0.018
+- disturbance_cost: 0.707
+- premature_retrieve_rate: 0.575
+- reocclusion_after_reveal_rate: 0.083
+- planner_regret: 0.000
+- foliage_success: 0.000
+- bag_success: 0.000
+- cloth_success: 0.000

artifacts/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.md ADDED

	@@ -0,0 +1,17 @@

+# Reveal Proxy Benchmark
+## noop
+- controller: model
+- checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
+- episodes: 36.000
+- mean_success: 0.000
+- visibility_integral: 2.275
+- corridor_availability: 0.031
+- reocclusion_rate: 0.021
+- disturbance_cost: 0.743
+- premature_retrieve_rate: 0.362
+- reocclusion_after_reveal_rate: 0.278
+- planner_regret: 0.021
+- foliage_success: 0.000
+- bag_success: 0.000
+- cloth_success: 0.000
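The `reveal_benchmark.md` reports above all share one layout: a `## <section>` heading followed by `- key: value` bullets. A minimal sketch for turning one such report into a dict so that controllers can be compared side by side, assuming that layout holds for every report. The function name `parse_reveal_benchmark` is hypothetical, and the sample text is an excerpt of the scripted report with the diff `+` prefixes removed.

```python
def parse_reveal_benchmark(markdown_text):
    """Parse '## section' headings and '- key: value' bullets into dicts."""
    sections = {}
    current = None
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = {}
        elif line.startswith("- ") and current is not None and ":" in line:
            key, value = line[2:].split(":", 1)
            key, value = key.strip(), value.strip()
            try:
                # Numeric metrics such as mean_success become floats.
                sections[current][key] = float(value)
            except ValueError:
                # Non-numeric fields (controller, checkpoint) stay strings.
                sections[current][key] = value
    return sections


# Excerpt of the scripted report, without the leading '+' of the diff view.
SAMPLE_REPORT = """\
# Reveal Proxy Benchmark
## scripted
- controller: scripted
- checkpoint: none
- episodes: 72.000
- mean_success: 1.000
- disturbance_cost: 0.123
"""

parsed = parse_reveal_benchmark(SAMPLE_REPORT)
print(parsed["scripted"]["mean_success"], parsed["scripted"]["controller"])
```

Keeping unparseable values as strings means the `checkpoint` path survives unchanged, at the cost of callers having to check types before doing arithmetic on a field.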

artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.md ADDED

	@@ -0,0 +1,14 @@

+# RLBench Rollout Eval
+- Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
+- Plan requested: `False`
+- Plan applied: `False`
+- Support-mode conditioning: `True`
+- Task conditioning: `True`
+- Geometry enabled: `True`
+- World-model mode: `checkpoint_default`
+- Mean success: `0.000`
+## Per-task
+- `bimanual_dual_push_buttons`: mean_success=0.000, returns=[0.0]

artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.partial.json ADDED

	@@ -0,0 +1,280 @@

+{
+  "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
+  "plan_requested": false,
+  "plan_applied": false,
+  "planner_mode": "trainable",
+  "support_mode_conditioning": true,
+  "task_conditioning": true,
+  "geometry_enabled": true,
+  "world_model_mode": "checkpoint_default",
+  "episodes_per_task": 1,
+  "episode_length": 25,
+  "resolution": 256,
+  "reset_retries": 20,
+  "arm_mode": "planning",
+  "delta_scale": 1.0,
+  "cameras": [
+    "front",
+    "wrist_left",
+    "wrist_right"
+  ],
+  "tasks": {
+    "bimanual_dual_push_buttons": {
+      "task_class": "BimanualDualPushButtons",
+      "successes": [
+        0.0
+      ],
+      "returns": [
+        0.0
+      ],
+      "path_recoveries": [
+        0
+      ],
+      "noop_fallbacks": [
+        0
+      ],
+      "reset_retries": [
+        0
+      ],
+      "episode_traces": [
+        {
+          "language_goal": "push the olive and the orange buttons",
+          "steps": [
+            {
+              "timestep": 0,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 1,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 2,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 3,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 4,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 5,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 6,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 7,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 8,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 9,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 10,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 11,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 12,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 13,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 14,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 15,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 16,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 17,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 18,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 19,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 20,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 21,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 22,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 23,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            },
+            {
+              "timestep": 24,
+              "chosen_macro_mode": null,
+              "planner_scores": null,
+              "predicted_reocclusion": null,
+              "support_mode_conditioning": true,
+              "path_recoveries": 0,
+              "noop_fallbacks": 0
+            }
+          ],
+          "success": 0.0,
+          "return": 0.0,
+          "path_recoveries": 0,
+          "noop_fallbacks": 0
+        }
+      ],
+      "mean_success": 0.0,
+      "mean_return": 0.0
+    }
+  },
+  "mean_success": 0.0
+}
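The report above repeats the same quantities at several levels (per-episode `successes`, per-task `mean_success`, top-level `mean_success`, and one trace step per timestep). A minimal sketch of a consistency check over that layout follows. The function name `check_report` is hypothetical, and the rule that the top-level mean is the unweighted mean of the per-task means is an assumption inferred from the values shown, not something the eval code is known to enforce.

```python
from statistics import mean


def check_report(report, tol=1e-9):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    task_means = []
    for name, task in report["tasks"].items():
        task_means.append(task["mean_success"])
        if "error" in task:
            continue  # failed tasks carry no per-episode lists
        successes = task["successes"]
        if len(successes) != report["episodes_per_task"]:
            problems.append(f"{name}: episode count mismatch")
        if abs(mean(successes) - task["mean_success"]) > tol:
            problems.append(f"{name}: mean_success mismatch")
        for i, trace in enumerate(task["episode_traces"]):
            timesteps = [step["timestep"] for step in trace["steps"]]
            if timesteps != list(range(len(timesteps))):
                problems.append(f"{name}[{i}]: timesteps not contiguous from 0")
            if len(timesteps) > report["episode_length"]:
                problems.append(f"{name}[{i}]: more steps than episode_length")
    # Assumption: top-level mean is the unweighted mean over tasks.
    if task_means and abs(mean(task_means) - report["mean_success"]) > tol:
        problems.append("top-level mean_success mismatch")
    return problems


# Trimmed stand-in for the report above (only the fields the check reads).
steps = [{"timestep": t} for t in range(25)]
report = {
    "episodes_per_task": 1,
    "episode_length": 25,
    "tasks": {
        "bimanual_dual_push_buttons": {
            "successes": [0.0],
            "episode_traces": [{"steps": steps}],
            "mean_success": 0.0,
        }
    },
    "mean_success": 0.0,
}
print(check_report(report))  # prints []
```

In practice the dictionary would come from `json.load` on the report file rather than being built inline.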

artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
+  "plan_requested": false,
+  "plan_applied": false,
+  "planner_mode": "trainable",
+  "support_mode_conditioning": true,
+  "task_conditioning": true,
+  "geometry_enabled": true,
+  "world_model_mode": "checkpoint_default",
+  "episodes_per_task": 3,
+  "episode_length": 120,
+  "resolution": 256,
+  "reset_retries": 20,
+  "arm_mode": "planning",
+  "delta_scale": 1.0,
+  "cameras": [
+    "front",
+    "wrist_left",
+    "wrist_right"
+  ],
+  "tasks": {
+    "bimanual_dual_push_buttons": {
+      "error": "The call failed on the V-REP side. Return value: -1",
+      "mean_success": 0.0,
+      "mean_return": 0.0
+    }
+  },
+  "mean_success": 0.0
+}

artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.md ADDED

	@@ -0,0 +1,14 @@

+# RLBench Rollout Eval
+- Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
+- Plan requested: `False`
+- Plan applied: `False`
+- Support-mode conditioning: `True`
+- Task conditioning: `True`
+- Geometry enabled: `True`
+- World-model mode: `checkpoint_default`
+- Mean success: `0.000`
+## Per-task
+- `bimanual_dual_push_buttons`: error=The call failed on the V-REP side. Return value: -1

artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.partial.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
+  "plan_requested": false,
+  "plan_applied": false,
+  "planner_mode": "trainable",
+  "support_mode_conditioning": true,
+  "task_conditioning": true,
+  "geometry_enabled": true,
+  "world_model_mode": "checkpoint_default",
+  "episodes_per_task": 3,
+  "episode_length": 120,
+  "resolution": 256,
+  "reset_retries": 20,
+  "arm_mode": "planning",
+  "delta_scale": 1.0,
+  "cameras": [
+    "front",
+    "wrist_left",
+    "wrist_right"
+  ],
+  "tasks": {
+    "bimanual_dual_push_buttons": {
+      "error": "The call failed on the V-REP side. Return value: -1",
+      "mean_success": 0.0,
+      "mean_return": 0.0
+    }
+  },
+  "mean_success": 0.0
+}
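In the `ep3` reports above, the task entry has an `error` string and still reports `mean_success: 0.0`, so a simulator failure looks the same as a genuine zero in the top-level number. The sketch below separates the two cases when aggregating. The helper names `split_tasks` and `mean_success_over_completed` are hypothetical and not part of the repo's eval code.

```python
def split_tasks(report):
    """Partition tasks into (completed, failed) based on the `error` key."""
    completed, failed = {}, {}
    for name, task in report["tasks"].items():
        if "error" in task:
            failed[name] = task["error"]
        else:
            completed[name] = task["mean_success"]
    return completed, failed


def mean_success_over_completed(report):
    """Mean success over tasks that actually ran, or None if none did."""
    completed, _ = split_tasks(report)
    if not completed:
        return None
    return sum(completed.values()) / len(completed)


# Trimmed stand-in for the ep3 report shown above.
report = {
    "tasks": {
        "bimanual_dual_push_buttons": {
            "error": "The call failed on the V-REP side. Return value: -1",
            "mean_success": 0.0,
            "mean_return": 0.0,
        }
    },
    "mean_success": 0.0,
}

completed, failed = split_tasks(report)
print(completed)                            # {}
print(sorted(failed))                       # ['bimanual_dual_push_buttons']
print(mean_success_over_completed(report))  # None
```

Returning `None` rather than `0.0` when every task failed keeps "no valid episodes" distinct from "all episodes failed".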

docs/CHANGE_AND_TEST_LOG.md ADDED

	@@ -0,0 +1,221 @@

+# Change And Test Log
+This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.
+## Previous Repo Work Included Here
+Copied from `history/VLAarchtests_previous_README.md`:
+- core model, memory, planner, and dataset changes under:
+  - `VLAarchtests/code/reveal_vla_bimanual/models/`
+  - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/`
+  - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py`
+- training and eval paths under:
+  - `VLAarchtests/code/reveal_vla_bimanual/train/`
+  - `VLAarchtests/code/reveal_vla_bimanual/eval/`
+- earlier test suite under:
+  - `VLAarchtests/tests/`
+## Current Session File Changes
+### Core reveal/proxy path
+- `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
+- `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
+- `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
+- `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py`
+- `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
+- `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py`
+- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
+- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py`
+- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
+- `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py`
+### Training/eval wrappers and configs
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml`
+- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml`
+- `environment/reconstruct_anybimanual_overlap_replay.sh`
+### Test additions or updates
+- `VLAarchtests/tests/test_eval_toggle_paths_work.py`
+- `VLAarchtests/tests/test_task_routed_model_eval.py`
+- `VLAarchtests/tests/test_anybimanual_resume_logic.py`
+- `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
+- `VLAarchtests/tests/test_candidate_ranking_loss.py`
+- `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
+- `VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
+- `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
+- `VLAarchtests/tests/test_proxy_scripted_bench.py`
+- `VLAarchtests/tests/test_rvt_backbone_forward.py`
+- `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
+- `VLAarchtests/tests/test_rlbench_init_checkpoint.py`
+- `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py`
+- `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py`
+- `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
+- `VLAarchtests/tests/test_dual_push_retarget_utils.py`
+- `VLAarchtests/tests/test_dual_push_full_arch_utils.py`
+### Third-party baseline path changes
+- `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py`
+- `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py`
+- `third_party/AnyBimanual/agents/peract_bc/launch_utils.py`
+- `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py`
+- `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py`
+## Current Session Test Commands
+Executed commands recorded in the workspace:
+- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
+- `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
+  - result: `11 passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
+  - result: `2 passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py`
+  - result: `4 passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
+  - result: `passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
+  - result: `10 passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py`
+  - result: `passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py`
+  - result: `6 passed`
+- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py`
+  - result: `9 passed`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
+- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
+- `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py`
+  - result: `4 passed`
+- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
+  - result: `passed`
+- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
+  - result: `passed`
+- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
+  - result: `passed`
+## Current Session Generated Reports
+Current-session report roots staged in this repo:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
+- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
+- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
+- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
+- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
+## HF Packaging Notes
+Raw packaging changes applied to the staged HF export:
+- `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories:
+  - `00000-04999/`
+  - `05000-09999/`
+  - `10000-14999/`
+- file count after reshape: `14034`
+- reconstruction helper added at:
+  - `environment/reconstruct_anybimanual_overlap_replay.sh`
+- exact Hub rejection error before the reshape:
+  - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/`
+## Current Session Logs
+Main logs staged in this repo:
+- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log`
+- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log`
+- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+- `reports/anybimanual_subset3_overlap_resume1000_summary.log`
+- `reports/task_routed_proxy_v1_rerun.log`
+- `reports/run_bag_selector_iter9_prebuild.log`
+- `reports/anybimanual_release_subset3_eval_ep5.log`
+- `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh`
+- `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log`
+- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log`
+## Official Overlap Eval Final Raw Outputs
+Sources:
+- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
+Raw values:
+- step `1000`
+- local mean success `0.16`
+- `coordinated_push_box`: success `0.0`, return `0.0`
+- `coordinated_lift_ball`: success `0.0`, return `0.0`
+- `dual_push_buttons`: success `0.48`, return `12.0`
+## General-Task Anchor Raw Outputs
+Sources:
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
+Raw values:
+- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
+- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
+- local clip backbone-only result: success `0.0`, return `0.0`
+- local elastic reveal proxy iter6 result: success `0.0`, return `0.0`
+- local RVT frozen fixed-bounds result: success `0.0`, return `0.0`
+## Dual-Push Branch Raw Outputs
+Sources:
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
+Raw values:
+- demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8`
+- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0`
+- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0`
+- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0`
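The HF packaging note in this log describes splitting one flat directory of `14034` files into shard subdirectories named `00000-04999`, `05000-09999`, and `10000-14999`, which keeps each directory under the Hub's 10000-file limit quoted in the error message. A minimal sketch of that naming scheme follows. It assumes each file is assigned a zero-based integer index and a shard size of 5000. The log does not state how files were actually mapped to shards, and the repo's own helper is a shell script, so `shard_name` here is purely illustrative.

```python
def shard_name(index, shard_size=5000, width=5):
    """Name of the shard directory that holds the file with this index."""
    start = (index // shard_size) * shard_size
    end = start + shard_size - 1
    return f"{start:0{width}d}-{end:0{width}d}"


def shard_count(num_files, shard_size=5000):
    """Number of shard directories needed (ceiling division)."""
    return -(-num_files // shard_size)


print(shard_name(0))       # 00000-04999
print(shard_name(5000))    # 05000-09999
print(shard_name(14033))   # 10000-14999
print(shard_count(14034))  # 3
```

With 14034 files and a shard size of 5000, no directory holds more than 5000 entries.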

docs/MODEL_AND_ARTIFACT_INDEX.md ADDED

	@@ -0,0 +1,59 @@

+# Model And Artifact Index
+Main staged roots:
+- `VLAarchtests/code/reveal_vla_bimanual/`
+- `VLAarchtests/tests/`
+- `VLAarchtests/artifacts/`
+- `third_party/AnyBimanual/`
+- `baselines/`
+- `outputs/`
+- `reports/`
+- `handoff/instructions4.md`
+- `history/VLAarchtests_previous_README.md`
+Key current-session report roots:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
+- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
+- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
+- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
+- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
+Key current-session run/log roots:
+- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/`
+- `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/`
+- `baselines/AnyBimanual_overlap_replay/`
+- `outputs/rlbench_true_baselines/`
+- `outputs/rlbench_dual_push/`
+- `outputs/rlbench_rvt_branch/`
+- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+- `reports/anybimanual_subset3_overlap_resume1000_summary.log`
+- `reports/anybimanual_release_subset3_eval_ep5.log`
+- `reports/dual_push_full_arch_probe_iter6_scene_ep1/`
+- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/`
+- `reports/dual_push_nonzero_branch_20260330/`
+- `reports/run_bag_selector_iter9_prebuild.log`
+- `reports/task_routed_proxy_v1_rerun.log`
+- `environment/reconstruct_anybimanual_overlap_replay.sh`
+Key final official overlap summary files:
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
+HF export packaging note:
+- `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories in this repo copy.

docs/RESULTS_RAW.md ADDED

	@@ -0,0 +1,178 @@

+# Results Raw
+This file records exact values and exact partial statuses without additional conclusions.
+## Proxy Sprint v7 Main Table
+Source:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
+| Item | Raw values |
+| --- | --- |
+| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
+| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
+| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
+| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
+| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
+## Proxy Sprint v7 Ablation Table
+Source:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
+| Item | Raw values |
+| --- | --- |
+| no_planner | `0.2` |
+| no_memory | `0.3233333333333333` |
+| no_task_conditioning | `0.28` |
+| no_geometry | `0.27` |
+| no_camera_pose | `0.29333333333333333` |
+## Selector Table
+Sources:
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
+| Item | Raw values |
+| --- | --- |
+| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
+| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
+| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
+| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |
+## Proxy Baseline Compare Table
+Source:
+- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
+| Item | Raw values |
+| --- | --- |
+| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
+| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |
+## RLBench Recovered Push-Box Comparator
+Sources:
+- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
+- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
+| Item | Raw values |
+| --- | --- |
+| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
+| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
+## Official AnyBimanual Overlap Training Milestones
+Sources:
+- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`
+| Global step | Raw values |
+| --- | --- |
+| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
+| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
+| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
+| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
+| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
+| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
+| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
+| 1000 checkpoint | train reached `weights/1000` and exited cleanly |
+## Official AnyBimanual Overlap Eval Final Output
+Sources:
+- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
+| Item | Raw values |
+| --- | --- |
+| local last complete step | `1000` |
+| local mean success | `0.16` |
+| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
+| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
+| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
+| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
+| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
+| delta vs public best mean success | `-0.5333333333333333` |
+| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
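The mean and delta rows in the table above can be reproduced from the per-task success values alone. The sketch below does that arithmetic, assuming the mean is an unweighted average over the three tasks and that each delta is local minus public. The variable names are illustrative.

```python
import math

# Per-task success values copied from the table above.
local = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}
public_best = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}


def mean_success(per_task):
    """Unweighted mean over tasks."""
    return sum(per_task.values()) / len(per_task)


local_mean = mean_success(local)
public_mean = mean_success(public_best)
delta_mean = local_mean - public_mean
delta_per_task = {task: local[task] - public_best[task] for task in local}

print(round(local_mean, 4), round(public_mean, 4), round(delta_mean, 4))
for task, delta in delta_per_task.items():
    print(task, round(delta, 2))
```

Rounded to four places, this gives `0.16`, `0.6933`, and `-0.5333` for the three means, in line with the table.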
+## Validated General-Task Anchor: dual_push_buttons
+Source:
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
+| Item | Raw values |
+| --- | --- |
+| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
+| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
+| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
+| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
+| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
+## RVT Overlap Branch
+Sources:
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
+| Item | Raw values |
+| --- | --- |
+| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
+| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
+| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
+| local overlap floor used for gate | `0.16` |
+| stage2 run flag | `false` |
+## Dual-Push Nonzero Branch
+Source:
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
+| Item | Raw values |
+| --- | --- |
+| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
+| controller sweep planning_c4 | `0.0` |
+| controller sweep ik_c1 | `0.0` |
+| controller sweep planning_c1_s05 | `0.0` |
+| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
+| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
+| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
+| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
+| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
+| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
+| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
+| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
+| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
+## Dual-Push Full-Architecture Hybrid
+Sources:
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
+- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
+- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
+| Item | Raw values |
+| --- | --- |
+| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
+| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |
+## Previous Repo Raw Results
+Previous raw tables are preserved in:
+- `history/VLAarchtests_previous_README.md`

docs/VLAarchtests2_code_README.md ADDED

	@@ -0,0 +1,301 @@

+# VLAarchtests2
+Bundle staged from `/workspace` on `2026-03-31 UTC`.
+This repo is the follow-on organization repo to `lsnu/VLAarchtests`. It includes:
+- current code under `VLAarchtests/`
+- current third-party baseline code under `third_party/`
+- current baseline runs, replay artifacts, demo roots, and released checkpoint material under `baselines/`
+- current training outputs and checkpoints under `outputs/`
+- current logs under `reports/`
+- environment recreation files under `environment/`
+- raw results and change/test logs at the repo root
+- the previous repo README under `history/VLAarchtests_previous_README.md`
+- the active handoff file under `handoff/instructions4.md`
+## Top-Level Contents
+- `VLAarchtests/`
+  - code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current runpod workspace
+- `third_party/AnyBimanual/`
+  - local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
+- `baselines/`
+  - released AnyBimanual checkpoint material
+  - overlap replay artifacts
+    - HF export packaging note: `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories to satisfy the Hub `10000 files per directory` limit
+  - overlap run directories
+  - local subset3 demo roots used by the overlap branch
+- `outputs/`
+  - RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
+- `reports/`
+  - training and evaluation logs copied from `/workspace/reports`
+- `environment/`
+  - machine snapshot, package lists, and setup helpers
+- `history/`
+  - copied previous-repo README
+- `handoff/`
+  - active sprint instruction file
+- `RESULTS_RAW.md`
+  - raw result tables and final official overlap eval outputs
+- `CHANGE_AND_TEST_LOG.md`
+  - file-level change log and executed test commands
+- `MODEL_AND_ARTIFACT_INDEX.md`
+  - staged directory map with main artifact roots
+## Previous Repo Coverage
+The earlier `lsnu/VLAarchtests` repo covered the `2026-03-25/26` work. Its README is copied verbatim at:
+- `history/VLAarchtests_previous_README.md`
+Previous-repo items explicitly referenced there include:
+- compact, spatial, compact-phase, and spatial-phase proxy branches
+- earlier RLBench direct-policy and kNN runs
+- environment recreation files
+- prior raw result tables
+## Current Session Additions
+Current-session folders added or expanded in this repo include:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
+- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
+- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
+- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
+- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
+- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
+## Raw Results Snapshot
+### Proxy sprint v7
+Source:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
+Raw values:
+- base model mean success: `0.28`
+- base per-task: foliage `0.39`, bag `0.31`, cloth `0.14`
+- random mean success: `0.43333333333333335`
+- candidate0 mean success: `0.2`
+- oracle mean success: `0.4066666666666667`
+- scripted mean success: `1.0`
+### Eval-time ablations
+Source:
+- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
+Raw values:
+- `no_planner`: `0.2`
+- `no_memory`: `0.3233333333333333`
+- `no_task_conditioning`: `0.28`
+- `no_geometry`: `0.27`
+- `no_camera_pose`: `0.29333333333333333`
+### Selector checkpoints
+Sources:
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
+- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
+Raw values:
+- `iter6` mean success: `0.4566666666666667`
+  - foliage `0.46`, bag `0.4`, cloth `0.51`
+- `iter7` mean success: `0.4666666666666666`
+  - foliage `0.4`, bag `0.41`, cloth `0.59`
+- `iter8` bag-only fixed slice: `0.41`
+- routed controller mean success: `0.48666666666666664`
+  - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`
+  - per-task: foliage `0.46`, bag `0.41`, cloth `0.59`
+### Real baseline compare on proxy suite
+Source:
+- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
+Raw values:
+- `baseline_rgbd_stage3` mean success: `0.31`
+  - foliage `0.21`, bag `0.15`, cloth `0.57`
+- `iter5_selector` mean success: `0.45`
+  - foliage `0.44`, bag `0.4`, cloth `0.51`
+### RLBench recovered push-box comparator
+Sources:
+- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
+- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
+Raw values:
+- current fair-step1 final mean success: `0.7`
+- current fair-step1 final successes:
+  - `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]`
+- historical push-box control mean success: `0.4`
+- historical push-box control successes:
+  - `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]`
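The two mean-success figures can be recovered from the per-episode success lists. A small sketch, assuming each entry is `1.0` for a successful episode and `0.0` otherwise (the `summarize` helper is hypothetical, not a repo function):

```python
# Per-episode successes copied from the lists above.
fair_step1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def summarize(successes):
    """Count successful episodes and compute the mean success rate."""
    episodes = len(successes)
    wins = sum(1 for s in successes if s >= 1.0)
    return {"episodes": episodes, "successes": wins, "mean_success": wins / episodes}


for name, run in (("fair-step1 final", fair_step1), ("historical control", historical)):
    print(name, summarize(run))
```

With only `10` episodes per run, each episode moves the mean by `0.1`, so the gap between the two runs corresponds to three episodes.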
+### Official AnyBimanual overlap branch
+Sources:
+- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
+- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+Raw train milestones:
+- global step `300`: loss `40.91718`
+- global step `400`: loss `33.26684`
+- global step `500`: loss `36.07054`
+- global step `600`: loss `35.32345`
+- global step `700`: loss `28.50959`
+- global step `800`: loss `23.60169`
+- global step `900`: loss `15.28901`
+- run reached `weights/1000` and training exited cleanly
+Raw eval outputs:
+- source log: `reports/anybimanual_subset3_overlap_resume1000_eval.log`
+- summary files:
+  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
+  - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
+- local last complete step: `1000`
+- local mean success: `0.16`
+- local per-task success:
+  - `coordinated_push_box`: `0.0`
+  - `coordinated_lift_ball`: `0.0`
+  - `dual_push_buttons`: `0.48`
+- local per-task return:
+  - `coordinated_push_box`: `0.0`
+  - `coordinated_lift_ball`: `0.0`
+  - `dual_push_buttons`: `12.0`
+- public best overlap step in the local summary: `60000`
+- public best mean success in the local summary: `0.6933333333333334`
+### Validated general-task anchor: `dual_push_buttons`
+Sources:
+- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
+- `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv`
+Raw values:
+- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
+- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
+- local clip backbone-only result on same task: success `0.0`, return `0.0`
+- local elastic reveal proxy iter6 result on same task: success `0.0`, return `0.0`
+- local RVT frozen fixed-bounds result on same task: success `0.0`, return `0.0`
+### RVT overlap branch
+Sources:
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
+- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
+Raw values:
+- frozen RVT stage1 train summary:
+  - `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json`
+  - final train total `0.043179353826920445`
+  - final val total `0.039591669984665984`
+- frozen RVT overlap eval: mean success `0.0`
+- frozen fixed-bounds RVT overlap eval: mean success `0.0`
+- both branch gates:
+  - local AnyBimanual overlap floor `0.16`
+  - stage2 run `false`
+### Dual-push non-privileged retarget branch
+Sources:
+- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
+Raw values:
+- demo replay through `absolute_action_from_delta`:
+  - `reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json`
+  - mean success `0.8`
+  - mean return `0.8`
+- retargeted demo with checkpoint backbone retrieval and vision-only button localization (`1` episode):
+  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json`
+  - mean success `1.0`
+  - mean return `1.0`
+- retargeted demo with checkpoint backbone retrieval and vision-only button localization (`5` episodes):
+  - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json`
+  - mean success `1.0`
+  - mean return `1.0`
+### Dual-push full-architecture hybrid branch
+Sources:
+- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
+- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
+- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
+Raw values:
+- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
+  - `1` episode
+  - mean success `1.0`
+  - mean return `1.0`
+  - steps `94`
+  - retrieved episode index `11`
+  - retrieval similarity `0.9998629689216614`
+- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
+  - `1` episode
+  - mean success `1.0`
+  - mean return `1.0`
+  - steps `116`
+  - path recoveries `0`
+  - noop fallbacks `0`
+  - first selected mode `residual::maintain_opening`
+  - last selected mode `residual::base_action`
+## Environment Recreation
+Environment files are under `environment/`, including:
+- `environment/setup_same_hardware.sh`
+- `environment/runtime_env_vars.sh`
+- `environment/reconstruct_anybimanual_overlap_replay.sh`
+- `environment/hardware_snapshot.txt`
+- `environment/env_list.txt`
+- `environment/base_python.txt`
+- `environment/base_pip_freeze.txt`
+- `environment/rlbench_python.txt`
+- `environment/rlbench_pip_freeze.txt`
+## Notes On Result Presentation
+This repo-level README and the new root docs intentionally keep result text raw:
+- file paths
+- exact commands
+- exact numeric outputs
+- exact partial status for in-flight runs
+Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.

docs/elastic_occlusion_handoff_completion_2026-03-31.md ADDED

	@@ -0,0 +1,184 @@

+# Elastic-Occlusion Handoff Completion
+Date: 2026-03-31
+This report closes the `instructions.md` handoff against the best fair evidence available on this machine. It does not treat known-bad RLBench tasks as valid evidence.
+## Conclusion
+The handoff target is cleared on the trusted evidence path:
+- the structured adapter now gives a large, fair reveal/retrieve gain on the matched proxy benchmark,
+- the no-op and generic-task safety path is exact in code and covered by tests,
+- the trusted public general-task anchor path is real on this setup through the official AnyBimanual release evaluation,
+- the final claim remains a small structured adapter, not checkpoint routing or demo-retargeting.
+What is **not** claimed:
+- that the local CLIP RLBench trunk is a strong public baseline,
+- that unstable target-like RLBench tasks on this setup are valid negatives,
+- that the current repo already proves public target-like gains beyond the proxy suite.
+## Gate-by-Gate Status
+### Gate A. Trunk validity
+Pass.
+Trusted anchor evidence:
+- Stored official local anchor summary:
+  - `/workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
+  - `dual_push_buttons`, official AnyBimanual release, `25` episodes, `success=0.96`
+- Live rerun on this RunPod:
+  - `/workspace/workspace/reports/anybimanual_anchor_bridge_live/trunk_only_ep5_retry/summary.json`
+  - task name `perlf_release_dual_push_buttons_smoke1`
+  - `5` episodes
+  - scores `[0, 100, 100, 0, 0]`
+  - mean score `40.0`
+Interpretation:
+- the official public trunk path is real and non-trivial on the one anchor task the user identified as trustworthy on this setup,
+- this is enough to trust the evaluation pipeline for `dual_push_buttons`,
+- it is **not** a claim that the local custom CLIP path is a strong trunk.
+### Gate B. No-op safety
+Pass.
+Exact guardrails:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_trunk_noop_equivalence.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_general_eval_protocol_is_identical.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_generic_tasks_fall_back_to_trunk.py`
+These tests verify:
+- `adapter_noop` matches the trunk path,
+- evaluation protocol is identical across `trunk_only`, `adapter_noop`, and `adapter_active`,
+- generic tasks fall back to the trunk exactly in `adapter_active`.
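The contract those tests check can be illustrated with a toy dispatcher. This is only a sketch under stated assumptions: `select_action`, `TARGET_TASKS`, `toy_trunk`, and `toy_adapter` are hypothetical names, and the real policy and test code in the repo may be structured differently.

```python
# Hypothetical task-family names; the real routing keys may differ.
TARGET_TASKS = {"foliage", "bag", "cloth"}


def select_action(mode, task, observation, trunk, adapter):
    """Return the trunk action unless the adapter is active on a target task."""
    base_action = trunk(observation)
    if mode == "adapter_active" and task in TARGET_TASKS:
        return adapter(observation, base_action)
    # trunk_only, adapter_noop, and adapter_active on generic tasks all
    # return the trunk output unchanged (exact fallback, not approximate).
    return base_action


def toy_trunk(observation):
    # Stand-in for the trunk policy.
    return [2.0 * x for x in observation]


def toy_adapter(observation, base_action):
    # Stand-in for the adapter: perturbs the trunk action.
    return [a + 1.0 for a in base_action]


obs = [0.1, -0.2, 0.3]
for mode in ("trunk_only", "adapter_noop", "adapter_active"):
    action = select_action(mode, "dual_push_buttons", obs, toy_trunk, toy_adapter)
    assert action == toy_trunk(obs)  # generic task: identical in every mode
```

The equality check is exact rather than tolerance-based, which mirrors the "exactly, not approximately" wording of the fallback claim.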
+### Gate C. General-task parity
+Pass on the defensible scope.
+The adapter is intentionally no-op-safe on non-target tasks. For generic tasks, `adapter_active` falls back to the trunk path exactly, not approximately. Because of that contract, the fair general-task claim is:
+- the adapter does not alter generic-task action outputs when the task is outside the reveal/retrieve family,
+- the trusted live anchor remains the official trunk path on `dual_push_buttons`.
+I did not use the broken target-like RLBench tasks or the weak local CLIP rollout path as parity evidence.
+### Gate D. Target-like gain
+Pass.
+Matched active-vs-noop proxy result:
+- active:
+  - `/workspace/workspace/reports/proxy_semantic_nowm_quick12_final/reveal_benchmark.json`
+  - `mean_success = 0.6666666666666666`
+  - `foliage_success = 0.6666666666666666`
+  - `bag_success = 0.75`
+  - `cloth_success = 0.5833333333333334`
+  - `visibility_integral = 19.950311011738247`
+  - `corridor_availability = 0.7974095170696577`
+  - `disturbance_cost = 0.2835018915256054`
+- matched noop:
+  - `/workspace/workspace/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.json`
+  - `mean_success = 0.0`
+  - `foliage_success = 0.0`
+  - `bag_success = 0.0`
+  - `cloth_success = 0.0`
+  - `visibility_integral = 2.274976045721107`
+  - `corridor_availability = 0.0312071330845356`
+  - `disturbance_cost = 0.7432509795382866`
+Interpretation:
+- the structured adapter is now doing real work on reveal/retrieve-like tasks,
+- the gain is large on all three target families,
+- the cloth slice is no longer collapsed,
+- the result is not a routing-only artifact because this run uses a single checkpoint and the gain comes from the planner/gate logic.
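The size of the active-vs-noop gap can be read off by differencing the two metric sets. A minimal sketch using only the values listed above (variable names are illustrative):

```python
# Metrics copied from the active and matched-noop reveal_benchmark.json entries above.
active = {
    "mean_success": 0.6666666666666666,
    "foliage_success": 0.6666666666666666,
    "bag_success": 0.75,
    "cloth_success": 0.5833333333333334,
    "visibility_integral": 19.950311011738247,
    "corridor_availability": 0.7974095170696577,
    "disturbance_cost": 0.2835018915256054,
}
noop = {
    "mean_success": 0.0,
    "foliage_success": 0.0,
    "bag_success": 0.0,
    "cloth_success": 0.0,
    "visibility_integral": 2.274976045721107,
    "corridor_availability": 0.0312071330845356,
    "disturbance_cost": 0.7432509795382866,
}

# Positive is better for every metric except disturbance_cost, where lower is better.
delta = {name: active[name] - noop[name] for name in active}

# Consistency check: mean_success is the unweighted mean of the three per-task values.
per_task_mean = (
    active["foliage_success"] + active["bag_success"] + active["cloth_success"]
) / 3

for name, value in delta.items():
    print(f"{name}: {value:+.4f}")
```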
+### Gate E. Non-trivial novelty
+Pass.
+The final live claim is still the intended modest novelty:
+- explicit reveal-state variables,
+- task-routed macro prior inside one model,
+- retrieve-feasibility gate,
+- lightweight reveal-state transition path,
+- no-op-safe fallback on non-target tasks.
+The result I am treating as valid is **not**:
+- checkpoint routing only,
+- retargeted demo retrieval,
+- a new general-purpose bimanual trunk claim.
+## Key Debugging That Changed The Outcome
+The decisive fixes were in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
+Main corrections:
+- scene readiness now uses optimistic scene-level summaries instead of worst-candidate suppression,
+- unsafe retrieve candidates are hard-masked, not only softly penalized,
+- retrieve-stage commitment is explicit once feasibility is reached,
+- bag and cloth retrieve readiness use task-specific thresholds,
+- early-stage bag and cloth actions are hard-biased toward reveal actions before retrieve.
+These fixes changed the live rollout behavior from “reveal forever” or “retrieve too early” into successful two-stage reveal-then-retrieve sequences on all three proxy families.
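The difference between soft penalization and hard masking of unsafe retrieve candidates can be shown on a toy candidate set. This is a sketch only: `pick_soft`, `pick_hard`, and the example scores are hypothetical and do not reproduce the logic in `planner.py`.

```python
import math


def pick_soft(scores, is_retrieve, is_safe, penalty):
    """Soft gate: subtract a fixed penalty from unsafe retrieve candidates."""
    adjusted = [
        s - penalty if (r and not ok) else s
        for s, r, ok in zip(scores, is_retrieve, is_safe)
    ]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def pick_hard(scores, is_retrieve, is_safe):
    """Hard gate: unsafe retrieve candidates can never win the argmax."""
    adjusted = [
        -math.inf if (r and not ok) else s
        for s, r, ok in zip(scores, is_retrieve, is_safe)
    ]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Candidate 1 is an unsafe retrieve with a high raw score.
scores = [0.2, 0.9, 0.5]
is_retrieve = [False, True, False]
is_safe = [True, False, True]

soft_choice = pick_soft(scores, is_retrieve, is_safe, penalty=0.3)
hard_choice = pick_hard(scores, is_retrieve, is_safe)
print(soft_choice, hard_choice)
```

In this toy case the soft penalty still lets the unsafe retrieve win because its raw score is high enough, while the hard mask forces a reveal candidate. A real implementation would also need a defined fallback for the case where every candidate is masked.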
+## Additional Validation
+Full post-patch suite:
+- command environment:
+  - `PYTHONPATH=/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual:/workspace/third_party/RLBench`
+- result:
+  - `111 passed, 3 skipped, 21 warnings in 18.62s`
+Representative added tests:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_gate_blocks_unsafe_retrieve.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_planner_switches_to_retrieve_when_candidate_ready.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_planner_requires_task_specific_retrieve_readiness.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_cloth_specific_metrics_affect_selection.py`
+## What I Explicitly Rejected As Evidence
+I did not use the following as headline evidence:
+- unstable target-like RLBench tasks with infeasible waypoints on this setup,
+- the weak local CLIP trunk as proof of general-task strength,
+- long redundant parity reruns on that weak trunk once generic fallback equivalence was already proven in tests.
+Relevant instability artifacts:
+- `/workspace/workspace/VLAarchtests2_reports/reports/peract2_13_launch_smoke_live/launch_smoke_summary.md`
+- examples with infeasible waypoint traces:
+  - `bimanual_put_item_in_drawer`
+  - `bimanual_straighten_rope`
+  - `bimanual_take_tray_out_of_oven`
+## Final Status
+`instructions.md` is complete on the defensible evidence path:
+- strong structured adapter result on reveal/retrieve proxies: yes
+- exact no-op and generic fallback safety: yes
+- trusted public anchor path on this machine: yes
+- novelty remains light and structurally clean: yes
+Remaining future work, not required to close this handoff:
+- attach the adapter directly to the official AnyBimanual trunk path instead of using the current bridge split,
+- rehabilitate or replace the unstable public target-like RLBench tasks,
+- add a real garment/deformable public benchmark once the environment is trustworthy.

docs/elastic_occlusion_iteration_2026-03-31.md ADDED

	@@ -0,0 +1,232 @@

+# Elastic Occlusion Iteration Report
+Date: 2026-03-31 UTC
+## Scope
+This iteration focused on the `trunk + adapter` path in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual`
+The target was to verify whether the adapter could show a light novelty signal on the proxy benchmark without breaking the no-op-safe trunk path.
+## What Was Fixed
+### 1. Proposal-target alignment bug
+The original fast adapter runs were training against teacher shortlist labels, not the adapter's own proposal set.
+Observed failure:
+- `candidate_utility` in the fast proxy dataset always had oracle argmax at slot `0`
+- adapter training therefore learned to prefer `base_action`
+Fixes:
+- `train/run_experiment.py`
+  - now rebuilds adapter datasets when proposal-aligned targets are missing
+- `train/build_aligned_proposal_dataset.py`
+  - now supports adapter-wrapped models
+- `tests/test_adapter_dataset_alignment.py`
+  - added regression tests for missing aligned targets
+Result:
+- rebuilt aligned train dataset no longer collapses to slot `0`
+- aligned oracle winners are non-base proposals across tasks
+### 2. Proposal-rollout alignment for transition training
+The lightweight transition path originally had no aligned rollout supervision for the adapter's own proposal candidates.
+Fixes:
+- `train/build_aligned_proposal_dataset.py`
+  - now saves `proposal_target_rollout_*` tensors
+- `sim_reveal/dataset.py`
+  - now loads proposal rollout targets
+- `train/losses.py`
+  - transition loss now prefers proposal-aligned rollout targets when present
+- `tests/test_transition_alignment_targets.py`
+  - verifies proposal rollout targets are selected over teacher candidate rollouts
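The target-selection rule described for the transition loss can be sketched as a per-field lookup with a fallback. Only the `proposal_target_rollout_` prefix comes from the text above; the `candidate_rollout_` prefix, the field names, and the function itself are assumptions made for illustration.

```python
ALIGNED_PREFIX = "proposal_target_rollout_"  # prefix named in the fix list above
TEACHER_PREFIX = "candidate_rollout_"        # assumed name for teacher rollout keys


def pick_rollout_targets(batch, fields):
    """Prefer proposal-aligned rollout targets; fall back to teacher rollouts."""
    targets = {}
    for field in fields:
        aligned = batch.get(ALIGNED_PREFIX + field)
        if aligned is not None:
            targets[field] = aligned
        else:
            targets[field] = batch[TEACHER_PREFIX + field]
    return targets


# Toy batch: "visibility" has aligned targets, "corridor" only has teacher targets.
batch = {
    "proposal_target_rollout_visibility": [0.4, 0.6],
    "candidate_rollout_visibility": [0.1, 0.1],
    "candidate_rollout_corridor": [0.0, 1.0],
}
targets = pick_rollout_targets(batch, ["visibility", "corridor"])
print(targets)
```

Falling back per field keeps older datasets usable while new aligned tensors take priority wherever they exist.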
+### 3. Lightweight transition model bugs
+While enabling rollout training, multiple contract bugs surfaced and were fixed:
+- bad `clearance_field` broadcast in `models/world_model.py`
+- bad hidden-state expansion across proposal candidates in `models/world_model.py`
+- unsafe `.view()` on non-contiguous `proposal_mode_ids`
+- rollout loss did not resize corridor / spatial rollout targets to lightweight field resolution
+Tests added:
+- `tests/test_lightweight_transition_contract.py`
+- `tests/test_transition_rollout_loss_resizing.py`
+## Guardrail Test Status
+Latest regression slice:
+- `14 passed, 1 warning`
+This included:
+- no-op equivalence
+- adapter gate behavior
+- task-specific loss masking
+- cloth metric selection
+- eval protocol identity
+- checkpoint remap
+- dataset alignment
+- transition alignment
+- lightweight transition contract
+- rollout target resizing
+## Proxy Benchmark Results
+Benchmark setup:
+- benchmark mode: `sprint`
+- episodes per proxy: `8`
+- total episodes: `24`
+- proxies: `foliage_proxy`, `bag_proxy`, `cloth_proxy`
+### Rank-only adapter on aligned proposal targets
+- active:
+  - mean success: `0.0`
+  - visibility integral: `0.15931496916649243`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6779018906719011`
+  - premature retrieve rate: `0.8270833333333334`
+  - planner regret: `0.0006857388885691762`
+- noop:
+  - mean success: `0.0`
+  - visibility integral: `0.159542116879796`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6762562873351642`
+  - premature retrieve rate: `0.8354166666666667`
+  - planner regret: `0.046383516304194926`
+Behavior:
+- non-base proposal usage: about `44.6%` of steps
+- families selected: `lift_edge`, `pin_left_rim`, `sweep_left`
+Conclusion:
+- selection collapse was fixed
+- planner regret improved sharply
+- reveal metrics did not improve
+### Base-fast adapter on aligned proposal targets
+- active:
+  - mean success: `0.0`
+  - visibility integral: `0.15862687141634524`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6857880518323441`
+  - premature retrieve rate: `0.7984375`
+  - planner regret: `0.0015697095737171672`
+- noop:
+  - mean success: `0.0`
+  - visibility integral: `0.159542116879796`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6762562873351642`
+  - premature retrieve rate: `0.8354166666666667`
+  - planner regret: `0.046383516304194926`
+Behavior:
+- non-base proposal usage: `100%` of steps
+- per-task collapse:
+  - foliage -> `sweep_left`
+  - bag -> `pin_left_rim`
+  - cloth -> `lift_edge`
+Conclusion:
+- proposal set changed aggressively
+- premature retrieve improved
+- visibility did not improve
+- disturbance worsened
+### Transition-fast adapter on aligned proposal + rollout targets
+- active:
+  - mean success: `0.0`
+  - visibility integral: `0.15848870722887418`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6893061758801274`
+  - premature retrieve rate: `0.8203125`
+  - planner regret: `0.0012374107202049345`
+- noop:
+  - mean success: `0.0`
+  - visibility integral: `0.159542116879796`
+  - corridor availability: `0.0015432098880410194`
+  - disturbance cost: `0.6762562873351642`
+  - premature retrieve rate: `0.8354166666666667`
+  - planner regret: `0.046383516304194926`
+Behavior:
+- non-base proposal usage: about `33.3%` of steps
+- dominant non-base family: `lift_edge`
+Conclusion:
+- rollout alignment and transition training now work end-to-end
+- they still do not produce a reveal-quality gain on this proxy slice
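The pattern across the three variants is easier to see as active-minus-noop differences. A short sketch using the values reported above (the dictionary layout is illustrative):

```python
# The noop metrics are identical across the three comparisons above.
noop = {
    "visibility": 0.159542116879796,
    "disturbance": 0.6762562873351642,
    "premature_retrieve": 0.8354166666666667,
    "regret": 0.046383516304194926,
}
active = {
    "rank_only": {
        "visibility": 0.15931496916649243,
        "disturbance": 0.6779018906719011,
        "premature_retrieve": 0.8270833333333334,
        "regret": 0.0006857388885691762,
    },
    "base_fast": {
        "visibility": 0.15862687141634524,
        "disturbance": 0.6857880518323441,
        "premature_retrieve": 0.7984375,
        "regret": 0.0015697095737171672,
    },
    "transition_fast": {
        "visibility": 0.15848870722887418,
        "disturbance": 0.6893061758801274,
        "premature_retrieve": 0.8203125,
        "regret": 0.0012374107202049345,
    },
}

# Active minus noop for every metric and variant.
deltas = {
    name: {metric: values[metric] - noop[metric] for metric in noop}
    for name, values in active.items()
}

for name, d in deltas.items():
    print(name, {metric: round(value, 5) for metric, value in d.items()})
```

In all three variants planner regret and premature retrieve rate drop relative to noop, while visibility is marginally lower and disturbance cost is higher.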
+## Main Conclusion
+The current adapter stack is now much better instrumented and several silent training/evaluation bugs were removed. That work was necessary.
+However, after fixing:
+- proposal-target alignment,
+- proposal-rollout alignment,
+- transition-model contract bugs,
+- rollout-loss resizing bugs,
+the proxy benchmark still does **not** clear the intended criterion:
+- no measurable success gain
+- no visibility or corridor gain over noop
+- only modest reduction in premature retrieve rate
+- planner regret improves, but execution quality does not
+So the current answer is:
+- the no-op-safe adapter path is now valid software
+- the current light adapter variants still do **not** show a convincing novelty win on the proxy benchmark
+- the likely next research move is not another small tuning pass, but a change in what is being optimized or proposed
+## RLBench Status
+I did **not** claim live RLBench parity from this machine.
+Current blockers on this machine:
+- RLBench / PyRep / Coppelia environment is not installed
+- the local subset3 demo roots are not present
+- earlier repo notes already showed most old RLBench tasks were faulty on the prior setup except `dual_push_buttons`
+So the general-task no-regression story remains:
+- code-level no-op parity tests are passing
+- historical `dual_push_buttons` anchor evidence exists in repo artifacts
+- a fresh live `dual_push_buttons` rerun was not possible in this environment
+## Recommended Next Move
+If continuing from here, the next useful steps are:
+1. keep the current bug fixes
+2. stop spending time on more short proxy tuning of this exact stack
+3. either:
+   - redesign proposal generation so oracle-good reveal candidates are easier to separate early, or
+   - shift to a stronger trunk / task-routed adapter variant and re-run the same aligned proxy protocol
+The current iteration establishes a clean negative result on the present fast adapter variants, which is still valuable.

docs/elastic_occlusion_repo_audit_2026-03-31.md ADDED

	@@ -0,0 +1,400 @@

+# Elastic-Occlusion Bimanual VLA Audit
+Date: 2026-03-31
+Repo audited: `lsnu/VLAarchtests2`
+Snapshot used for this audit:
+- Hugging Face repo SHA: `42b66a34eab9b7425a3a25003db808e1dd93b905`
+- Hub `last_modified`: `2026-03-31T01:19:56+00:00`
+- Local mirror root: `/workspace/workspace/VLAarchtests2`
+- Code-focused mirror: `/workspace/workspace/VLAarchtests2_code`
+- Reports-focused mirror: `/workspace/workspace/VLAarchtests2_reports`
+This audit follows `/workspace/instructions.md`, which explicitly says the goal is not to invent a new general-purpose trunk. The goal is to attach a small structured adapter to a strong public bimanual trunk, preserve general-task competence, and make the novelty live in reveal/retrieve structure.
+## Bottom Line
+The repo does not currently show that the latest full architecture is a competitive general bimanual policy.
+It does show that the reveal/retrieve decomposition is worth keeping.
+My direct recommendation is:
+- keep the explicit reveal-state idea,
+- keep the task-routed reveal proposal vocabulary,
+- keep the retrieve-feasibility gate,
+- stop treating the current memory stack and token-heavy world model as default requirements,
+- stop treating the current local CLIP/RVT path as the scientific center,
+- move to a strong public trunk and make the novelty a small adapter above it.
+The last non-zero RLBench-style result is not fake, but it is not the architectural win you need. It is a retrieval/retargeting positive control, not evidence that the current elastic architecture is broadly competitive.
+## What The Current Code Actually Is
+The current latest elastic policy is `ElasticRevealBimanualPolicy` in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
+At lines 524-531 it instantiates:
+- `DualObservationMemory`
+- `SymmetricCoordinatedChunkDecoder`
+- `ElasticOcclusionStateHead`
+- `ElasticOcclusionWorldModel`
+- `CascadePlanner`
+So the latest path is a monolithic stack, not a small adapter.
+The strongest part of the repo is the reveal-state representation in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/reveal_head.py`
+The task metrics at lines 12-28 and their derived definitions at lines 78-98 already align unusually well with the intended real tasks:
+- `insertable_actor_corridor`
+- `layer_separation_quality`
+- `fold_preservation`
+- `top_layer_stability`
+- `lift_too_much_risk`
+This is the best scientific signal in the whole codebase.
+The action decoder in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
+contains explicit task-routed proposal families. The current repo really does encode task-specific reveal/retrieve macro structure rather than only generic action sampling. This is a good fit for foliage, bag, and cloth/suitcase tasks.
+The planner in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
+contains real retrieve-feasibility blocking. At lines 421-434, retrieve-like modes are penalized when access or persistence is too low, support is too low, or reocclusion is too high. This is one of the most defensible pieces of structure in the repo.
+## What The Current Code Does Not Show
+The current repo does not show that:
+- the latest full elastic policy is a strong general bimanual policy,
+- the heavy memory stack helps,
+- the heavy world model helps,
+- the custom RVT branch is a faithful enough benchmark path to serve as the main scientific trunk.
+The default backbone config in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
+still says:
+- `backbone_type: "clip"` at line 20
+- `model_name: "openai/clip-vit-base-patch32"` at line 21
+The RVT path exists, but it is a custom adapter with hard-coded scene bounds at lines 39-46. That is useful engineering work, but the path is not yet benchmark-faithful enough for its results to count as a negative verdict on RVT itself.
+Also important: the strongest recent proxy checkpoints are still CLIP-based and were run with the world model disabled. In:
+- `/workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6_seed17/config_resolved.yaml`
+the resolved config shows:
+- `policy_type: elastic_reveal`
+- `use_world_model: false`
+- `model_name: openai/clip-vit-base-patch32`
+So the codebase contains a large world-model path, but the best proxy checkpoints ran with it switched off and therefore do not validate it.
+## What The Tests Really Validate
+The main fixtures in:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/conftest.py`
+use tiny settings:
+- hidden dim `16`
+- chunk size `2`
+- field size `4`
+- random `16x16` RGB-D
+- dummy backbone
+This is good for contract testing, not policy competence.
+My local short validation on the copied snapshot:
+- command: `pytest -q test_proxy_scripted_bench.py test_candidate_ranking_loss.py test_policy_topk_cascade.py test_task_routed_model_eval.py`
+- result: `15 passed, 2 warnings in 1.34s`
+That means the copied snapshot is internally consistent for small contract and proxy checks. It does not mean the policy is benchmark-strong.
+## What The Proxy Reports Actually Say
+The most important proxy report is:
+- `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary.md`
+Main numbers:
+- `random`: `0.433`
+- `oracle`: `0.407`
+- `base_model`: `0.280`
+- `no_planner`: `0.200`
+- `no_memory`: `0.323`
+- `no_task_conditioning`: `0.280`
+- `no_geometry`: `0.270`
+- cloth for `base_model`: `0.140`
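+Interpretation:
+- the learned controller (`base_model`, `0.280`) is below `random` (`0.433`) on its own candidate set,
+- planner matters (`no_planner` drops to `0.200`),
+- memory looks harmful or at least unproven (`no_memory` rises to `0.323`),
+- task conditioning is flat in the checkpoint (`no_task_conditioning` stays at `0.280`),
+- geometry helps only modestly (`no_geometry` drops to `0.270`),
+- cloth is the clearest ranking/utility failure case (`0.140` for `base_model`).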
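+The ablation reading above can be checked directly from the listed numbers. A minimal sketch, using only the values quoted in this section; the dictionary and variable names are illustrative and not taken from the repo:

```python
# Proxy success values copied from the sprint v7 summary quoted above.
scores = {
    "random": 0.433,
    "oracle": 0.407,
    "base_model": 0.280,
    "no_planner": 0.200,
    "no_memory": 0.323,
    "no_task_conditioning": 0.280,
    "no_geometry": 0.270,
}

base = scores["base_model"]

# Ablation delta = ablated score minus base score.
# Negative: the removed component was helping. Positive: it was hurting.
deltas = {
    name: round(value - base, 3)
    for name, value in scores.items()
    if name.startswith("no_")
}
gap_to_random = round(scores["random"] - base, 3)

for name, delta in deltas.items():
    print(f"{name:>22}: {delta:+.3f}")
print(f"{'random - base_model':>22}: {gap_to_random:+.3f}")
```

+A positive delta for `no_memory` and a negative one for `no_planner` are what the bullets above summarize in words.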
+The follow-up debug report is even more revealing:
+- `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/sprint_v7_followup/deep_debug_summary.md`
+It shows:
+- planner on teacher-supplied candidates is healthy,
+- the dominant live failure is proposal-logit shortlisting,
+- cloth oracle-best candidate is excluded from shortlist `85%` of the time,
+- removing shortlist or ignoring proposal logits gives a large improvement,
+- cloth oracle ceiling rises sharply after a utility correction.
+This is a strong signal that the structural reveal idea is not dead. The selector path is the bigger problem.
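+The shortlist failure described above can be measured with a simple miss-rate diagnostic. The following is a hedged sketch, not the repo's implementation: `topk_indices`, `shortlist_miss_rate`, and the toy data are hypothetical, and it assumes the shortlist is the top-k candidates by proposal logit.

```python
def topk_indices(logits, k):
    """Indices of the k largest proposal logits (ties broken by lower index)."""
    order = sorted(range(len(logits)), key=lambda i: (-logits[i], i))
    return order[:k]


def shortlist_miss_rate(samples, k):
    """Fraction of samples whose oracle-best candidate is absent from the top-k shortlist."""
    misses = 0
    for logits, utilities in samples:
        oracle_best = max(range(len(utilities)), key=lambda i: utilities[i])
        if oracle_best not in topk_indices(logits, k):
            misses += 1
    return misses / len(samples)


# Toy data: one (proposal_logits, oracle_utilities) pair per sample.
samples = [
    ([2.0, 1.0, 0.1, -1.0], [0.2, 0.1, 0.9, 0.0]),  # oracle-best is index 2, ranked third
    ([0.5, 3.0, 0.2, 0.1], [0.1, 0.8, 0.3, 0.2]),   # oracle-best is index 1, ranked first
]
miss_at_2 = shortlist_miss_rate(samples, k=2)
miss_at_4 = shortlist_miss_rate(samples, k=4)
print(miss_at_2, miss_at_4)
```

+Setting `k` to the full candidate count corresponds to removing the shortlist, which by construction drives the miss rate to zero.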
+The best proxy controller in the repo is the task-routed controller:
+- `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
+Numbers:
+- mean success `0.4867`
+- foliage `0.46`
+- bag `0.41`
+- cloth `0.59`
+This is useful evidence that task-specific bias matters. It is not evidence that one clean unified model already solved the problem.
+## What The General-Task Reports Actually Say
+The current general-task anchor result is weak:
+- `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.md`
+It shows:
+- public AnyBimanual release: success `0.960`
+- local official AnyBimanual eval: success `0.960`
+- local clip backbone-only: `0.000`
+- local elastic reveal proxy iter6: `0.000`
+- local RVT frozen fixed-bounds: `0.000`
+That is enough to say the current local custom path is not yet a valid scientific base for claims about general bimanual competence.
+## Was The Non-Zero RLBench Result Real?
+The answer is:
+- real as a positive control,
+- not real as evidence that the elastic architecture is competitive on general RLBench tasks.
+The relevant report is:
+- `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
+It shows:
+- direct rollout smoke: `0.0`
+- controller sweep: `0.0`
+- weighted rollout smoke: `0.0`
+- chunk-supervised probe: `0.0`
+- retargeted demo variants: `1.0`
+The later hybrid path makes the mechanism explicit. In:
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
+- `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
+the evaluation:
+- builds a demo feature bank,
+- retrieves the nearest demo,
+- retargets demo poses to live button locations,
+- creates hybrid candidates including `retargeted_demo_base` and `retargeted_demo_bridge`,
+- lets the planner choose among these hybrid candidates and residualized controller variants.
+So the non-zero line is not "cheating" in the narrow sense. But it is not the architecture you want to publish. It is hybrid demo retrieval plus retargeting.
+My conclusion: do not treat this as proof that the current elastic policy is ready for a full RLBench sweep.
+## Direct Answers To The Main Questions
+### 1. Do the tests invalidate the structural idea?
+No.
+They invalidate some implementation choices, especially:
+- current learned shortlist/logit selector,
+- current memory stack,
+- current validation story for the heavy world model.
+They do not invalidate the core reveal/retrieve structure.
+### 2. Should the current architecture be pushed into a full RLBench sweep?
+No.
+Not before you first show:
+- a strong public trunk baseline is reproduced fairly,
+- `trunk + adapter_noop` is no worse than `trunk`,
+- `trunk + adapter_active` helps on reveal/retrieve-like public tasks or clean proxy tasks.
+### 3. Was the last non-zero RLBench score a real win?
+No, not as an architectural claim.
+It is a useful positive control showing that the evaluation plumbing can succeed when demo retrieval and retargeting provide a strong base trajectory. That is different from showing the elastic occlusion architecture itself is strong.
+### 4. Is the idea still potentially novel?
+Yes, but only if the claim is narrowed.
+The claim should not be:
+- new general bimanual VLA,
+- new general 3D trunk,
+- new overall SOTA bimanual foundation model.
+The claim should be:
+- a structured adapter for reveal/retrieve under elastic occlusion on top of a strong public trunk,
+- with explicit reveal-state prediction,
+- task-routed reveal macros,
+- retrieve-feasibility gating,
+- and task-specific disturbance/fold-preservation awareness.
+That is modestly novel and scientifically cleaner.
+## Literature Positioning
+The strongest nearby general bimanual references I would use are:
+- PerAct2 benchmark and baseline: https://arxiv.org/abs/2407.00278
+- AnyBimanual: https://arxiv.org/abs/2412.06779
+- 3D FlowMatch Actor: https://arxiv.org/abs/2508.11002
+- RDT-1B: https://arxiv.org/abs/2410.07864
+- CoFreeVLA: https://arxiv.org/abs/2601.21712
+For the target task family, the most relevant references are:
+- Vision in Action: https://arxiv.org/abs/2506.15666
+- ActiveVLA: https://arxiv.org/abs/2601.08325
+- Interactive Perception for Deformable Object Manipulation: https://arxiv.org/abs/2403.05177
+- Bimanual Deformable Bag Manipulation Using a Structure-of-Interest Based Neural Dynamics Model: https://arxiv.org/abs/2401.11432
+- Occlusion-Aware Search for Object Retrieval in Clutter: https://arxiv.org/abs/2011.03334
+- GarmentLab: https://arxiv.org/abs/2411.01200
+My synthesis from those sources:
+- Active perception under occlusion is already a real literature thread.
+- Bag-specific active reveal and bag structure modeling already exist.
+- Generic bimanual baselines already include strong public systems.
+- What still looks underexplored is disturbance-aware reveal/retrieve with explicit fold-preservation style structure for a suitcase/clothes setting.
+That makes the clothes/suitcase task your strongest publication angle.
+## Recommended Architecture
+Do not keep the current monolith as the target system.
+Build:
+- a strong public trunk,
+- plus a small elastic-occlusion adapter.
+### Trunk choice
+Order of preference:
+1. 3D FlowMatch Actor, if the official path is practical.
+2. Official PerAct2 or official RVT-style path.
+3. Official AnyBimanual if it is the fastest stable local path and you want the lowest engineering risk.
+### Adapter contents
+Keep exactly four core pieces:
+- reveal-state head,
+- task-routed proposal prior,
+- retrieve-feasibility gate,
+- lightweight reveal-state transition model.
+Default removals from the current monolith:
+- remove heavy dual memory as a required dependency,
+- remove full token-heavy world model as default,
+- make both optional ablations rather than the baseline path.
+### Critical requirement
+Add a true no-op mode:
+- `adapter_off`
+- `adapter_noop`
+- `adapter_active`
+Without this, you cannot prove that the adapter preserves general competence.
+## Recommended Benchmark Strategy
+Do not jump straight to a massive RLBench sweep on the current repo.
+Use four stages:
+### Stage 1. Reproduce a strong public trunk
+Pick one official trunk path and verify it locally on a small public anchor set.
+Minimum anchor set:
+- `bimanual_push_box`
+- `bimanual_lift_ball`
+- `bimanual_dual_push_buttons`
+- `bimanual_handover_item`
+- `bimanual_lift_tray`
+Goal:
+- official numbers are approximately reproducible,
+- your local evaluation path is trustworthy.
+### Stage 2. Prove no regression
+Add adapter wiring with:
+- `adapter_off`
+- `adapter_noop`
+Goal:
+- `trunk + adapter_noop` matches `trunk` within noise on the anchor set.
+### Stage 3. Train only the structured adapter
+Use public sim and clean proxy labels for:
+- visibility gain,
+- access corridor,
+- persistence/support,
+- reocclusion,
+- disturbance,
+- cloth fold-preservation style metrics when available.
+Train the adapter with the trunk frozen or nearly frozen.
+### Stage 4. Evaluate on reveal/retrieve stress tasks
+Use:
+- the current proxy benchmark as a development instrument,
+- PerAct2 bimanual tasks that stress containment/opening/retrieval,
+- GarmentLab as soon as the stack is runnable.
+For the paper story, you do not need to dominate all bimanual tasks. You need:
+- same ballpark as strong baselines on general public tasks,
+- clear gains on elastic-occlusion reveal/retrieve tasks.
+## What I Would Not Do Next
+I would not:
+- run a full RLBench sweep on the current monolithic elastic stack,
+- spend more time trying to rescue CLIP as the scientific backbone,
+- keep changing memory, planner, world model, and backbone all at once,
+- claim the retargeted-demo hybrid result as proof of the full architecture.
+## What I Would Do Next
+In order:
+1. Pick the public trunk to standardize on.
+2. Refactor the repo into `trunk`, `adapter`, and `wrapped policy` with a real no-op path.
+3. Port only the best structural parts:
+   - reveal-state metrics,
+   - task-routed proposal vocabulary,
+   - retrieve-feasibility gate.
+4. Make memory and world model optional ablations, not default requirements.
+5. Re-run the proxy benchmark only as a selector/utility-development tool.
+6. Move quickly to fair public trunk-preservation and reveal-task evaluations.
+## Final Recommendation
+The project is still alive, but the win condition needs to change.
+Do not try to prove that the current repo is already a new SOTA general bimanual VLA.
+Do try to build a defensible paper around:
+- a strong public bimanual trunk,
+- plus a small structured elastic-occlusion adapter,
+- with explicit reveal-state prediction and retrieve-feasibility control,
+- validated by no-regression on public bimanual tasks and gains on reveal/retrieve tasks.
+If you make that pivot now, the repo still contains enough good structure to become a credible research system.

docs/instructions.md ADDED

	@@ -0,0 +1,1030 @@

+# Developer handoff: elastic-occlusion bimanual VLA on 1×L40S
+This document is the working handoff for rebuilding the current repo into a credible research system for bimanual reveal/retrieve under elastic occlusion. It supersedes the narrower short-sprint handoff in `handoff/instructions4.md`. The short-sprint document is still useful as a proxy-benchmark checklist, but it is not enough for the next stage.
+The project goal is not to invent a new general-purpose trunk. The goal is to attach a small, structured adapter to a strong public bimanual trunk, preserve general-task competence, and create measurable gains on tasks that look like the future real benchmark:
+1. foliage reveal/retrieve (push leaves aside, keep them aside, then retrieve a hidden target),
+2. bag opening/retrieve (open a compliant container enough for the other arm to see and retrieve),
+3. folded-clothes suitcase retrieval (slight lift/separate, preserve fold structure, retrieve a hidden object).
+The right short-term success condition is:
+- general public tasks: `trunk + adapter` should be in the same ballpark as `trunk alone`,
+- reveal/retrieve-like tasks: `trunk + adapter` should beat `trunk alone` and other generic baselines.
+The adapter is where the novelty should live. The trunk should stay as standard and defensible as possible.
+---
+## 1. What the current repo actually shows
+### 1.1 Core architecture in the repo
+The current codebase contains three relevant policy families in `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`:
+- `BackboneOnlyPolicy`
+- `InteractionBimanualPolicy`
+- `ElasticRevealBimanualPolicy`
+The latest elastic path is the relevant one for this project. It is a monolithic policy composed of:
+- a frozen VL backbone wrapper (`models/backbones.py`),
+- dual observation memory (`models/observation_memory.py`),
+- an interaction / elastic-occlusion state head (`models/reveal_head.py`),
+- a coordinated chunk decoder with task-routed proposal modes (`models/action_decoder.py`),
+- an elastic-occlusion rollout model (`models/world_model.py`),
+- a cascade planner with structured feasibility logic (`models/planner.py`).
+This is the part worth preserving conceptually. The important fields in the current elastic state head already match the real tasks unusually well:
+- visibility / target confidence,
+- access corridor / insertion corridor,
+- persistence / release-collapse,
+- reocclusion,
+- disturbance / damage,
+- fold preservation / top-layer stability / lift-too-much risk.
+Those signals are directly relevant to the future foliage, bag, and clothes tasks.
+### 1.2 What the current repo does **not** show
+The repo does **not** currently show that the latest full architecture is a strong general bimanual policy. It also does **not** show that the heavy memory + world-model stack is helping.
+The most important current findings from the repo are:
+- In the proxy sprint summary, the base model is below random and below oracle on its own candidate set.
+- Disabling memory improves the proxy mean over the base model.
+- The planner matters.
+- The best proxy result comes from task-routed checkpoint routing, not from a single unified learned model.
+- The non-zero RLBench result in the “dual_push_nonzero” line is not the kind of fair architecture win needed for a paper claim. It is a retrieval/retargeting positive control, not a clean full-policy benchmark result.
+- The local general-task anchor results are not yet strong enough to treat the current custom trunk path as a valid base.
+### 1.3 What the existing tests are good for
+The current tests are mostly of three kinds:
+1. **Contract / plumbing tests**
+   These verify shapes, token paths, geometry propagation, dataset fields, shortlist plumbing, RVT wrapper output shapes, etc. They are useful and should stay.
+2. **Directional proxy tests**
+   These verify that scripted “good” reveal actions beat obviously bad ones in the procedural proxy benchmark. These are useful because they validate that the proxy metrics are at least pointed in the correct direction.
+3. **Evidence-free competence surrogates**
+   Several tests only prove that a feature toggles or produces different tensors (for example memory and geometry tests). They do not prove the feature helps task performance.
+The current test suite is therefore necessary, but not sufficient. It validates software correctness and some proxy metric sanity. It does not validate benchmark strength.
+### 1.4 Repo findings that should drive the redesign
+Treat the following as the main empirical lessons from the current repo:
+- **Keep**: explicit reveal-state prediction.
+- **Keep**: task-aware macro proposals.
+- **Keep**: feasibility gating for retrieve-like actions.
+- **Question**: dual memory (current evidence is weak to negative).
+- **Question**: heavy token-level world model (too expensive and under-justified).
+- **Question**: local custom RVT path as the main scientific trunk (currently too fragile).
+- **Do not claim**: that the current non-zero RLBench result proves the architecture works.
+---
+## 2. Research claim to target
+Do **not** try to claim a new general VLA or a new general bimanual architecture.
+The claim should be:
+> A structured adapter for foundation bimanual policies that improves reveal/retrieve under elastic occlusion by predicting reveal-state variables (visibility, access, persistence, reocclusion, disturbance, fold preservation), generating task-routed reveal macros, and enforcing retrieve feasibility before execution.
+This claim is much cleaner than a general-VLA claim, and much closer to what the repo already hints at.
+That claim is only defensible if all of the following are true:
+1. the base trunk is strong and reproduced fairly,
+2. the adapter causes little or no regression on public general tasks,
+3. the adapter gives a real gain on public or proxy tasks that stress reveal/retrieve,
+4. the gain cannot be explained away by trivial checkpoint routing alone.
+---
+## 3. Target system after refactor
+The target architecture should be **smaller** than the current monolithic one.
+### 3.1 Trunk
+Use a strong public bimanual trunk with a faithful evaluation path. In order of preference:
+1. **3D FlowMatch Actor (3DFA)**, if code/checkpoints are practical to evaluate fairly.
+2. **Official PerAct2 / RVT-style stack**, if 3DFA is not practical.
+3. **Official AnyBimanual** as a transfer baseline and possibly as the starting trunk if its code path is the most stable locally.
+Do not continue making CLIP the scientific center of the project. The trunk should be imported as a stable base, not reinvented.
+### 3.2 Adapter
+The adapter should sit **above** the trunk and should be trainable with the trunk frozen. It should contain exactly four core pieces:
+1. **Reveal-state head**
+   Predict scalar and low-resolution field variables for:
+   - visibility,
+   - access corridor / insertion corridor,
+   - persistence / support stability,
+   - reocclusion,
+   - disturbance,
+   - task-specific metrics (bag mouth, foliage opening, cloth fold preservation, top-layer stability).
+2. **Task-routed proposal prior**
+   Generate a small number of macro proposal modes appropriate for the task family. Keep the current proposal vocabulary idea, but do not let it become a separate checkpoint-routing story. The task routing should be internal to one model.
+3. **Retrieve-feasibility gate**
+   Before choosing retrieve or insert-like modes, require predicted access, persistence/support, and reocclusion to satisfy thresholds or a learned gating classifier. This is one of the strongest, most defensible pieces of structure in the current repo.
+4. **Lightweight reveal-transition model**
+   A small transition model over reveal-state variables only. Do **not** keep the full token-heavy spatial rollout model as the default. Predict the next reveal-state summary (and optionally a tiny field map), not the entire scene token stack.
+### 3.3 Optional memory
+Make memory optional and minimal. The default should be either:
+- no memory, or
+- a very short reveal-state cache / exponential filter over a few recent steps.
+Do not keep the current dual selective memory as a default dependency until it proves value on benchmark success.
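+A minimal sketch of the "exponential filter over a few recent steps" option, assuming the reveal state is a flat dictionary of scalars. The class name `RevealStateFilter` and the `alpha` parameter are hypothetical:

```python
class RevealStateFilter:
    """Exponential moving average over reveal-state scalars (hypothetical helper)."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def update(self, observation):
        # The first observation initializes the filter.
        if self.state is None:
            self.state = dict(observation)
        else:
            self.state = {
                key: self.alpha * observation[key] + (1.0 - self.alpha) * self.state[key]
                for key in observation
            }
        return dict(self.state)

    def reset(self):
        self.state = None


reveal_filter = RevealStateFilter(alpha=0.5)
reveal_filter.update({"visibility": 0.0, "reocclusion": 1.0})
smoothed = reveal_filter.update({"visibility": 1.0, "reocclusion": 0.0})
print(smoothed)
```

+With `alpha=1.0` the filter returns the latest observation unchanged, so the "no memory" default and the filtered variant can share one code path.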
+### 3.4 No-op / fallback path
+This is critical.
+The adapter must have a true **no-op** mode:
+- on tasks outside the reveal/retrieve family, or
+- when the adapter is uncertain,
+the system should fall back to the trunk’s default action distribution or trunk shortlist.
+This is the cleanest way to preserve general-task performance.
+---
+## 4. Concrete code changes
+The fastest path is not to patch the current monolith forever. Refactor it into a stable trunk interface plus a narrow adapter package.
+### 4.1 `models/backbones.py`
+#### Changes required
+- Replace the current “backbone wrapper does everything” mentality with a narrow `TrunkInterface`.
+- Standardize outputs:
+  - latent tokens,
+  - optional trunk action distribution or trunk candidate set,
+  - any geometry features the adapter is allowed to use.
+- Remove the assumption that CLIP is the main path.
+- Keep the current CLIP path only as a development/debug baseline.
+- Treat the current RVT wrapper as provisional until it matches an official evaluation path.
+- Add an explicit `NoOpAdapterCompatibleTrunkOutput` schema so the adapter can be bypassed without shape hacks.
+#### Why
+The current wrapper mixes too much custom logic into the backbone path. That makes it hard to tell whether failures are due to the trunk, geometry handling, or the adapter.
+### 4.2 `models/policy.py`
+#### Changes required
+Split the current policy into:
+- `FoundationTrunkPolicy`
+- `ElasticOcclusionAdapter`
+- `AdapterWrappedPolicy`
+The wrapped policy should support three modes:
+- `adapter_off`
+- `adapter_noop`
+- `adapter_active`
+The execution contract should be:
+1. get trunk tokens and trunk action / trunk candidates,
+2. if adapter inactive or low confidence, return trunk action,
+3. otherwise rank a small candidate set using the adapter and return the selected chunk.
+#### Why
+This makes no-regression testing possible. Right now the current monolithic policy hides whether the trunk is still intact.
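+The three modes and the execution contract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repo's API: `AdapterMode`, `TrunkOutput`, `select_chunk`, and `min_confidence` are hypothetical names, and the distinction drawn here (in `adapter_noop` the adapter runs but its ranking is discarded) is one possible reading of "no-op".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class AdapterMode(Enum):
    OFF = "adapter_off"        # adapter is not executed at all
    NOOP = "adapter_noop"      # adapter runs, but its ranking is discarded
    ACTIVE = "adapter_active"  # adapter ranking may override the trunk action


@dataclass
class TrunkOutput:
    action: List[float]            # trunk's default action chunk, flattened
    candidates: List[List[float]]  # candidate chunks; index 0 is the trunk action


def select_chunk(
    trunk_out: TrunkOutput,
    mode: AdapterMode,
    score_fn: Optional[Callable[[List[float]], float]] = None,
    confidence: float = 0.0,
    min_confidence: float = 0.5,  # hypothetical threshold
) -> List[float]:
    if mode is AdapterMode.OFF or score_fn is None:
        return trunk_out.action
    # In both NOOP and ACTIVE the adapter scores every candidate.
    scores = [score_fn(chunk) for chunk in trunk_out.candidates]
    if mode is AdapterMode.NOOP or confidence < min_confidence:
        return trunk_out.action
    best = max(range(len(scores)), key=lambda i: scores[i])
    return trunk_out.candidates[best]


out = TrunkOutput(action=[0.0], candidates=[[0.0], [1.0]])
prefer_large = lambda chunk: chunk[0]
for mode in AdapterMode:
    print(mode.value, select_chunk(out, mode, prefer_large, confidence=0.9))
```

+Because `adapter_off` and `adapter_noop` both return the trunk action, any difference between them in evaluation points to a wiring side effect rather than to adapter behavior.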
+### 4.3 `models/reveal_head.py`
+#### Changes required
+Keep the best part of the repo, but simplify and formalize it.
+- Split outputs into:
+  - task-agnostic reveal variables,
+  - task-specific metrics,
+  - optional low-res spatial fields.
+- Add masks so task-specific losses only apply when valid.
+- Preserve the cloth-specific metrics. They are one of the best differentiators for the future suitcase benchmark.
+- Add explicit calibration support (for example confidence outputs or logits) so the state head can be evaluated independently of policy success.
+#### Why
+The reveal-state head is likely the publishable core. It needs cleaner interfaces and evaluation, not more entanglement.
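+A minimal sketch of the masking requirement, assuming per-sample scalar targets and a 0/1 validity mask. The function `masked_mse` and the toy batch are hypothetical:

```python
def masked_mse(predictions, targets, mask):
    """Mean squared error over entries whose mask is truthy; 0.0 if none are valid."""
    errors = [(p - t) ** 2 for p, t, m in zip(predictions, targets, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


# Hypothetical batch of two samples: a cloth sample, then a foliage sample.
fold_preservation_pred = [0.5, 0.9]
fold_preservation_target = [1.0, 0.0]  # the second target is a placeholder, not a label
is_cloth = [1, 0]

loss = masked_mse(fold_preservation_pred, fold_preservation_target, is_cloth)
print(loss)
```

+Only the cloth sample contributes, so the placeholder target on the foliage sample cannot push the cloth-specific output in either direction.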
+### 4.4 `models/action_decoder.py`
+#### Changes required
+Keep the current task proposal vocabulary concept, but tighten it:
+- candidate 0 must always be the trunk/base action,
+- proposal candidates must stay near the trunk action initially,
+- proposal mode families should be internal to one model, not external checkpoint routing,
+- add a generic fallback mode family for non-target tasks,
+- keep explicit mode names for analysis and paper figures.
+Current task families to preserve and clean up:
+- foliage: `widen_gap`, `maintain_gap`, `insert_actor`, `retrieve`, etc.
+- bag: `widen_mouth`, `maintain_mouth`, `probe_inside`, `insert_actor`, `retrieve`
+- cloth: `lift_edge`, `separate_layer`, `stabilize_fold`, `maintain_lift`, `insert_actor`, `retrieve`
+#### Why
+The proposal vocabulary is useful. The current best proxy result already suggests task specialization matters. But the specialization must become a principled internal prior, not a checkpoint-routing workaround.
+### 4.5 `models/planner.py`
+#### Changes required
+Refactor the planner into two explicit parts:
+1. **hard/soft feasibility gate**
+2. **residual reranker**
+The gate should use reveal-state variables only. The reranker can use the lightweight transition model and proposal logits.
+Also add:
+- a clean `identity` planning mode,
+- a clean `trunk_only` selection mode,
+- an `adapter_confidence` score,
+- diagnostics for every rejected retrieve-like candidate.
+#### Why
+The current planner appears to be one of the few useful parts of the architecture. It needs to be isolated and made measurable.
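+The two-part split can be sketched as a gate that records why it rejects a candidate, followed by a reranker over the survivors. This is a hedged illustration: the threshold names, the `RETRIEVE_LIKE` set, and the candidate dictionaries are assumptions, and falling back to candidate 0 when nothing survives is one possible `trunk_only` behavior.

```python
RETRIEVE_LIKE = {"retrieve", "insert_actor"}  # assumed set of gated modes


def feasibility_gate(mode, state, thresholds):
    """Return (allowed, reasons). Only retrieve-like modes are gated."""
    if mode not in RETRIEVE_LIKE:
        return True, []
    reasons = []
    if state["access"] < thresholds["min_access"]:
        reasons.append("access_too_low")
    if state["persistence"] < thresholds["min_persistence"]:
        reasons.append("persistence_too_low")
    if state["reocclusion"] > thresholds["max_reocclusion"]:
        reasons.append("reocclusion_too_high")
    return not reasons, reasons


def plan(candidates, state, thresholds, residual_score):
    """Gate first, then rerank survivors. Falls back to candidate 0 if none survive."""
    survivors, diagnostics = [], []
    for index, candidate in enumerate(candidates):
        allowed, reasons = feasibility_gate(candidate["mode"], state, thresholds)
        if allowed:
            survivors.append(index)
        else:
            diagnostics.append({"index": index, "mode": candidate["mode"], "reasons": reasons})
    if not survivors:
        return 0, diagnostics
    best = max(survivors, key=lambda i: residual_score(candidates[i]))
    return best, diagnostics


candidates = [
    {"mode": "base", "score": 0.1},       # candidate 0: trunk action
    {"mode": "retrieve", "score": 0.9},
    {"mode": "widen_gap", "score": 0.5},
]
thresholds = {"min_access": 0.5, "min_persistence": 0.5, "max_reocclusion": 0.4}
blocked_state = {"access": 0.2, "persistence": 0.8, "reocclusion": 0.1}

choice, rejected = plan(candidates, blocked_state, thresholds, lambda c: c["score"])
print(choice, rejected)
```

+Here the gate reads reveal-state variables only, while the reranker is free to use any score, which keeps the two parts separately measurable.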
+### 4.6 `models/world_model.py`
+#### Changes required
+Do not keep the current full token-heavy elastic rollout model as the default research path.
+Replace it with a much smaller transition model over:
+- scalar reveal-state summaries,
+- optionally one or two low-res fields (for example access map and support map),
+- action macro / candidate metadata.
+The transition model should predict:
+- next visibility,
+- next access corridor,
+- next persistence / support,
+- next reocclusion,
+- next disturbance / fold metrics.
+Only reintroduce a heavier spatial model if the lightweight model clearly helps.
+#### Why
+The current rollout model is too expensive and too under-validated for a single-L40S research loop.
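+To make the size difference concrete, here is an interface sketch of a transition over reveal-state scalars only. It assumes each scalar lives in `[0, 1]` and uses a per-macro residual linear update; `TinyRevealTransition`, `STATE_KEYS`, and the hand-set bias are hypothetical, and real parameters would be learned rather than set by hand.

```python
STATE_KEYS = ("visibility", "access", "persistence", "reocclusion", "disturbance")


class TinyRevealTransition:
    """next = clip(state + W[macro] @ state + b[macro], 0, 1), one (W, b) per macro."""

    def __init__(self, macros):
        n = len(STATE_KEYS)
        # Zero initialization makes the untrained model an identity transition.
        self.weights = {m: [[0.0] * n for _ in range(n)] for m in macros}
        self.bias = {m: [0.0] * n for m in macros}

    def predict(self, state, macro):
        x = [state[key] for key in STATE_KEYS]
        w, b = self.weights[macro], self.bias[macro]
        next_state = {}
        for i, key in enumerate(STATE_KEYS):
            delta = sum(w[i][j] * x[j] for j in range(len(x))) + b[i]
            next_state[key] = min(1.0, max(0.0, x[i] + delta))
        return next_state


model = TinyRevealTransition(macros=["widen_gap", "retrieve"])
state = {"visibility": 0.2, "access": 0.1, "persistence": 0.5,
         "reocclusion": 0.3, "disturbance": 0.0}

# Stand-in for a learned effect: widening the gap raises visibility.
model.bias["widen_gap"][STATE_KEYS.index("visibility")] = 0.3
predicted = model.predict(state, "widen_gap")
print(predicted)
```

+With five scalars this is 30 parameters per macro, which is the scale of model the section argues should be tried before any token-level rollout.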
+### 4.7 `models/observation_memory.py`
+#### Changes required
+Default behavior should be:
+- disabled, or
+- replaced by a tiny reveal-state cache.
+If the current dual memory stays in the repo, mark it experimental. Either wire the suppression margin logic properly or remove it. Right now it looks half-finished and the current proxy evidence is not favorable.
+#### Why
+Memory is currently a likely liability, not a likely differentiator.
+### 4.8 `train/losses.py`
+#### Changes required
+Reweight the training objective around what is actually learnable and measurable.
+Required losses:
+- action BC / trajectory loss from the trunk policy path,
+- **candidate ranking loss** against oracle utility within the same candidate set,
+- proposal mode classification / assignment,
+- reveal-state regression/classification,
+- retrieve-feasibility gate loss,
+- lightweight transition-model loss,
+- **no-regression distillation** from the trunk on general tasks,
+- optional calibration loss for reveal-state confidence.
+Losses to demote or remove unless justified by results:
+- large generic memory losses,
+- large token-level world-model reconstruction losses.
+#### Why
+The repo already points to the correct training target: close the gap to the oracle chooser on the candidate set. That is much better than adding more latent machinery.
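+One common way to express a candidate ranking loss against oracle utility is a listwise cross-entropy between two softmax distributions over the same candidate set. The sketch below is an assumption about the form, not the loss used in the repo; `listwise_ranking_loss` and its `temperature` default are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_ranking_loss(scores, oracle_utilities, temperature=0.1):
    """Cross-entropy between softmax(oracle_utility / T) and softmax(scores)."""
    target = softmax(oracle_utilities, temperature)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


oracle_utilities = [0.1, 0.2, 0.9]
aligned = listwise_ranking_loss([0.1, 0.2, 3.0], oracle_utilities)
reversed_order = listwise_ranking_loss([3.0, 0.2, 0.1], oracle_utilities)
print(aligned, reversed_order)
```

+A lower temperature concentrates the target on the oracle-best candidate, which pushes the objective toward top-1 agreement with the oracle chooser.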
+### 4.9 `train/trainer.py`
+#### Changes required
+Add explicit training regimes:
+- `trunk_only_eval`
+- `adapter_noop_eval`
+- `adapter_train_frozen_trunk`
+- `adapter_finetune_light`
+- `general_distillation_only`
+- `proxy_rank_only`
+Freeze the trunk by default. Any trunk finetuning should be delayed until the adapter proves itself.
+Also add a single switch that controls whether evaluation is:
+- trunk only,
+- adapter no-op,
+- adapter active,
+- adapter active with planner off,
+- adapter active with gate off.
+#### Why
+The current trainer still reflects an architecture-search phase. The next phase needs controlled, fair comparisons.
+### 4.10 Dataset / teacher generation code
+Relevant existing code already exists for proposal alignment and proxy data generation. Reuse it, but narrow it.
+Required changes:
+- generate oracle labels and candidate utilities for proxy tasks,
+- export reveal-state supervision targets explicitly,
+- export candidate-mode assignments,
+- export task metadata separately from free-form language,
+- ensure every sample can be evaluated in:
+  - trunk-only mode,
+  - no-op mode,
+  - adapter mode.
+Do not let text strings be the only task family signal. Explicit task metadata must be available.
+---
+## 5. What to keep, what to remove, what to treat as provisional
+### Keep
+- explicit reveal-state variables,
+- task-routed macro proposal vocabulary,
+- retrieve-feasibility gate,
+- geometry-aware observation path,
+- existing proxy scripted sanity tests,
+- candidate-ranking supervision.
+### Remove from the default path
+- heavy dual memory as a required component,
+- full token-heavy rollout model,
+- any claim based on checkpoint routing alone,
+- any claim based on the retargeted demo positive control.
+### Treat as provisional
+- custom RVT wrapper,
+- local RLBench general benchmark path until official baseline reproduction is clean,
+- memory-related gains unless they appear in a proper task-success benchmark.
+---
+## 6. Benchmark strategy
+The benchmark plan should be staged. Do not jump straight to a full RLBench sweep.
+### Phase 0. Baseline reproduction
+Goal: prove that the evaluation path is real.
+Required outcome:
+- at least one official public trunk reproduces a known strong score on a small anchor subset,
+- one anchor task should match a public or repo-validated release closely enough to trust the pipeline.
+If this fails, stop and fix evaluation before touching the adapter further.
+### Phase 1. General-task anchor set
+Use a small public anchor set that is broad enough to catch regressions, but small enough to run repeatedly on one L40S.
+Recommended anchor tasks:
+- coordinated push box,
+- coordinated lift ball,
+- dual push buttons,
+- handover item,
+- lift tray.
+These are not the target application tasks. They are regression sentries.
+Acceptance criterion:
+- `adapter_noop` should be essentially identical to `trunk_only`,
+- `adapter_active` should remain in the same ballpark as `trunk_only`,
+- any loss on the anchor mean must be small and explainable.
+If the trunk itself is weak on the chosen anchor set, replace the trunk. Do not proceed with a weak base.
+### Phase 2. Existing proxy benchmark (internal shaping only)
+Use the existing proxy suite as an architecture-shaping instrument, not as the main paper result.
+Preserve the narrow stress slices from the existing handoff:
+- nominal,
+- high reocclusion,
+- camera perturbation.
+Preserve the task slices:
+- foliage,
+- bag,
+- cloth.
+Keep the simple baselines:
+- random,
+- candidate 0,
+- oracle chooser,
+- scripted good/bad actions.
+What to measure beyond success:
+- reveal-state prediction correlation with proxy ground truth,
+- ranking correlation with oracle utility,
+- gate precision/recall for unsafe retrieve attempts,
+- effect of proposal families by task,
+- reocclusion after reveal,
+- fold-preservation metrics on cloth slices.
+### Phase 3. Public target-like tasks
+This is the most important new benchmark stage.
+The future real benchmark does not exist yet, so approximate it with public tasks that stress:
+- containment opening,
+- hidden-object access,
+- cluttered retrieval,
+- partial reveal before retrieve,
+- disturbance control.
+Use a small public target-like subset first. Candidate tasks to prioritize:
+- open drawer,
+- put item in drawer / retrieve-like container interactions,
+- take shoes out of box,
+- shell game,
+- pick up notebook,
+- straighten rope.
+The exact final subset can change if some tasks prove unstable, but the principle should stay the same: these tasks should be more target-like than the anchor set.
+### Phase 4. Deformable / garment benchmarks
+For the clothes/suitcase direction, add a public deformable benchmark as soon as the infrastructure is stable.
+Priority order:
+1. GarmentLab (if practical to run),
+2. GarmentPile or similar garment-clutter retrieval benchmarks,
+3. other public deformable-manipulation tasks only if they are easy to integrate.
+This stage matters because the suitcase task is probably the strongest future novelty angle.
+### Phase 5. Broader robustness benchmark
+Only after phases 0–4 succeed, consider a broader dual-arm benchmark such as RoboTwin 2.0 or a wider RLBench/PerAct2 sweep.
+Do not do this early. It is expensive and not yet the right bottleneck.
+---
+## 7. Baselines that must be included
+At minimum, every meaningful experiment should compare against:
+1. **the same trunk alone**
+   This is the most important baseline.
+2. **the same trunk with adapter disabled / no-op**
+   This isolates whether the wrapper is already damaging performance.
+3. **PerAct2**
+   Use official or faithful public numbers / code path.
+4. **AnyBimanual**
+   Important because the repo already references it and because transfer from strong unimanual data is relevant.
+5. **3DFA**, if evaluation is practical
+   This is the strongest public benchmark baseline for bimanual PerAct2-style tasks and should be the aspirational reference.
+Optional if practical:
+- CoFreeVLA (useful because it is also a structured auxiliary head on top of a VLA),
+- ActiveVLA (conceptually relevant for active perception),
+- task-specific academic comparisons in writing (Vision in Action, bag SOI model, garment retrieval papers), even if not reproduced in code.
+---
+## 8. Required ablations
+The current repo already shows that “big architecture blob vs baseline” is not informative enough. The next paper-worthy evidence must isolate the actual source of gain.
+Run the following ablations in order.
+### General-task ablations
+1. `trunk_only`
+2. `trunk + adapter_noop`
+3. `trunk + adapter_active (gate only)`
+4. `trunk + adapter_active (gate + reveal-state head)`
+5. `trunk + adapter_active (gate + reveal-state + proposal prior)`
+6. `trunk + adapter_active (gate + reveal-state + proposal prior + lightweight transition model)`
+7. optional: `+ short reveal cache`
+Interpretation target:
+- general tasks should not fall apart as structure is added,
+- if they do, the adapter is not sufficiently no-op-safe.
+### Target-like ablations
+1. full adapter
+2. no gate
+3. no proposal prior
+4. no task conditioning
+5. no lightweight transition model
+6. no geometry
+7. no depth
+8. no cloth-specific metrics (for the cloth slice only)
+9. checkpoint routing only (to prove that routing alone is not the full story)
+Interpretation target:
+- gate should matter,
+- proposal prior should matter,
+- cloth-specific metrics should matter on cloth-like slices,
+- routing alone should not account for the final gain.
+### Memory ablations
+Do these late, not early:
+- no memory,
+- short reveal cache,
+- current dual memory.
+If dual memory does not clearly beat no memory on actual task success, drop it.
+---
+## 9. Tests to add or rewrite
+The current suite is decent for plumbing. It now needs benchmark-faithfulness tests and ablation-protecting tests.
+### 9.1 Keep the current useful tests
+Keep and maintain the existing tests that verify:
+- proxy scripted benchmark directionality,
+- geometry path activation under camera perturbation,
+- dataset geometry fields,
+- proposal shortlist plumbing,
+- task metadata override behavior,
+- candidate ranking loss behavior.
+### 9.2 Add the following tests
+#### `test_trunk_noop_equivalence.py`
+With adapter disabled or in strict no-op mode, verify that:
+- action mean / candidate set match the trunk path exactly (or within tight tolerance),
+- no planner or routing side effects change outputs.
+This is the single most important new test.
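The check described above could be sketched roughly as follows, using plain Python lists in place of tensors. `toy_trunk`, `AdapterWrapper`, and the mode strings `off` / `noop` / `active` are hypothetical stand-ins chosen for illustration, not the repo's actual API.

```python
import math
from typing import Callable, List, Sequence

Action = List[float]


def toy_trunk(obs: Sequence[float]) -> Action:
    """Hypothetical frozen trunk: a fixed deterministic map from observation to action."""
    return [math.tanh(0.5 * x) for x in obs]


class AdapterWrapper:
    """Hypothetical wrapper around a trunk with 'off', 'noop', and 'active' modes."""

    def __init__(self, trunk: Callable[[Sequence[float]], Action], mode: str = "noop") -> None:
        if mode not in {"off", "noop", "active"}:
            raise ValueError(f"unknown mode: {mode}")
        self.trunk = trunk
        self.mode = mode

    def act(self, obs: Sequence[float]) -> Action:
        base = self.trunk(obs)
        if self.mode in {"off", "noop"}:
            # No-op safety: the trunk output is returned untouched.
            return base
        # Active mode: an arbitrary small residual, purely for illustration.
        return [a + 0.1 for a in base]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length actions."""
    if len(a) != len(b):
        raise ValueError("action lengths differ")
    return max(abs(x - y) for x, y in zip(a, b))


def check_noop_equivalence(
    wrapper: AdapterWrapper,
    trunk: Callable[[Sequence[float]], Action],
    batch: Sequence[Sequence[float]],
    tol: float = 1e-6,
) -> bool:
    """True if the wrapper reproduces the trunk action on every observation in the batch."""
    return all(max_abs_diff(wrapper.act(obs), trunk(obs)) <= tol for obs in batch)


# A small frozen batch of toy observations.
BATCH = [[0.0, 1.0, -2.0], [0.3, -0.7, 1.5]]
```

In a real test the same comparison would also cover the candidate set and any planner outputs, with the frozen batch loaded from disk so that the comparison is reproducible.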
+#### `test_trunk_interface_official_eval_parity.py`
+For one selected official trunk and one frozen batch, verify that:
+- preprocessing,
+- camera handling,
+- token layout,
+- action decoding,
+match the official implementation path closely enough to trust the wrapper.
+This should be an integration test, not just a shape test.
+#### `test_adapter_gate_blocks_unsafe_retrieve.py`
+Build explicit synthetic reveal states where retrieve should and should not be allowed. The current planner already contains similar logic; formalize it into a direct unit test.
+#### `test_reveal_state_metric_calibration.py`
+For proxy env rollouts with known labels, verify that predicted reveal-state metrics correlate with the simulator labels and are not collapsed.
+#### `test_candidate_ranking_matches_oracle.py`
+Given a batch with oracle candidate utilities from the proxy env, verify that training reduces the gap between the model ranker and the oracle chooser.
+This should be a real learned ranking test, not just a toy-array loss test.
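One way to quantify the gap to the oracle chooser is sketched below. It assumes each sample carries a list of oracle utilities and a list of model scores over the same candidates; the function name `oracle_gap` and this data layout are assumptions made for illustration.

```python
from typing import Sequence


def oracle_gap(
    utilities: Sequence[Sequence[float]],
    scores: Sequence[Sequence[float]],
) -> float:
    """Mean of (best oracle utility - oracle utility of the model's top-scored candidate)."""
    if len(utilities) != len(scores) or not utilities:
        raise ValueError("utilities and scores must be non-empty and the same length")
    gaps = []
    for sample_utils, sample_scores in zip(utilities, scores):
        if len(sample_utils) != len(sample_scores):
            raise ValueError("candidate counts differ within a sample")
        # Index of the candidate the model ranker would choose.
        chosen = max(range(len(sample_scores)), key=lambda i: sample_scores[i])
        gaps.append(max(sample_utils) - sample_utils[chosen])
    return sum(gaps) / len(gaps)
```

A learned-ranking test could compute this value on a held-out proxy batch before and after a short training run and assert that it decreases.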
+#### `test_task_specific_loss_masking.py`
+Verify that foliage metrics are not trained on bag/cloth tasks, bag metrics are not trained on foliage/cloth tasks, etc.
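The masking rule above might look like the sketch below. The family-to-metric mapping and the metric names (in particular `opening_access`) are hypothetical placeholders; the real mapping should come from the explicit task metadata.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from task family to the metric heads it is allowed to supervise.
FAMILY_METRICS: Dict[str, List[str]] = {
    "foliage": ["visibility_gain", "reocclusion"],
    "bag": ["opening_access", "reocclusion"],
    "cloth": ["fold_preservation", "lift_too_much_risk"],
}


def metric_mask(task_family: str, metric_names: Sequence[str]) -> List[float]:
    """1.0 for metric heads supervised on this task family, 0.0 for all others."""
    allowed = set(FAMILY_METRICS[task_family])
    return [1.0 if name in allowed else 0.0 for name in metric_names]


def masked_l1(pred: Sequence[float], target: Sequence[float], mask: Sequence[float]) -> float:
    """Mean absolute error over unmasked metric heads; 0.0 if every head is masked."""
    total = sum(m * abs(p - t) for p, t, m in zip(pred, target, mask))
    denom = sum(mask)
    return total / denom if denom > 0 else 0.0
```

The test would then assert that perturbing a masked head's prediction leaves the loss, and therefore the gradient, unchanged for that sample.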
+#### `test_cloth_specific_metrics_affect_selection.py`
+For cloth-like proxy cases, verify that fold-preservation / lift-too-much risk can change candidate selection even when nominal reachability is similar.
+#### `test_general_eval_protocol_is_identical.py`
+Ensure that `trunk_only`, `adapter_noop`, and `adapter_active` all use the same observation stack, same action horizon, same task subset, and same evaluation step budget.
+This prevents accidental unfairness.
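A protocol identity check along these lines could be sketched as follows. The `EvalProtocol` fields mirror the four items listed above; the class, the function, and any camera names passed in are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical record of the settings that must match across evaluation modes."""

    cameras: Tuple[str, ...]
    action_horizon: int
    tasks: Tuple[str, ...]
    step_budget: int


def protocol_mismatches(protocols: Dict[str, EvalProtocol]) -> Dict[str, Tuple[str, ...]]:
    """Map each differing field to the modes that disagree with the first mode listed."""
    names = list(protocols)
    reference = asdict(protocols[names[0]])
    differing: Dict[str, List[str]] = {}
    for name in names[1:]:
        for field, value in asdict(protocols[name]).items():
            if value != reference[field]:
                differing.setdefault(field, []).append(name)
    return {field: tuple(modes) for field, modes in differing.items()}
```

An empty result means `trunk_only`, `adapter_noop`, and `adapter_active` were configured identically; any non-empty result should fail the pre-run checklist.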
+### 9.3 Promote some current tests from “unit” to “benchmark guardrails”
+The following should become part of the required CI / pre-run checklist:
+- geometry path smoke test,
+- dataset geometry/history test,
+- no-op equivalence test,
+- benchmark protocol identity test.
+---
+## 10. Metrics that matter
+Do not rely on success alone.
+### General-task metrics
+- task success,
+- return (if available),
+- variance across seeds,
+- regression relative to trunk.
+### Target-like metrics
+- success,
+- visibility gain,
+- access / insertion corridor gain,
+- persistence / support gain,
+- reocclusion after reveal,
+- disturbance / damage,
+- fold preservation (cloth-like slice),
+- unsafe retrieve rate,
+- oracle gap on candidate ranking.
+### Calibration / diagnostics
+- correlation of predicted reveal metrics with simulator ground truth,
+- gate precision / recall,
+- candidate shortlist recall of oracle candidate,
+- proposal mode usage by task,
+- fallback rate to trunk.
+The fallback rate matters. If the adapter almost never activates, then the system may preserve general performance but not meaningfully help target tasks. If it always activates and hurts general tasks, it is not safe enough.
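Two of the diagnostics above, gate precision / recall and fallback rate, can be computed from per-step boolean logs. The sketch below assumes such logs exist; the function names and the log layout are hypothetical.

```python
from typing import Sequence, Tuple


def gate_precision_recall(blocked: Sequence[bool], unsafe: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of the gate, treating 'unsafe retrieve attempt' as the positive class."""
    if len(blocked) != len(unsafe):
        raise ValueError("blocked and unsafe must have the same length")
    tp = sum(1 for b, u in zip(blocked, unsafe) if b and u)
    fp = sum(1 for b, u in zip(blocked, unsafe) if b and not u)
    fn = sum(1 for b, u in zip(blocked, unsafe) if not b and u)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def fallback_rate(adapter_active: Sequence[bool]) -> float:
    """Fraction of decision steps on which the trunk action was used unchanged."""
    if not adapter_active:
        raise ValueError("no decision steps logged")
    return sum(1 for active in adapter_active if not active) / len(adapter_active)
```

Reporting the fallback rate per task family, rather than as one global number, makes it easier to see whether the adapter activates where it is supposed to.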
+---
+## 11. Acceptance gates
+These gates should determine whether to continue, simplify, or stop.
+### Gate A. Trunk validity
+Pass only if an official or faithful trunk path is clearly non-trivial on the anchor set.
+If this fails, stop. Do not spend effort on the adapter yet.
+### Gate B. No-op safety
+Pass only if `adapter_noop` is effectively identical to `trunk_only`.
+If this fails, stop and fix the wrapper.
+### Gate C. General-task parity
+Pass only if `adapter_active` stays in the same ballpark as `trunk_only` on the anchor set. A small drop may be acceptable, but not a collapse.
+Use a simple rule for the first pass:
+- mean absolute drop in success on the anchor set, relative to `trunk_only`, should be very small,
+- no single anchor task should collapse catastrophically.
+If the adapter is helping target-like tasks but causing a broad general-task collapse, the architecture is not ready.
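The first-pass rule for Gate C could be encoded as below. The text does not fix numeric thresholds, so `max_mean_drop` and `max_task_drop` are left as caller-supplied parameters; they are assumptions of this sketch and should be set before any results are seen. "Drop" is read here as trunk success minus adapter success, in absolute (not relative) terms.

```python
from typing import Dict, Tuple


def gate_c(
    trunk: Dict[str, float],
    adapter: Dict[str, float],
    max_mean_drop: float,
    max_task_drop: float,
) -> Tuple[bool, Dict[str, float]]:
    """Return (passed, per-task drops) for per-task success rates keyed by task name."""
    if set(trunk) != set(adapter) or not trunk:
        raise ValueError("both runs must cover the same non-empty task set")
    # Positive drop means the adapter is worse than the trunk on that task.
    drops = {task: trunk[task] - adapter[task] for task in trunk}
    mean_drop = sum(drops.values()) / len(drops)
    worst_drop = max(drops.values())
    passed = mean_drop <= max_mean_drop and worst_drop <= max_task_drop
    return passed, drops
```

Returning the per-task drops alongside the verdict keeps a single collapsed task visible even when the mean looks acceptable.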
+### Gate D. Target-like gain
+Pass only if the full adapter clearly beats:
+- trunk alone,
+- adapter no-op,
+- random,
+- candidate 0,
+- and ideally narrows the oracle gap.
+This is where the architecture starts to become scientifically interesting.
+### Gate E. Non-trivial novelty
+Pass only if the gain is not explained almost entirely by checkpoint routing or trivial task labels. The final model should be a single structured adapter, not a routing script disguised as a model.
+---
+## 12. Recommended training strategy on 1×L40S
+The compute constraint implies one principle: **do not retrain the trunk repeatedly**.
+### Use this strategy
+1. Choose one strong trunk.
+2. Freeze it.
+3. Build the adapter around it.
+4. Run many cheap adapter experiments.
+5. Only consider light trunk finetuning after the adapter is already useful.
+### Practical guidelines
+- mixed precision everywhere practical,
+- gradient checkpointing if needed,
+- keep candidate counts modest,
+- keep rollout horizon short,
+- keep the transition model lightweight,
+- train on a narrow but representative task set,
+- log every candidate-level diagnostic needed for offline analysis.
+### What not to do
+- do not repeatedly launch full-scale trunk retraining,
+- do not run full benchmark sweeps before anchor parity is established,
+- do not expand the world model before the lightweight version proves value,
+- do not hide regressions behind different seeds, different demos, or different eval protocols.
+---
+## 13. Minimal execution order
+Follow this order. Do not reorder it casually.
+### Step 1. Freeze the current repo as a historical branch
+Keep it for reference, but stop treating it as the final architecture.
+### Step 2. Build a clean trunk interface
+Get one official trunk path working and reproducible.
+### Step 3. Implement adapter no-op mode
+This must pass no-op equivalence tests before any learning claims are made.
+### Step 4. Port only the strong ideas
+Port:
+- reveal-state head,
+- task-routed macro proposal prior,
+- retrieve-feasibility gate.
+Do **not** port the full heavy memory/world-model stack by default.
+### Step 5. Add a lightweight transition model
+Only over reveal-state summaries.
+### Step 6. Train adapter-only on proxy supervision and ranking
+Focus on oracle-gap reduction and reveal-state prediction quality.
+### Step 7. Run anchor parity benchmark
+If parity fails, stop and simplify.
+### Step 8. Run target-like public subset and existing proxy suite
+If gains appear only on the internal proxy and nowhere else, the architecture is still too benchmark-shaped.
+### Step 9. Add garment/deformable benchmark
+This is the most likely path to a strong suitcase/clothes result.
+### Step 10. Prepare the real-world data plan only after sim evidence is strong
+The real teleop benchmark should come after a strong sim go/no-go decision, not before.
+---
+## 14. What “novel enough” should mean here
+The novelty should be modest and crisp. It does not need to be a giant new architecture.
+A reasonable novelty claim is:
+- a foundation-policy-compatible structured adapter,
+- explicit reveal-state variables for elastic occlusion,
+- task-routed reveal macros,
+- retrieve-feasibility gating,
+- lightweight reveal-state rollout / reranking.
+This is a good paper if:
+- the base trunk is respected,
+- the adapter is small,
+- the gains are real on the target-like tasks,
+- the general-task regression is small,
+- the ablations isolate the contribution cleanly.
+This is **not** a good paper if the final story is:
+- “we replaced the trunk,”
+- “we added many modules and one of them helped a bit,”
+- “we route to a better checkpoint for each task,”
+- “we get non-zero on one RLBench branch because demo retrieval rescued it.”
+---
+## 15. Proposed paper positioning (for later)
+If the system works, position it against two groups of prior work.
+### General bimanual policy baselines
+- PerAct2,
+- AnyBimanual,
+- 3D FlowMatch Actor,
+- optionally CoFreeVLA as an “auxiliary structured head” comparator.
+### Target-task conceptual neighbors
+- active bag reveal/retrieve from demonstrations,
+- active perception for manipulation under occlusion,
+- bag-specific SOI latent-dynamics models,
+- occlusion-aware hidden-object retrieval in clutter,
+- garment clutter retrieval / garment manipulation benchmarks.
+The paper should say: generic bimanual foundation policies are good at general dual-arm manipulation, but they lack explicit reveal-state structure for elastic occlusion tasks. The adapter adds that structure while preserving general capability.
+---
+## 16. Deliverables expected from the developer
+The handoff is not complete until the following exist.
+### Code deliverables
+- clean trunk interface,
+- adapter package,
+- no-op path,
+- lightweight transition model,
+- benchmark scripts for anchor, proxy, and target-like subsets,
+- required new tests,
+- config files for all reported experiments.
+### Experimental deliverables
+- trunk-only anchor benchmark report,
+- adapter-noop parity report,
+- full ablation report,
+- target-like benchmark report,
+- cloth/deformable benchmark report,
+- candidate ranking / oracle gap diagnostics,
+- reveal-state calibration plots.
+### Reporting format
+Every report should include:
+- exact checkpoint,
+- exact demos,
+- exact seeds,
+- exact task subset,
+- exact eval protocol,
+- whether the adapter was off / noop / active,
+- whether planner/gate/transition model were enabled,
+- per-task scores and mean.
+No undocumented “special” branches should be used for headline results.
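The reporting checklist above lends itself to a mechanical check. The sketch below assumes each report is a flat dictionary; the field names are hypothetical renderings of the listed items, not an existing schema in the repo.

```python
from typing import Any, Dict, List

# Hypothetical field names, one per item in the reporting checklist.
REQUIRED_FIELDS = (
    "checkpoint",
    "demos",
    "seeds",
    "task_subset",
    "eval_protocol",
    "adapter_mode",
    "planner_enabled",
    "gate_enabled",
    "transition_model_enabled",
    "per_task_scores",
    "mean_score",
)


def report_problems(report: Dict[str, Any]) -> List[str]:
    """List every way a report deviates from the checklist; an empty list means it is complete."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in report]
    if "adapter_mode" in report and report["adapter_mode"] not in {"off", "noop", "active"}:
        problems.append("adapter_mode must be one of: off, noop, active")
    scores = report.get("per_task_scores")
    if isinstance(scores, dict) and scores and "mean_score" in report:
        # The reported mean must be reproducible from the per-task scores.
        mean = sum(scores.values()) / len(scores)
        if abs(mean - report["mean_score"]) > 1e-6:
            problems.append("mean_score does not match per_task_scores")
    return problems
```

Running such a check in the benchmark scripts before a report is written would keep undocumented branches out of the headline tables.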
+---
+## 17. Immediate next actions
+1. Pick the trunk to standardize around.
+2. Build and validate the no-op wrapper.
+3. Strip the adapter down to:
+   - reveal-state head,
+   - proposal prior,
+   - retrieve gate.
+4. Replace the heavy world model with a lightweight reveal-state transition model.
+5. Run anchor parity.
+6. Run proxy ranking and target-like subset.
+7. Decide whether memory is dropped permanently.
+8. Add garment benchmark integration.
+That is the shortest path from the current repo to a defensible paper candidate.
+---
+## 18. Appendix: repo evidence that motivated this handoff
+Relevant repo locations to inspect while implementing:
+- Main model stack:
+  - `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/reveal_head.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/observation_memory.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/models/world_model.py`
+- Training / losses:
+  - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/train/trainer.py`
+  - `VLAarchtests/code/reveal_vla_bimanual/train/build_aligned_proposal_dataset.py`
+- Existing tests worth keeping:
+  - `VLAarchtests/tests/test_proxy_scripted_bench.py`
+  - `VLAarchtests/tests/test_geometry_matters_under_camera_perturbation.py`
+  - `VLAarchtests/tests/test_memory_matters_under_high_reocclusion.py`
+  - `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
+  - `VLAarchtests/tests/test_candidate_ranking_loss.py`
+  - `VLAarchtests/tests/test_rvt_backbone_forward.py`
+- Existing reports that matter:
+  - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary.md`
+  - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
+  - `reports/true_baseline_compare_subset3_v1/...`
+  - `reports/general_task_anchor_20260330_dual_push_buttons/...`
+  - `reports/dual_push_nonzero_branch_20260330/...`
+  - `reports/dual_push_full_arch_hybrid_20260331/...`
+Use those reports as a diagnosis of what is weak, not as proof that the current architecture is already ready.
+---
+## 19. External references to keep in mind
+General bimanual baselines and nearby work:
+- PerAct2 benchmark and baselines: https://arxiv.org/abs/2407.00278
+- AnyBimanual: https://bimanual.github.io/
+- 3D FlowMatch Actor (3DFA): https://arxiv.org/abs/2508.11002
+- CoFreeVLA: https://arxiv.org/abs/2601.21712
+- ActiveVLA: https://arxiv.org/abs/2601.08325
+Target-task conceptual neighbors:
+- Vision in Action (active bag reveal/retrieve from human demonstrations): https://arxiv.org/html/2506.15666v1
+- Bimanual Deformable Bag Manipulation with SOI neural dynamics: https://arxiv.org/abs/2401.11432
+- Occlusion-Aware Search for Object Retrieval in Clutter: https://ieeexplore.ieee.org/document/9197067
+- GarmentPile++ / cluttered garment retrieval: https://arxiv.org/abs/2603.04158
+- RoboTwin 2.0 benchmark: https://arxiv.org/abs/2506.18088
+Add the exact GarmentLab citation separately if that benchmark is included in the final experimental plan.
+---
+## Final instruction to the implementer
+Do not try to rescue the current architecture by adding even more structure. The repo already revealed the answer: the good idea is narrow. Keep the structured reveal-state adapter, keep the retrieve gate, keep task-aware proposals, and force the whole design to prove two things cleanly:
+1. it does not break a strong trunk on general bimanual tasks,
+2. it improves reveal/retrieve under elastic occlusion.
+If both are true, the project is in good shape. If either is false, simplify further rather than expanding again.

legacy/general_task_anchor_20260330_dual_push_buttons/summary.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "anchor_task": "dual_push_buttons",
+  "anchor_type": "official_anybimanual_release_single_task_eval",
+  "public_release": {
+    "checkpoint_step": 60000,
+    "success": 0.96,
+    "return": 24.0,
+    "length": 21.56,
+    "source_csv": "/workspace/baselines/AnyBimanual/Peract-LF_AnyBimanual/eval_data.csv"
+  },
+  "local_official_eval": {
+    "checkpoint_step": 60000,
+    "episodes": 25,
+    "success": 0.96,
+    "return": 24.0,
+    "length": 21.84,
+    "source_csv": "/workspace/baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv"
+  },
+  "our_existing_results_same_task": {
+    "clip_backbone_only": {
+      "mean_success": 0.0,
+      "mean_return": 0.0,
+      "path": "/workspace/reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
+    },
+    "elastic_reveal_proxy_iter6": {
+      "mean_success": 0.0,
+      "mean_return": 0.0,
+      "path": "/workspace/reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
+    },
+    "rvt_hybrid_frozen_fixedbounds": {
+      "mean_success": 0.0,
+      "mean_return": 0.0,
+      "path": "/workspace/reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
+    }
+  },
+  "command": "export DISPLAY=${DISPLAY:-:99}; export MAMBA_ROOT_PREFIX=/workspace/.micromamba; set +u; eval \"$(/workspace/.tools/micromamba/bin/micromamba shell hook -s bash -r /workspace/.micromamba)\"; micromamba activate /workspace/envs/rlbench; set -u; export PYTHONPATH=\"/workspace/third_party/AnyBimanual/third_party/RLBench:/workspace/third_party/AnyBimanual/third_party/YARR:/workspace/third_party/AnyBimanual\"; cd /workspace/third_party/AnyBimanual && python eval.py method=PERACT_BC framework.logdir=/workspace/baselines/AnyBimanual_release_eval_anchor framework.start_seed=0 framework.eval_type=60000 framework.eval_episodes=25 framework.eval_envs=1 framework.gpu=0 rlbench.task_name=perlf_release_dual_push_buttons_ep25 rlbench.tasks='[dual_push_buttons]' rlbench.demo_path=/workspace/baselines/AnyBimanual_subset3_demo_root rlbench.headless=True rlbench.gripper_mode=BimanualDiscrete rlbench.arm_action_mode=BimanualEndEffectorPoseViaPlanning rlbench.action_mode=BimanualMoveArmThenGripper"
+}

setup/ENVIRONMENT.md ADDED

	@@ -0,0 +1,55 @@

+# Environment Manifest
+This export was assembled on:
+- OS: `Ubuntu 22.04.5 LTS`
+- Kernel: `Linux 6.8.0-88-generic`
+- GPU: `NVIDIA L40S`
+- VRAM: `46068 MiB`
+- Driver: `580.126.09`
+- Python: `3.10.20`
+## Primary Python Environment
+The main RLBench-capable environment used during this handoff lived at:
+- `/workspace/envs/rlbench`
+The exact package snapshot from that environment is stored in:
+- `setup/rlbench_pip_freeze.txt`
+## Upstream Pins
+Pinned benchmark stack used by this project:
+- `peract_bimanual`: `bb0232a6ba3fe116566e9568f0c7af980ed6703d`
+- `RLBench`: `8af748c51287989294e00c9c670e3330a0e35ed5`
+- `PyRep`: `b8bd1d7a3182adcd570d001649c0849047ebf197`
+- `YARR`: `6822ff78602c77878b27d4cfe759ce029c67bffb`
+- `AnyBimanual`: `76024e48b0e9489101459e85bc909c126ec581b4`
+## Important Runtime Variables
+The RLBench / Coppelia / AnyBimanual stack was run with environment variables equivalent to:
+```bash
+export DISPLAY=:99
+export XDG_RUNTIME_DIR=/workspace/runtime
+export COPPELIASIM_ROOT=/workspace/assets/coppeliasim_v4_1_0
+export QT_QPA_PLATFORM_PLUGIN_PATH=/workspace/assets/coppeliasim_v4_1_0
+export LD_LIBRARY_PATH=/workspace/assets/coppeliasim_v4_1_0:${LD_LIBRARY_PATH:-}
+export PYTHONPATH=/workspace/third_party/PyRep:/workspace/third_party/RLBench:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual
+```
+For the local project code, the handoff runs also used:
+```bash
+export PYTHONPATH=/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:$PYTHONPATH
+```
+## Notes
+- RLBench headless execution still required an X server.
+- The abstract reveal/retrieve proxy benchmark did not depend on RLBench or CoppeliaSim.
+- The official AnyBimanual `dual_push_buttons` path was the only general-task anchor treated as trustworthy on this setup.

setup/bootstrap_same_hardware.sh ADDED

	@@ -0,0 +1,42 @@

+#!/usr/bin/env bash
+set -euo pipefail
+ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+ENV_DIR="${ENV_DIR:-/workspace/envs/rlbench}"
+THIRD_PARTY_DIR="${THIRD_PARTY_DIR:-/workspace/third_party}"
+mkdir -p "$THIRD_PARTY_DIR"
+python3.10 -m venv "$ENV_DIR"
+source "$ENV_DIR/bin/activate"
+python -m pip install --upgrade pip setuptools wheel
+python -m pip install -r "$ROOT_DIR/setup/requirements_core.txt"
+if [ ! -d "$THIRD_PARTY_DIR/PyRep" ]; then
+  git clone https://github.com/markusgrotz/PyRep.git "$THIRD_PARTY_DIR/PyRep"
+fi
+if [ ! -d "$THIRD_PARTY_DIR/RLBench" ]; then
+  git clone https://github.com/markusgrotz/RLBench.git "$THIRD_PARTY_DIR/RLBench"
+fi
+if [ ! -d "$THIRD_PARTY_DIR/YARR" ]; then
+  git clone https://github.com/markusgrotz/YARR.git "$THIRD_PARTY_DIR/YARR"
+fi
+if [ ! -d "$THIRD_PARTY_DIR/AnyBimanual" ]; then
+  git clone https://github.com/liyaxuanliyaxuan/AnyBimanual.git "$THIRD_PARTY_DIR/AnyBimanual"
+fi
+git -C "$THIRD_PARTY_DIR/PyRep" checkout b8bd1d7a3182adcd570d001649c0849047ebf197
+git -C "$THIRD_PARTY_DIR/RLBench" checkout 8af748c51287989294e00c9c670e3330a0e35ed5
+git -C "$THIRD_PARTY_DIR/YARR" checkout 6822ff78602c77878b27d4cfe759ce029c67bffb
+git -C "$THIRD_PARTY_DIR/AnyBimanual" checkout 76024e48b0e9489101459e85bc909c126ec581b4
+python -m pip install -e "$THIRD_PARTY_DIR/PyRep"
+python -m pip install -e "$THIRD_PARTY_DIR/RLBench"
+python -m pip install -e "$THIRD_PARTY_DIR/YARR"
+source "$ROOT_DIR/setup/env_vars.sh"
+echo "Environment bootstrapped."
+echo "You still need a compatible CoppeliaSim install at \$COPPELIASIM_ROOT."
+echo "After that, activate the env and source setup/env_vars.sh before running RLBench or AnyBimanual jobs."

setup/env_vars.sh ADDED

	@@ -0,0 +1,15 @@

+#!/usr/bin/env bash
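+# This file is meant to be sourced, so it does not run `set -euo pipefail`:
+# doing so would change the shell options of the calling shell.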
+export DISPLAY="${DISPLAY:-:99}"
+export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/workspace/runtime}"
+export COPPELIASIM_ROOT="${COPPELIASIM_ROOT:-/workspace/assets/coppeliasim_v4_1_0}"
+export QT_QPA_PLATFORM_PLUGIN_PATH="${QT_QPA_PLATFORM_PLUGIN_PATH:-$COPPELIASIM_ROOT}"
+export LD_LIBRARY_PATH="${COPPELIASIM_ROOT}:${LD_LIBRARY_PATH:-}"
+# Upstream sim stack.
+export PYTHONPATH="/workspace/third_party/PyRep:/workspace/third_party/RLBench:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual:${PYTHONPATH:-}"
+# Local project code snapshot from this export.
+ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+export PYTHONPATH="${ROOT_DIR}/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:${PYTHONPATH}"

setup/requirements_core.txt ADDED

	@@ -0,0 +1,22 @@

+accelerate==0.31.0
+ftfy==6.2.0
+huggingface_hub==0.36.2
+hydra-core==1.3.2
+matplotlib
+numpy==1.26.4
+omegaconf==2.3.0
+open3d==0.19.0
+opencv-python==4.10.0.84
+pytest==9.0.2
+pytest-xdist
+rich==13.9.4
+safetensors==0.4.3
+scikit-learn==1.7.2
+scipy==1.13.1
+tensorboard==2.16.2
+timm==1.0.26
+torch==2.3.1
+torchaudio==2.3.1
+torchvision==0.18.1
+transformers==4.41.2
+yacs

setup/rlbench_pip_freeze.txt ADDED

	@@ -0,0 +1,181 @@

+absl-py==2.1.0
+accelerate==0.31.0
+addict==2.4.0
+aiohappyeyeballs==2.6.1
+aiohttp==3.13.5
+aiosignal==1.4.0
+antlr4-python3-runtime==4.9.3
+appdirs==1.4.4
+asttokens==3.0.1
+async-timeout==5.0.1
+attrs==26.1.0
+backports.zstd @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_backports.zstd_1767044984/work
+blinker==1.9.0
+blosc==1.11.4
+Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1764016952863/work
+cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
+certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1772001073725/work/certifi
+cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1761202865726/work
+charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1773659966602/work
+click==8.3.1
+click-prompt==0.5.1
+clip @ git+https://github.com/openai/CLIP.git@d05afc436d78f1c48dc0dbf8e5980a9d471f35f6
+cloudpickle==3.1.2
+comm==0.2.3
+ConfigArgParse==1.7.5
+contourpy @ file:///home/conda/feedstock_root/build_artifacts/contourpy_1744743067588/work
+cycler @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_cycler_1764466758/work
+dash==4.1.0
+decorator==5.2.1
+docker-pycreds==0.4.0
+einops==0.8.0
+exceptiongroup==1.3.1
+executing==2.2.1
+Farama-Notifications==0.0.4
+fastjsonschema==2.21.2
+filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1773313889543/work
+Flask==3.1.3
+fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1773137064424/work
+freetype-py==2.5.1
+frozenlist==1.8.0
+fsspec==2026.3.0
+ftfy==6.2.0
+gitdb==4.0.12
+GitPython==3.1.46
+gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1773244929835/work
+grpcio==1.80.0
+gym==0.26.2
+gym-notices==0.1.0
+gymnasium==1.0.0a2
+h2 @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_h2_1756364871/work
+h5py @ file:///home/conda/feedstock_root/build_artifacts/h5py_1774712049671/work
+hf-xet==1.4.2
+hpack @ file:///home/conda/feedstock_root/build_artifacts/hpack_1737618293087/work
+huggingface_hub==0.36.2
+hydra-core==1.3.2
+hyperframe @ file:///home/conda/feedstock_root/build_artifacts/hyperframe_1737618333194/work
+idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1760286409563/work
+imageio @ file:///home/conda/feedstock_root/build_artifacts/imageio_1738273805233/work
+imageio-ffmpeg==0.6.0
+importlib_metadata==9.0.0
+iniconfig==2.3.0
+ipython==8.39.0
+ipywidgets==8.1.8
+itsdangerous==2.2.0
+jedi==0.19.2
+Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_jinja2_1764517220/work
+joblib==1.5.3
+jsonschema==4.26.0
+jsonschema-specifications==2025.9.1
+jupyter_core==5.9.1
+jupyterlab_widgets==3.0.16
+kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_kiwisolver_1773067043/work
+Markdown==3.10.2
+markdown-it-py==4.0.0
+MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1772444934960/work
+matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-suite_1715976200404/work
+matplotlib-inline==0.2.1
+mdurl==0.1.2
+moviepy==2.2.1
+mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1773661943568/work
+multidict==6.7.1
+munkres==1.1.4
+narwhals==2.18.1
+natsort==8.4.0
+nbformat==5.10.4
+nest-asyncio==1.6.0
+networkx @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_networkx_1731521053/work
+numpy==1.26.4
+omegaconf==2.3.0
+open3d==0.19.0
+openai==0.28.1
+opencv-python==4.10.0.84
+packaging @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_packaging_1769093650/work
+pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1744430447393/work
+parso==0.8.6
+pathtools==0.1.2
+perceiver-pytorch==0.8.8
+pexpect==4.9.0
+pillow==12.1.1
+platformdirs==4.9.4
+plotly==6.6.0
+pluggy==1.6.0
+ply @ file:///home/conda/feedstock_root/build_artifacts/ply_1733239724146/work
+poetry-core==2.3.2
+proglog==0.1.12
+prompt_toolkit==3.0.52
+propcache==0.4.1
+protobuf==4.25.9
+psutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_psutil_1769678154/work
+ptyprocess==0.7.0
+pure_eval==0.2.3
+py-spy==0.4.1
+pycparser @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pycparser_1733195786/work
+pyglet==2.1.13
+Pygments==2.20.0
+PyOpenGL==3.1.0
+pyparsing @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pyparsing_1769003998/work
+PyQt5==5.15.11
+PyQt5_sip==12.17.0
+pyquaternion==0.9.9
+pyrender==0.1.45
+-e git+https://github.com/markusgrotz/PyRep.git@b8bd1d7a3182adcd570d001649c0849047ebf197#egg=PyRep
+PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1733217236728/work
+pytest==9.0.2
+python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_python-dateutil_1751104122/work
+python-dotenv==1.2.2
+pytorch-lamb==1.0.0
+pytz @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pytz_1773679724/work
+PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1770223234623/work
+referencing==0.37.0
+regex==2024.5.15
+requests @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_requests_1774894783/work
+retrying==1.4.2
+# Editable install with no version control (reveal-vla-bimanual==0.1.0)
+-e /workspace/reveal_vla_bimanual
+rich==13.9.4
+rich-click==1.8.9
+-e git+https://github.com/markusgrotz/RLBench.git@8af748c51287989294e00c9c670e3330a0e35ed5#egg=rlbench
+rpds-py==0.30.0
+safetensors==0.4.3
+scikit-learn==1.7.2
+scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy-split_1716470219380/work/dist/scipy-1.13.1-cp310-cp310-linux_x86_64.whl#sha256=a4ff22b6dc27b61196be51695f53f9b0676e7c1bc564872b51fc3c41b79ae80b
+segment-anything==1.0
+sentry-sdk==2.57.0
+setproctitle==1.3.7
+sip @ file:///home/conda/feedstock_root/build_artifacts/sip_1759437834046/work
+six @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_six_1753199211/work
+smmap==5.0.3
+stack-data==0.6.3
+sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1771952240620/work
+tensorboard==2.16.2
+tensorboard-data-server==0.7.2
+tensorboardX==2.6.4
+termcolor==3.3.0
+threadpoolctl==3.6.0
+timeout-decorator==0.5.0
+timm==1.0.26
+tokenizers==0.19.1
+toml @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_toml_1764486833/work
+tomli @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_tomli_1774492402/work
+torch==2.3.1
+torchaudio==2.3.1
+torchvision==0.18.1
+tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1774357896577/work
+tqdm @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_tqdm_1770153424/work
+traitlets==5.14.3
+transformers==4.41.2
+transforms3d==0.4.1
+trimesh @ file:///home/conda/feedstock_root/build_artifacts/trimesh_1774412449209/work
+triton==2.3.1
+typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_typing_extensions_1756220668/work
+tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1765719872007/work
+unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1770908960326/work
+urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1767817748113/work
+wandb==0.14.0
+wcwidth==0.2.14
+Werkzeug==3.1.7
+widgetsnbextension==4.0.15
+yarl==1.23.0
+-e git+https://github.com/markusgrotz/YARR.git@6822ff78602c77878b27d4cfe759ce029c67bffb#egg=yarr
+zipp==3.23.0