lsnu commited on
Commit
aa584de
·
verified ·
1 Parent(s): af6f91c

Add files using upload-large-folder tool

Browse files
Files changed (50) hide show
  1. README.md +240 -0
  2. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/config_resolved.yaml +173 -0
  3. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/metrics.json +140 -0
  4. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/summary.json +0 -0
  5. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/config_resolved.yaml +174 -0
  6. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/metrics.json +278 -0
  7. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/summary.json +0 -0
  8. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/config_resolved.yaml +170 -0
  9. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/metrics.json +71 -0
  10. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/summary.json +0 -0
  11. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/config_resolved.yaml +170 -0
  12. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/metrics.json +278 -0
  13. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/summary.json +0 -0
  14. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/config_resolved.yaml +174 -0
  15. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/metrics.json +140 -0
  16. artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/summary.json +0 -0
  17. artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.json +280 -0
  18. artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.md +14 -0
  19. artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.partial.json +280 -0
  20. artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/command.txt +1 -0
  21. artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stderr.txt +1 -0
  22. artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stdout.txt +38 -0
  23. artifacts/reports/proxy_base_reuse128_smoke/scripted/reveal_benchmark.md +17 -0
  24. artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.json +0 -0
  25. artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.md +17 -0
  26. artifacts/reports/proxy_semantic_heuristic_quick12/candidate0/reveal_benchmark.md +17 -0
  27. artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.json +0 -0
  28. artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.md +17 -0
  29. artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.json +0 -0
  30. artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.md +17 -0
  31. artifacts/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.md +17 -0
  32. artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.md +14 -0
  33. artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.partial.json +280 -0
  34. artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.json +29 -0
  35. artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.md +14 -0
  36. artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.partial.json +29 -0
  37. docs/CHANGE_AND_TEST_LOG.md +221 -0
  38. docs/MODEL_AND_ARTIFACT_INDEX.md +59 -0
  39. docs/RESULTS_RAW.md +178 -0
  40. docs/VLAarchtests2_code_README.md +301 -0
  41. docs/elastic_occlusion_handoff_completion_2026-03-31.md +184 -0
  42. docs/elastic_occlusion_iteration_2026-03-31.md +232 -0
  43. docs/elastic_occlusion_repo_audit_2026-03-31.md +400 -0
  44. docs/instructions.md +1030 -0
  45. legacy/general_task_anchor_20260330_dual_push_buttons/summary.json +37 -0
  46. setup/ENVIRONMENT.md +55 -0
  47. setup/bootstrap_same_hardware.sh +42 -0
  48. setup/env_vars.sh +15 -0
  49. setup/requirements_core.txt +22 -0
  50. setup/rlbench_pip_freeze.txt +181 -0
README.md ADDED
@@ -0,0 +1,240 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # VLAarchtests3
2
+
3
+ `VLAarchtests3` is the organized export of the elastic-occlusion bimanual VLA handoff completed on a 1x L40S RunPod machine.
4
+
5
+ It is a successor snapshot to the earlier `VLAarchtests` and `VLAarchtests2` work:
6
+
7
+ - `VLAarchtests`: earlier architecture-search and benchmark-debugging work.
8
+ - `VLAarchtests2`: larger exploratory branch with frequent model changes, mixed benchmark artifacts, and several legacy results that needed manual reinterpretation.
9
+ - `VLAarchtests3`: cleaned export focused on the final handoff state, the adapter refactor, the validated tests, the current checkpoints, and the reports needed to continue from here.
10
+
11
+ ## What Was Done
12
+
13
+ The main engineering outcome was a refactor from a monolithic elastic policy into a cleaner `trunk + structured adapter + no-op fallback` stack.
14
+
15
+ The final exported code contains:
16
+
17
+ - a clean wrapped-policy interface with `trunk_only`, `adapter_noop`, and `adapter_active` modes,
18
+ - a structured elastic-occlusion adapter with:
19
+ - reveal-state prediction,
20
+ - task-routed reveal/retrieve proposal families,
21
+ - retrieve-feasibility gating,
22
+ - a lightweight reveal-state transition model,
23
+ - explicit tests that protect:
24
+ - no-op equivalence,
25
+ - generic-task fallback,
26
+ - benchmark protocol identity,
27
+ - unsafe retrieve blocking,
28
+ - cloth-specific selection behavior.
29
+
30
+ The most important debugging pass was in the planner/gating logic. The original active path could reveal forever or retrieve too early. The final planner fixes made it:
31
+
32
+ - summarize scene readiness at the scene level rather than worst-candidate level,
33
+ - hard-mask unsafe retrieve candidates,
34
+ - switch from reveal to retrieve once feasibility is met,
35
+ - use task-specific bag and cloth readiness criteria,
36
+ - prefer reveal macros early and retrieve later.
37
+
38
+ ## What Was Actually Evaluated
39
+
40
+ Two different kinds of evidence are included.
41
+
42
+ ### 1. Trusted General-Task Anchor
43
+
44
+ This was kept narrow on purpose because only `dual_push_buttons` was trusted on this setup.
45
+
46
+ Trusted anchor evidence:
47
+
48
+ - official AnyBimanual local anchor summary on `dual_push_buttons`:
49
+ - `25` episodes
50
+ - success `0.96`
51
+ - live rerun on this RunPod:
52
+ - `5` episodes
53
+ - scores `[0, 100, 100, 0, 0]`
54
+ - mean score `40.0`
55
+
56
+ Interpretation:
57
+
58
+ - the official trunk path is real and non-trivial on the one stable anchor task,
59
+ - this does **not** mean the local custom CLIP trunk was competitive broadly,
60
+ - this does **not** validate the other unstable RLBench target-like tasks.
61
+
62
+ ### 2. Reveal/Retrieve Proxy Benchmark
63
+
64
+ This benchmark is useful for mechanism debugging, but it is **not** a real robot/physics benchmark.
65
+
66
+ The final reported held-out smoke benchmark used:
67
+
68
+ - `12` foliage episodes,
69
+ - `12` bag episodes,
70
+ - `12` cloth episodes,
71
+ - `36` total episodes,
72
+ - separate held-out procedural seeds from the adapter train/val splits.
73
+
74
+ Results:
75
+
76
+ - non-intervention / matched no-op:
77
+ - mean success `0.000`
78
+ - foliage `0.000`
79
+ - bag `0.000`
80
+ - cloth `0.000`
81
+ - visibility integral `2.275`
82
+ - corridor availability `0.0312`
83
+ - disturbance cost `0.7433`
84
+
85
+ - intervention / adapter active:
86
+ - mean success `0.6667`
87
+ - foliage `0.6667`
88
+ - bag `0.7500`
89
+ - cloth `0.5833`
90
+ - visibility integral `19.9503`
91
+ - corridor availability `0.7974`
92
+ - disturbance cost `0.2835`
93
+ - reocclusion rate `0.00278`
94
+ - planner regret `0.1586`
95
+
96
+ The active policy did really intervene on these tasks. It did not just fall back silently to the trunk:
97
+
98
+ - all recorded selections on the final held-out smoke run were non-base candidates,
99
+ - typical successful pattern:
100
+ - foliage: reveal (`pin_canopy`) then `retrieve`,
101
+ - bag: reveal (`widen_mouth`) then `retrieve`,
102
+ - cloth: reveal (`separate_layer`) then `retrieve`.
103
+
104
+ ## Important Limitation
105
+
106
+ The reveal/retrieve proxy is a procedural synthetic environment, not a contact-rich robot simulator.
107
+
108
+ It has:
109
+
110
+ - synthetic RGB-D renders,
111
+ - internal latent state,
112
+ - hand-coded transition rules,
113
+ - scripted teacher/oracle supervision.
114
+
115
+ It does **not** have:
116
+
117
+ - rigid-body or deformable physics,
118
+ - actual robot kinematics,
119
+ - true contact/grasp simulation,
120
+ - a fair end-to-end manipulation distribution for a pretrained trunk.
121
+
122
+ Therefore:
123
+
124
+ - the proxy result is useful to validate adapter logic,
125
+ - the proxy result is **not** sufficient evidence that the trunk or the full system would outperform real baselines on RLBench or on the future custom benchmark.
126
+
127
+ ## What Was Learned
128
+
129
+ The work supports the following conclusions:
130
+
131
+ - the structured adapter idea is still alive,
132
+ - the explicit reveal-state variables are worth keeping,
133
+ - task-routed reveal macros matter,
134
+ - retrieve-feasibility gating matters,
135
+ - the no-op fallback path for general tasks is sound,
136
+ - the old heavy memory/world-model story is not where the strongest evidence lives.
137
+
138
+ The work does **not** yet justify:
139
+
140
+ - a claim of broad general-task superiority,
141
+ - a claim that the current proxy benchmark is a fair end-to-end benchmark,
142
+ - a claim that the architecture is validated on realistic target-like sim tasks.
143
+
144
+ ## Was The Adapter Trained?
145
+
146
+ Yes.
147
+
148
+ The final proxy adapter checkpoint was trained with:
149
+
150
+ - frozen trunk,
151
+ - adapter-only updates,
152
+ - trained components:
153
+ - reveal/state head,
154
+ - proposal prior,
155
+ - transition model,
156
+ - planner/reranker.
157
+
158
+ Proxy training data:
159
+
160
+ - train: `128` episodes per proxy family,
161
+ - val: `32` episodes per proxy family,
162
+ - proxy families:
163
+ - foliage,
164
+ - bag,
165
+ - cloth.
166
+
167
+ The final headline smoke benchmark was not run on those train/val episodes. It used separate held-out seeds.
168
+
169
+ ## Was This A Perfect Fairness Story?
170
+
171
+ No.
172
+
173
+ What is fair in the current export:
174
+
175
+ - matched active vs no-op comparisons on the same wrapped checkpoint,
176
+ - held-out procedural seeds for the final proxy benchmark,
177
+ - exact no-op and generic-task fallback tests.
178
+
179
+ What is still missing for a stronger paper-quality comparison:
180
+
181
+ 1. same-initialization `trunk_only` fine-tuned on the same proxy data,
182
+ 2. same-initialization `trunk + adapter` fine-tuned on the same proxy data,
183
+ 3. comparison on held-out proxy seeds,
184
+ 4. comparison on stable real-sim tasks.
185
+
186
+ ## What Is Left To Do
187
+
188
+ The main remaining work is on real sim benchmarks, not more abstract proxy optimization.
189
+
190
+ Priority list:
191
+
192
+ 1. Train a fair control:
193
+ - same initialization,
194
+ - `trunk_only` fine-tuned on the same reveal/retrieve proxy data,
195
+ - compare against `trunk + adapter`.
196
+
197
+ 2. Attach the adapter directly to a strong public trunk:
198
+ - official AnyBimanual,
199
+ - official PerAct2 / RVT,
200
+ - or 3D FlowMatch Actor if practical.
201
+
202
+ 3. Validate on stable real-sim tasks:
203
+ - do not trust unstable RLBench tasks with infeasible waypoints,
204
+ - rebuild a trustworthy target-like evaluation subset,
205
+ - keep `dual_push_buttons` as a regression anchor only.
206
+
207
+ 4. Add a deformable / garment benchmark:
208
+ - this is the most relevant public step toward the future suitcase/clothes benchmark.
209
+
210
+ 5. Only after that:
211
+ - revisit larger RLBench sweeps,
212
+ - or collect custom teleop data.
213
+
214
+ ## Repository Layout
215
+
216
+ - `code/`
217
+ - cleaned code snapshot used for the handoff
218
+ - `artifacts/outputs/`
219
+ - current adapter checkpoints and training outputs
220
+ - `artifacts/reports/`
221
+ - evaluation and debugging reports
222
+ - `artifacts/data/reveal_proxy/`
223
+ - proxy train/val datasets used by this stage
224
+ - `legacy/`
225
+ - exact older checkpoints and summaries that the current work depends on
226
+ - `docs/`
227
+ - audit, iteration, and completion reports from this handoff
228
+ - `setup/`
229
+ - same-machine environment notes and helper scripts
230
+
231
+ ## Recommended Use Of This Repo
232
+
233
+ Use this repo as:
234
+
235
+ - the archival handoff state,
236
+ - the codebase to continue adapter work from,
237
+ - the source of the current checkpoints and benchmark reports,
238
+ - the baseline package before moving to real sim validation.
239
+
240
+ Do **not** use it as evidence that the architecture is already validated on realistic manipulation benchmarks. That validation is what should happen next.
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/config_resolved.yaml ADDED
@@ -0,0 +1,173 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ experiment_name: proxy_adapter_wrapped_clip_base_fast_seed17
2
+ output_dir: /workspace/workspace/outputs/adapter_proxy
3
+ device: cuda
4
+ seed: 17
5
+ init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
6
+ init_strict: false
7
+ data:
8
+ proxies:
9
+ - foliage_proxy
10
+ - bag_proxy
11
+ - cloth_proxy
12
+ resolution: 224
13
+ dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast
14
+ train_episodes_per_proxy: 12
15
+ val_episodes_per_proxy: 4
16
+ train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast.pt
17
+ val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast.pt
18
+ rebuild_dataset: false
19
+ chunk_horizon: 8
20
+ rollout_horizon: 5
21
+ history_steps: 6
22
+ planner_candidates: 8
23
+ seed: 17
24
+ optim:
25
+ epochs: 2
26
+ batch_size: 4
27
+ num_workers: 8
28
+ lr: 0.0001
29
+ weight_decay: 0.0001
30
+ trainer:
31
+ policy_type: adapter_wrapped
32
+ training_regime: adapter_train_frozen_trunk
33
+ eval_mode: adapter_active
34
+ adapter_mode: adapter_active
35
+ adapter_use_transition_model: false
36
+ adapter_use_task_conditioning: true
37
+ use_bf16: true
38
+ grad_clip_norm: 1.0
39
+ freeze_backbone: true
40
+ gradient_checkpointing: false
41
+ plan_during_train: false
42
+ plan_during_eval: false
43
+ support_mode_conditioning: true
44
+ planner_mode: false
45
+ use_depth: true
46
+ use_world_model: false
47
+ use_role_tokens: true
48
+ compute_equivariance_probe: false
49
+ trainable_parameter_prefixes:
50
+ - adapter.state_head
51
+ - adapter.proposal_prior
52
+ - adapter.planner
53
+ policy:
54
+ backbone:
55
+ model_name: openai/clip-vit-base-patch32
56
+ hidden_dim: 512
57
+ max_text_tokens: 32
58
+ freeze_backbone: true
59
+ gradient_checkpointing: false
60
+ use_dummy_backbone: false
61
+ fusion:
62
+ hidden_dim: 512
63
+ num_cameras: 3
64
+ num_layers: 4
65
+ num_heads: 8
66
+ ff_dim: 2048
67
+ dropout: 0.1
68
+ proprio_dim: 32
69
+ proprio_tokens: 1
70
+ memory:
71
+ hidden_dim: 512
72
+ action_dim: 14
73
+ history_steps: 6
74
+ scene_history_steps: 3
75
+ belief_history_steps: 8
76
+ num_layers: 2
77
+ dropout: 0.1
78
+ memory_bank_size: 4
79
+ scene_bank_size: 2
80
+ belief_bank_size: 2
81
+ num_heads: 8
82
+ max_history_steps: 8
83
+ reveal_cache_steps: 4
84
+ reveal_cache_decay: 0.7
85
+ decoder:
86
+ hidden_dim: 512
87
+ num_heads: 8
88
+ num_layers: 4
89
+ ff_dim: 2048
90
+ dropout: 0.1
91
+ chunk_size: 8
92
+ action_dim: 14
93
+ arm_action_dim: 7
94
+ num_candidates: 8
95
+ num_phases: 5
96
+ num_arm_roles: 4
97
+ num_proposal_modes: 7
98
+ planner_top_k: 4
99
+ proposal_delta_scale: 0.2
100
+ proposal_slot_scale: 0.05
101
+ reveal_head:
102
+ hidden_dim: 512
103
+ num_support_modes: 3
104
+ num_approach_templates: 32
105
+ rollout_horizon: 5
106
+ belief_map_size: 32
107
+ field_size: 16
108
+ num_heads: 8
109
+ predict_belief_map: true
110
+ num_phases: 5
111
+ num_arm_roles: 4
112
+ num_interaction_tokens: 8
113
+ num_tasks: 4
114
+ world_model:
115
+ hidden_dim: 512
116
+ action_dim: 14
117
+ num_support_modes: 3
118
+ num_approach_templates: 32
119
+ rollout_horizon: 5
120
+ field_size: 16
121
+ num_heads: 8
122
+ num_phases: 5
123
+ num_arm_roles: 4
124
+ num_interaction_tokens: 8
125
+ belief_map_size: 32
126
+ predict_belief_map: true
127
+ scene_bank_size: 2
128
+ belief_bank_size: 2
129
+ rollout_mode: compact_rollout
130
+ num_tasks: 4
131
+ lightweight_field_size: 4
132
+ planner:
133
+ hidden_dim: 512
134
+ num_candidates: 8
135
+ action_dim: 14
136
+ num_support_modes: 3
137
+ utility_margin: 0.1
138
+ num_heads: 8
139
+ num_layers: 2
140
+ num_phases: 5
141
+ num_arm_roles: 4
142
+ top_k: 4
143
+ adapter_confidence_threshold: 0.45
144
+ loss_weights:
145
+ action: 1.0
146
+ phase: 0.08
147
+ arm_role: 0.08
148
+ support_mode: 0.08
149
+ corridor: 0.12
150
+ persistence: 0.06
151
+ disturbance: 0.06
152
+ world_model: 0.0
153
+ transition: 0.0
154
+ belief: 0.05
155
+ visibility: 0.05
156
+ clearance: 0.06
157
+ support_stability: 0.06
158
+ reocclusion: 0.06
159
+ occluder_contact: 0.05
160
+ grasp_affordance: 0.05
161
+ planner_success: 0.15
162
+ planner_risk: 0.08
163
+ planner_ranking: 0.15
164
+ proposal_reconstruction: 0.08
165
+ proposal_success: 0.1
166
+ proposal_ranking: 0.12
167
+ proposal_mode: 0.08
168
+ proposal_diversity: 0.05
169
+ role_swap_consistency: 0.0
170
+ task_metrics: 0.06
171
+ gate: 0.05
172
+ distillation: 0.05
173
+ calibration: 0.02
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/metrics.json ADDED
@@ -0,0 +1,140 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "epoch": 0,
4
+ "train": {
5
+ "action": 1.1780137238295183,
6
+ "arm_role": 0.000544056080402895,
7
+ "belief": 0.10274084074341733,
8
+ "calibration": 0.0,
9
+ "clearance": 0.08112246429790622,
10
+ "corridor": 0.21243907782532598,
11
+ "distillation": 0.0036539296447501883,
12
+ "disturbance": 0.0010930091615908009,
13
+ "gate": 0.0,
14
+ "grasp_affordance": 0.011060374242294094,
15
+ "occluder_contact": 0.19354943348013837,
16
+ "persistence": 0.29602919886415097,
17
+ "phase": 0.1456924275211666,
18
+ "planner_ranking": 1.1046701566032742,
19
+ "planner_risk": 0.03252584584381269,
20
+ "planner_success": 0.5002943964108176,
21
+ "proposal_diversity": 0.0,
22
+ "proposal_mode": 0.9053098727827487,
23
+ "proposal_ranking": 0.7633599224297897,
24
+ "proposal_reconstruction": 1.1813416908616605,
25
+ "proposal_success": 0.5018493273983831,
26
+ "reocclusion": 0.1370238650428212,
27
+ "role_swap_consistency": 0.0,
28
+ "support_mode": 0.0010332910049170175,
29
+ "support_stability": 0.13264792088581168,
30
+ "task_metrics": 0.07693366929078879,
31
+ "total": 1.8312026676924333,
32
+ "transition": 0.0,
33
+ "uncertainty": 1.4312560102039045e-05,
34
+ "visibility": 0.096126823645571,
35
+ "world_model": 0.0
36
+ },
37
+ "val": {
38
+ "action": 1.146972581744194,
39
+ "arm_role": 2.7849786739864157e-05,
40
+ "belief": 0.09928969945758581,
41
+ "calibration": 0.0,
42
+ "clearance": 0.07546275667846203,
43
+ "corridor": 0.18693614657968283,
44
+ "distillation": 0.005982774979202077,
45
+ "disturbance": 0.0012652746545427362,
46
+ "gate": 0.0,
47
+ "grasp_affordance": 0.009092151012737304,
48
+ "occluder_contact": 0.19199086539447308,
49
+ "persistence": 0.4173499735770747,
50
+ "phase": 0.20510842488147318,
51
+ "planner_ranking": 1.0746948570013046,
52
+ "planner_risk": 0.03205434698611498,
53
+ "planner_success": 0.3765582703053951,
54
+ "proposal_diversity": 0.0,
55
+ "proposal_mode": 0.5553285405039787,
56
+ "proposal_ranking": 0.6613346468657255,
57
+ "proposal_reconstruction": 1.1140409670770168,
58
+ "proposal_success": 0.32496484369039536,
59
+ "reocclusion": 0.2021030569449067,
60
+ "role_swap_consistency": 0.0,
61
+ "support_mode": 0.00011286496555840131,
62
+ "support_stability": 0.13265474420040846,
63
+ "task_metrics": 0.06524855340830982,
64
+ "total": 1.7250810116529465,
65
+ "transition": 0.0,
66
+ "uncertainty": 8.913456255754681e-06,
67
+ "visibility": 0.09269411116838455,
68
+ "world_model": 0.0
69
+ }
70
+ },
71
+ {
72
+ "epoch": 1,
73
+ "train": {
74
+ "action": 1.1840074995289678,
75
+ "arm_role": 1.7842088946844857e-05,
76
+ "belief": 0.10108890773161598,
77
+ "calibration": 0.0,
78
+ "clearance": 0.08066983359015506,
79
+ "corridor": 0.20431885726587928,
80
+ "distillation": 0.005328163808292668,
81
+ "disturbance": 0.000988402207440231,
82
+ "gate": 0.0,
83
+ "grasp_affordance": 0.010460576832132496,
84
+ "occluder_contact": 0.19120351322319196,
85
+ "persistence": 0.20984708754669712,
86
+ "phase": 0.1270662468412648,
87
+ "planner_ranking": 1.051699793857077,
88
+ "planner_risk": 0.03183994928131933,
89
+ "planner_success": 0.37528212303700653,
90
+ "proposal_diversity": 0.0,
91
+ "proposal_mode": 0.541168266016504,
92
+ "proposal_ranking": 0.7413897125617318,
93
+ "proposal_reconstruction": 1.1529877976230953,
94
+ "proposal_success": 0.273181245378826,
95
+ "reocclusion": 0.11955958685797194,
96
+ "role_swap_consistency": 0.0,
97
+ "support_mode": 0.00014792317929475203,
98
+ "support_stability": 0.1314481108084969,
99
+ "task_metrics": 0.07543641668946846,
100
+ "total": 1.744326695151951,
101
+ "transition": 0.0,
102
+ "uncertainty": 7.94198708297739e-06,
103
+ "visibility": 0.09458825672450273,
104
+ "world_model": 0.0
105
+ },
106
+ "val": {
107
+ "action": 1.1787440478801727,
108
+ "arm_role": 1.3783465302452669e-05,
109
+ "belief": 0.0974554605782032,
110
+ "calibration": 0.0,
111
+ "clearance": 0.0746708307415247,
112
+ "corridor": 0.18591812625527382,
113
+ "distillation": 0.0038922334788367152,
114
+ "disturbance": 0.0005819438138132682,
115
+ "gate": 0.0,
116
+ "grasp_affordance": 0.008575586834922433,
117
+ "occluder_contact": 0.19005733728408813,
118
+ "persistence": 0.4048172008187976,
119
+ "phase": 0.24421580568014178,
120
+ "planner_ranking": 1.0271672308444977,
121
+ "planner_risk": 0.03108011605218053,
122
+ "planner_success": 0.3713325075805187,
123
+ "proposal_diversity": 0.0,
124
+ "proposal_mode": 0.46797188371419907,
125
+ "proposal_ranking": 0.6800601556897163,
126
+ "proposal_reconstruction": 1.0902876928448677,
127
+ "proposal_success": 0.25984624214470387,
128
+ "reocclusion": 0.19258547481149435,
129
+ "role_swap_consistency": 0.0,
130
+ "support_mode": 0.00014510085156871355,
131
+ "support_stability": 0.13228781055659056,
132
+ "task_metrics": 0.06339579145424068,
133
+ "total": 1.7367750853300095,
134
+ "transition": 0.0,
135
+ "uncertainty": 6.649694360483238e-06,
136
+ "visibility": 0.09114759508520365,
137
+ "world_model": 0.0
138
+ }
139
+ }
140
+ ]
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_fast_seed17/summary.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/config_resolved.yaml ADDED
@@ -0,0 +1,174 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ experiment_name: proxy_adapter_wrapped_clip_base_reuse128_seed17
2
+ output_dir: /workspace/workspace/outputs/adapter_proxy
3
+ device: cuda
4
+ seed: 17
5
+ init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
6
+ init_strict: false
7
+ data:
8
+ proxies:
9
+ - foliage_proxy
10
+ - bag_proxy
11
+ - cloth_proxy
12
+ resolution: 224
13
+ dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase
14
+ train_episodes_per_proxy: 128
15
+ val_episodes_per_proxy: 32
16
+ train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
17
+ val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
18
+ rebuild_dataset: false
19
+ chunk_horizon: 8
20
+ rollout_horizon: 5
21
+ history_steps: 6
22
+ planner_candidates: 8
23
+ seed: 17
24
+ optim:
25
+ epochs: 4
26
+ batch_size: 8
27
+ num_workers: 32
28
+ lr: 0.0001
29
+ weight_decay: 0.0001
30
+ trainer:
31
+ policy_type: adapter_wrapped
32
+ training_regime: adapter_train_frozen_trunk
33
+ eval_mode: adapter_active
34
+ adapter_mode: adapter_active
35
+ adapter_use_transition_model: true
36
+ adapter_use_task_conditioning: true
37
+ use_bf16: true
38
+ grad_clip_norm: 1.0
39
+ freeze_backbone: true
40
+ gradient_checkpointing: false
41
+ plan_during_train: false
42
+ plan_during_eval: false
43
+ support_mode_conditioning: true
44
+ planner_mode: false
45
+ use_depth: true
46
+ use_world_model: false
47
+ use_role_tokens: true
48
+ compute_equivariance_probe: false
49
+ trainable_parameter_prefixes:
50
+ - adapter.state_head
51
+ - adapter.proposal_prior
52
+ - adapter.transition_model
53
+ - adapter.planner
54
+ policy:
55
+ backbone:
56
+ model_name: openai/clip-vit-base-patch32
57
+ hidden_dim: 512
58
+ max_text_tokens: 32
59
+ freeze_backbone: true
60
+ gradient_checkpointing: false
61
+ use_dummy_backbone: false
62
+ fusion:
63
+ hidden_dim: 512
64
+ num_cameras: 3
65
+ num_layers: 4
66
+ num_heads: 8
67
+ ff_dim: 2048
68
+ dropout: 0.1
69
+ proprio_dim: 32
70
+ proprio_tokens: 1
71
+ memory:
72
+ hidden_dim: 512
73
+ action_dim: 14
74
+ history_steps: 6
75
+ scene_history_steps: 3
76
+ belief_history_steps: 8
77
+ num_layers: 2
78
+ dropout: 0.1
79
+ memory_bank_size: 4
80
+ scene_bank_size: 2
81
+ belief_bank_size: 2
82
+ num_heads: 8
83
+ max_history_steps: 8
84
+ reveal_cache_steps: 4
85
+ reveal_cache_decay: 0.7
86
+ decoder:
87
+ hidden_dim: 512
88
+ num_heads: 8
89
+ num_layers: 4
90
+ ff_dim: 2048
91
+ dropout: 0.1
92
+ chunk_size: 8
93
+ action_dim: 14
94
+ arm_action_dim: 7
95
+ num_candidates: 8
96
+ num_phases: 5
97
+ num_arm_roles: 4
98
+ num_proposal_modes: 7
99
+ planner_top_k: 4
100
+ proposal_delta_scale: 0.2
101
+ proposal_slot_scale: 0.05
102
+ reveal_head:
103
+ hidden_dim: 512
104
+ num_support_modes: 3
105
+ num_approach_templates: 32
106
+ rollout_horizon: 5
107
+ belief_map_size: 32
108
+ field_size: 16
109
+ num_heads: 8
110
+ predict_belief_map: true
111
+ num_phases: 5
112
+ num_arm_roles: 4
113
+ num_interaction_tokens: 8
114
+ num_tasks: 4
115
+ world_model:
116
+ hidden_dim: 512
117
+ action_dim: 14
118
+ num_support_modes: 3
119
+ num_approach_templates: 32
120
+ rollout_horizon: 5
121
+ field_size: 16
122
+ num_heads: 8
123
+ num_phases: 5
124
+ num_arm_roles: 4
125
+ num_interaction_tokens: 8
126
+ belief_map_size: 32
127
+ predict_belief_map: true
128
+ scene_bank_size: 2
129
+ belief_bank_size: 2
130
+ rollout_mode: compact_rollout
131
+ num_tasks: 4
132
+ lightweight_field_size: 4
133
+ planner:
134
+ hidden_dim: 512
135
+ num_candidates: 8
136
+ action_dim: 14
137
+ num_support_modes: 3
138
+ utility_margin: 0.1
139
+ num_heads: 8
140
+ num_layers: 2
141
+ num_phases: 5
142
+ num_arm_roles: 4
143
+ top_k: 4
144
+ adapter_confidence_threshold: 0.55
145
+ loss_weights:
146
+ action: 1.0
147
+ phase: 0.08
148
+ arm_role: 0.08
149
+ support_mode: 0.08
150
+ corridor: 0.12
151
+ persistence: 0.06
152
+ disturbance: 0.06
153
+ world_model: 0.0
154
+ transition: 0.2
155
+ belief: 0.05
156
+ visibility: 0.05
157
+ clearance: 0.06
158
+ support_stability: 0.06
159
+ reocclusion: 0.06
160
+ occluder_contact: 0.05
161
+ grasp_affordance: 0.05
162
+ planner_success: 0.15
163
+ planner_risk: 0.08
164
+ planner_ranking: 0.15
165
+ proposal_reconstruction: 0.08
166
+ proposal_success: 0.1
167
+ proposal_ranking: 0.12
168
+ proposal_mode: 0.08
169
+ proposal_diversity: 0.05
170
+ role_swap_consistency: 0.0
171
+ task_metrics: 0.06
172
+ gate: 0.05
173
+ distillation: 0.05
174
+ calibration: 0.02
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/metrics.json ADDED
@@ -0,0 +1,278 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "epoch": 0,
4
+ "train": {
5
+ "action": 1.1828932802216345,
6
+ "arm_role": 0.00244398444339226,
7
+ "belief": 0.10072019552232839,
8
+ "calibration": 0.0,
9
+ "clearance": 0.07946077994063121,
10
+ "corridor": 0.21543118382702356,
11
+ "distillation": 0.00042247207064432005,
12
+ "disturbance": 0.0009066167868626844,
13
+ "gate": 0.0,
14
+ "grasp_affordance": 0.011442071496031615,
15
+ "occluder_contact": 0.19184747789086415,
16
+ "persistence": 0.5456274578801724,
17
+ "phase": 0.1889389944928033,
18
+ "planner_ranking": 0.8968874569199666,
19
+ "planner_risk": 0.03290799349358603,
20
+ "planner_success": 0.35506935793311656,
21
+ "proposal_diversity": 0.0,
22
+ "proposal_mode": 0.7599493966383093,
23
+ "proposal_ranking": 1.4915186276956767,
24
+ "proposal_reconstruction": 1.0803285907296574,
25
+ "proposal_success": 0.3194384900461726,
26
+ "reocclusion": 0.1872198152817598,
27
+ "role_swap_consistency": 0.0,
28
+ "support_mode": 0.4244060135689102,
29
+ "support_stability": 0.13155287654459977,
30
+ "task_metrics": 0.07493724777292804,
31
+ "total": 2.751452175509028,
32
+ "transition": 4.318220460114359,
33
+ "uncertainty": 1.531094441807496e-05,
34
+ "visibility": 0.09642757938689545,
35
+ "world_model": 0.0
36
+ },
37
+ "val": {
38
+ "action": 1.1680383563041687,
39
+ "arm_role": 0.0025612511759391054,
40
+ "belief": 0.09879593178629875,
41
+ "calibration": 0.0,
42
+ "clearance": 0.07741740134855112,
43
+ "corridor": 0.20817755659421286,
44
+ "distillation": 0.0,
45
+ "disturbance": 0.0007382428300237128,
46
+ "gate": 0.0,
47
+ "grasp_affordance": 0.010511041525751353,
48
+ "occluder_contact": 0.19018630186716715,
49
+ "persistence": 0.4509886346757412,
50
+ "phase": 0.1597365932694326,
51
+ "planner_ranking": 0.22907628491520882,
52
+ "planner_risk": 0.02909238338470459,
53
+ "planner_success": 0.18200772007306418,
54
+ "proposal_diversity": 0.0,
55
+ "proposal_mode": 0.71118057568868,
56
+ "proposal_ranking": 1.4729209462801616,
57
+ "proposal_reconstruction": 1.015290528535843,
58
+ "proposal_success": 0.2791739940643311,
59
+ "reocclusion": 0.16477556849519412,
60
+ "role_swap_consistency": 0.0,
61
+ "support_mode": 0.5340653051932652,
62
+ "support_stability": 0.12872510105371476,
63
+ "task_metrics": 0.06174707182993491,
64
+ "total": 2.407643111546834,
65
+ "transition": 3.39704422156016,
66
+ "uncertainty": 7.099100287177862e-06,
67
+ "visibility": 0.09383414511879286,
68
+ "world_model": 0.0
69
+ }
70
+ },
71
+ {
72
+ "epoch": 1,
73
+ "train": {
74
+ "action": 1.187044749740793,
75
+ "arm_role": 0.001233981896833587,
76
+ "belief": 0.09885497215916128,
77
+ "calibration": 0.0,
78
+ "clearance": 0.07787450506281451,
79
+ "corridor": 0.21069503738349224,
80
+ "distillation": 0.0,
81
+ "disturbance": 0.0007993320816102586,
82
+ "gate": 0.0,
83
+ "grasp_affordance": 0.0100274878874922,
84
+ "occluder_contact": 0.19033558541486242,
85
+ "persistence": 0.508021433908148,
86
+ "phase": 0.19023076729739413,
87
+ "planner_ranking": 0.058458461105322636,
88
+ "planner_risk": 0.03440776518976488,
89
+ "planner_success": 0.1257152666627359,
90
+ "proposal_diversity": 0.0,
91
+ "proposal_mode": 0.7171601638072679,
92
+ "proposal_ranking": 1.499033512187605,
93
+ "proposal_reconstruction": 1.066634831809196,
94
+ "proposal_success": 0.3018947724534684,
95
+ "reocclusion": 0.16926059677821248,
96
+ "role_swap_consistency": 0.0,
97
+ "support_mode": 0.4455214215426886,
98
+ "support_stability": 0.13059799138362668,
99
+ "task_metrics": 0.07159904390573502,
100
+ "total": 2.4211200485710336,
101
+ "transition": 3.487839874099283,
102
+ "uncertainty": 3.770016950513401e-06,
103
+ "visibility": 0.09318254963189614,
104
+ "world_model": 0.0
105
+ },
106
+ "val": {
107
+ "action": 1.1680383563041687,
108
+ "arm_role": 0.001657356577925384,
109
+ "belief": 0.09766801769534747,
110
+ "calibration": 0.0,
111
+ "clearance": 0.07670599135259787,
112
+ "corridor": 0.20785387406746547,
113
+ "distillation": 0.0,
114
+ "disturbance": 0.0007254338066559285,
115
+ "gate": 0.0,
116
+ "grasp_affordance": 0.009808245363334816,
117
+ "occluder_contact": 0.18903621584177016,
118
+ "persistence": 0.43403610289096833,
119
+ "phase": 0.17749264603480697,
120
+ "planner_ranking": 0.00962653555907309,
121
+ "planner_risk": 0.02840747827043136,
122
+ "planner_success": 0.0469651294251283,
123
+ "proposal_diversity": 0.0,
124
+ "proposal_mode": 0.5958098510901133,
125
+ "proposal_ranking": 1.567319353421529,
126
+ "proposal_reconstruction": 1.0027365585168202,
127
+ "proposal_success": 0.3119396299123764,
128
+ "reocclusion": 0.14939573630690575,
129
+ "role_swap_consistency": 0.0,
130
+ "support_mode": 0.38477273682753244,
131
+ "support_stability": 0.12813995343943438,
132
+ "task_metrics": 0.05784295691798131,
133
+ "total": 2.3466440041859946,
134
+ "transition": 3.402106682459513,
135
+ "uncertainty": 3.2218885041383296e-06,
136
+ "visibility": 0.09148541142543157,
137
+ "world_model": 0.0
138
+ }
139
+ },
140
+ {
141
+ "epoch": 2,
142
+ "train": {
143
+ "action": 1.187824563819821,
144
+ "arm_role": 0.0017524876263963075,
145
+ "belief": 0.09850409833573494,
146
+ "calibration": 0.0,
147
+ "clearance": 0.07750590865602013,
148
+ "corridor": 0.21022135673576042,
149
+ "distillation": 0.0,
150
+ "disturbance": 0.0008020720826393432,
151
+ "gate": 0.0,
152
+ "grasp_affordance": 0.009951516582841883,
153
+ "occluder_contact": 0.190022504630209,
154
+ "persistence": 0.5073582559448331,
155
+ "phase": 0.17974354623339506,
156
+ "planner_ranking": 0.009596662447169549,
157
+ "planner_risk": 0.03246875642603185,
158
+ "planner_success": 0.06673186843698266,
159
+ "proposal_diversity": 0.0,
160
+ "proposal_mode": 0.7036348676481167,
161
+ "proposal_ranking": 1.4990194234527459,
162
+ "proposal_reconstruction": 1.0593123075340976,
163
+ "proposal_success": 0.30170050113141034,
164
+ "reocclusion": 0.1706294410807245,
165
+ "role_swap_consistency": 0.0,
166
+ "support_mode": 0.4435207678490326,
167
+ "support_stability": 0.12954452590030782,
168
+ "task_metrics": 0.07019141574679803,
169
+ "total": 2.3952997061384824,
170
+ "transition": 3.4510987426052573,
171
+ "uncertainty": 2.649417712834203e-06,
172
+ "visibility": 0.09213429119657068,
173
+ "world_model": 0.0
174
+ },
175
+ "val": {
176
+ "action": 1.1680383563041687,
177
+ "arm_role": 0.0005777989087315897,
178
+ "belief": 0.09620878870288531,
179
+ "calibration": 0.0,
180
+ "clearance": 0.07562205567955971,
181
+ "corridor": 0.2099471464753151,
182
+ "distillation": 0.0,
183
+ "disturbance": 0.0008037402614718304,
184
+ "gate": 0.0,
185
+ "grasp_affordance": 0.009381201630458236,
186
+ "occluder_contact": 0.18789172718922298,
187
+ "persistence": 0.44771519377827645,
188
+ "phase": 0.15351878677805264,
189
+ "planner_ranking": 0.005908836016897112,
190
+ "planner_risk": 0.029111843556165695,
191
+ "planner_success": 0.030371779979517063,
192
+ "proposal_diversity": 0.0,
193
+ "proposal_mode": 0.6608088513215383,
194
+ "proposal_ranking": 1.519856317838033,
195
+ "proposal_reconstruction": 0.9984971513350804,
196
+ "proposal_success": 0.2899133563041687,
197
+ "reocclusion": 0.15338999405503273,
198
+ "role_swap_consistency": 0.0,
199
+ "support_mode": 0.4591325432062149,
200
+ "support_stability": 0.12738436510165532,
201
+ "task_metrics": 0.05577167191853126,
202
+ "total": 2.3411471287409467,
203
+ "transition": 3.3808055957158407,
204
+ "uncertainty": 1.560352771671584e-06,
205
+ "visibility": 0.08981477295358976,
206
+ "world_model": 0.0
207
+ }
208
+ },
209
+ {
210
+ "epoch": 3,
211
+ "train": {
212
+ "action": 1.1873075451169695,
213
+ "arm_role": 0.0010167556069400005,
214
+ "belief": 0.09699463875604276,
215
+ "calibration": 0.0,
216
+ "clearance": 0.0765939431280649,
217
+ "corridor": 0.21000426350271,
218
+ "distillation": 0.0,
219
+ "disturbance": 0.0008205439020564561,
220
+ "gate": 0.0,
221
+ "grasp_affordance": 0.009616962144886996,
222
+ "occluder_contact": 0.1890684860844572,
223
+ "persistence": 0.5268036977802756,
224
+ "phase": 0.18212753434141143,
225
+ "planner_ranking": 0.007861482998857102,
226
+ "planner_risk": 0.0305439497837249,
227
+ "planner_success": 0.0545816100632944,
228
+ "proposal_diversity": 0.0,
229
+ "proposal_mode": 0.7096028443144149,
230
+ "proposal_ranking": 1.49962230790563,
231
+ "proposal_reconstruction": 1.0570516235688154,
232
+ "proposal_success": 0.3012468101096754,
233
+ "reocclusion": 0.16893144916085637,
234
+ "role_swap_consistency": 0.0,
235
+ "support_mode": 0.43846767214166016,
236
+ "support_stability": 0.12901192851865492,
237
+ "task_metrics": 0.0706772211500827,
238
+ "total": 2.383075835324135,
239
+ "transition": 3.399705786664947,
240
+ "uncertainty": 1.833678168140796e-06,
241
+ "visibility": 0.09043271063255663,
242
+ "world_model": 0.0
243
+ },
244
+ "val": {
245
+ "action": 1.1680383563041687,
246
+ "arm_role": 0.0008160963848543664,
247
+ "belief": 0.09533951580524444,
248
+ "calibration": 0.0,
249
+ "clearance": 0.07521944617231686,
250
+ "corridor": 0.2074363355835279,
251
+ "distillation": 0.0,
252
+ "disturbance": 0.0007471947777958121,
253
+ "gate": 0.0,
254
+ "grasp_affordance": 0.009425108910848697,
255
+ "occluder_contact": 0.187281297147274,
256
+ "persistence": 0.42866156020512186,
257
+ "phase": 0.13389708844115375,
258
+ "planner_ranking": 0.007386005097456897,
259
+ "planner_risk": 0.03013829297075669,
260
+ "planner_success": 0.027494619445254404,
261
+ "proposal_diversity": 0.0,
262
+ "proposal_mode": 0.7145659645398458,
263
+ "proposal_ranking": 1.4651208639144897,
264
+ "proposal_reconstruction": 0.99560972849528,
265
+ "proposal_success": 0.29622272253036497,
266
+ "reocclusion": 0.15021706620852152,
267
+ "role_swap_consistency": 0.0,
268
+ "support_mode": 0.3665752013524373,
269
+ "support_stability": 0.12691180408000946,
270
+ "task_metrics": 0.056707360843817396,
271
+ "total": 2.3298022985458373,
272
+ "transition": 3.3876041332880655,
273
+ "uncertainty": 1.581879031557302e-06,
274
+ "visibility": 0.08887151132027309,
275
+ "world_model": 0.0
276
+ }
277
+ }
278
+ ]
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/summary.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/config_resolved.yaml ADDED
@@ -0,0 +1,170 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ experiment_name: proxy_adapter_wrapped_clip_rank_only_fast_seed17
2
+ output_dir: /workspace/workspace/outputs/adapter_proxy
3
+ device: cuda
4
+ seed: 17
5
+ init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
6
+ init_strict: false
7
+ data:
8
+ proxies:
9
+ - foliage_proxy
10
+ - bag_proxy
11
+ - cloth_proxy
12
+ resolution: 224
13
+ dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast
14
+ train_episodes_per_proxy: 12
15
+ val_episodes_per_proxy: 4
16
+ train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast.pt
17
+ val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast.pt
18
+ rebuild_dataset: false
19
+ chunk_horizon: 8
20
+ rollout_horizon: 5
21
+ history_steps: 6
22
+ planner_candidates: 8
23
+ seed: 17
24
+ optim:
25
+ epochs: 1
26
+ batch_size: 4
27
+ num_workers: 8
28
+ lr: 5.0e-05
29
+ weight_decay: 0.0001
30
+ trainer:
31
+ policy_type: adapter_wrapped
32
+ training_regime: proxy_rank_only
33
+ eval_mode: adapter_active
34
+ adapter_mode: adapter_active
35
+ adapter_use_transition_model: false
36
+ adapter_use_task_conditioning: true
37
+ use_bf16: true
38
+ grad_clip_norm: 1.0
39
+ freeze_backbone: true
40
+ gradient_checkpointing: false
41
+ plan_during_train: false
42
+ plan_during_eval: false
43
+ support_mode_conditioning: true
44
+ planner_mode: false
45
+ use_depth: true
46
+ use_world_model: false
47
+ use_role_tokens: true
48
+ compute_equivariance_probe: false
49
+ trainable_parameter_prefixes:
50
+ - adapter.proposal_prior
51
+ - adapter.planner
52
+ policy:
53
+ backbone:
54
+ model_name: openai/clip-vit-base-patch32
55
+ hidden_dim: 512
56
+ max_text_tokens: 32
57
+ freeze_backbone: true
58
+ gradient_checkpointing: false
59
+ use_dummy_backbone: false
60
+ fusion:
61
+ hidden_dim: 512
62
+ num_cameras: 3
63
+ num_layers: 4
64
+ num_heads: 8
65
+ ff_dim: 2048
66
+ dropout: 0.1
67
+ proprio_dim: 32
68
+ proprio_tokens: 1
69
+ memory:
70
+ hidden_dim: 512
71
+ action_dim: 14
72
+ history_steps: 6
73
+ scene_history_steps: 3
74
+ belief_history_steps: 8
75
+ num_layers: 2
76
+ dropout: 0.1
77
+ memory_bank_size: 4
78
+ scene_bank_size: 2
79
+ belief_bank_size: 2
80
+ num_heads: 8
81
+ max_history_steps: 8
82
+ decoder:
83
+ hidden_dim: 512
84
+ num_heads: 8
85
+ num_layers: 4
86
+ ff_dim: 2048
87
+ dropout: 0.1
88
+ chunk_size: 8
89
+ action_dim: 14
90
+ arm_action_dim: 7
91
+ num_candidates: 8
92
+ num_phases: 5
93
+ num_arm_roles: 4
94
+ num_proposal_modes: 7
95
+ planner_top_k: 4
96
+ proposal_delta_scale: 0.2
97
+ proposal_slot_scale: 0.05
98
+ reveal_head:
99
+ hidden_dim: 512
100
+ num_support_modes: 3
101
+ num_approach_templates: 32
102
+ rollout_horizon: 5
103
+ belief_map_size: 32
104
+ field_size: 16
105
+ num_heads: 8
106
+ predict_belief_map: true
107
+ num_phases: 5
108
+ num_arm_roles: 4
109
+ num_interaction_tokens: 8
110
+ num_tasks: 4
111
+ world_model:
112
+ hidden_dim: 512
113
+ action_dim: 14
114
+ num_support_modes: 3
115
+ num_approach_templates: 32
116
+ rollout_horizon: 5
117
+ field_size: 16
118
+ num_heads: 8
119
+ num_phases: 5
120
+ num_arm_roles: 4
121
+ num_interaction_tokens: 8
122
+ belief_map_size: 32
123
+ predict_belief_map: true
124
+ scene_bank_size: 2
125
+ belief_bank_size: 2
126
+ rollout_mode: compact_rollout
127
+ num_tasks: 4
128
+ lightweight_field_size: 4
129
+ planner:
130
+ hidden_dim: 512
131
+ num_candidates: 8
132
+ action_dim: 14
133
+ num_support_modes: 3
134
+ utility_margin: 0.1
135
+ num_heads: 8
136
+ num_layers: 2
137
+ num_phases: 5
138
+ num_arm_roles: 4
139
+ top_k: 4
140
+ adapter_confidence_threshold: 0.55
141
+ loss_weights:
142
+ action: 0.5
143
+ phase: 0.0
144
+ arm_role: 0.0
145
+ support_mode: 0.0
146
+ corridor: 0.0
147
+ persistence: 0.0
148
+ disturbance: 0.0
149
+ world_model: 0.0
150
+ transition: 0.0
151
+ belief: 0.0
152
+ visibility: 0.0
153
+ clearance: 0.0
154
+ support_stability: 0.0
155
+ reocclusion: 0.0
156
+ occluder_contact: 0.0
157
+ grasp_affordance: 0.0
158
+ planner_success: 0.0
159
+ planner_risk: 0.0
160
+ planner_ranking: 0.2
161
+ proposal_reconstruction: 0.0
162
+ proposal_success: 0.1
163
+ proposal_ranking: 0.2
164
+ proposal_mode: 0.1
165
+ proposal_diversity: 0.02
166
+ role_swap_consistency: 0.0
167
+ task_metrics: 0.0
168
+ gate: 0.0
169
+ distillation: 0.05
170
+ calibration: 0.0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/metrics.json ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "epoch": 0,
4
+ "train": {
5
+ "action": 1.197754372721133,
6
+ "arm_role": 0.002544189276902572,
7
+ "belief": 0.10325424243574557,
8
+ "calibration": 0.0,
9
+ "clearance": 0.08140122955260069,
10
+ "corridor": 0.21582962238513256,
11
+ "distillation": 0.0017091589068750973,
12
+ "disturbance": 0.0018385711983959798,
13
+ "gate": 0.0,
14
+ "grasp_affordance": 0.012481509039745382,
15
+ "occluder_contact": 0.194344752508661,
16
+ "persistence": 0.7591703522130442,
17
+ "phase": 0.11467522253160892,
18
+ "planner_ranking": 1.1083470168321028,
19
+ "planner_risk": 0.03255904554996802,
20
+ "planner_success": 0.8582628343416296,
21
+ "proposal_diversity": 0.0,
22
+ "proposal_mode": 1.1811642983685369,
23
+ "proposal_ranking": 0.7893771244132001,
24
+ "proposal_reconstruction": 1.2029107290765513,
25
+ "proposal_success": 0.6142160711081132,
26
+ "reocclusion": 0.25430014456176886,
27
+ "role_swap_consistency": 0.0,
28
+ "support_mode": 0.004081056595010602,
29
+ "support_stability": 0.13368942070266474,
30
+ "task_metrics": 0.0832325461442056,
31
+ "total": 1.158045794652856,
32
+ "transition": 0.0,
33
+ "uncertainty": 2.6861929209980705e-05,
34
+ "visibility": 0.09703111033076825,
35
+ "world_model": 0.0
36
+ },
37
+ "val": {
38
+ "action": 1.140361487865448,
39
+ "arm_role": 0.0023584125883644447,
40
+ "belief": 0.10074711591005325,
41
+ "calibration": 0.0,
42
+ "clearance": 0.0765643808990717,
43
+ "corridor": 0.1961718276143074,
44
+ "distillation": 0.003883325931383297,
45
+ "disturbance": 0.0014785153052798705,
46
+ "gate": 0.0,
47
+ "grasp_affordance": 0.010992531199008226,
48
+ "occluder_contact": 0.1946533638983965,
49
+ "persistence": 0.5068328934721649,
50
+ "phase": 0.16515514547063503,
51
+ "planner_ranking": 1.0683312863111496,
52
+ "planner_risk": 0.03190935752354562,
53
+ "planner_success": 0.8590418174862862,
54
+ "proposal_diversity": 0.0,
55
+ "proposal_mode": 0.8785674348473549,
56
+ "proposal_ranking": 0.5385221149772406,
57
+ "proposal_reconstruction": 1.144310723990202,
58
+ "proposal_success": 0.5420645326375961,
59
+ "reocclusion": 0.21981605514883995,
60
+ "role_swap_consistency": 0.0,
61
+ "support_mode": 0.003642815601779148,
62
+ "support_stability": 0.1338925575837493,
63
+ "task_metrics": 0.06881333282217383,
64
+ "total": 1.0338090658187866,
65
+ "transition": 0.0,
66
+ "uncertainty": 2.9527319277633524e-05,
67
+ "visibility": 0.0945812463760376,
68
+ "world_model": 0.0
69
+ }
70
+ }
71
+ ]
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_fast_seed17/summary.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/config_resolved.yaml ADDED
@@ -0,0 +1,170 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ experiment_name: proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17
2
+ output_dir: /workspace/workspace/outputs/adapter_proxy
3
+ device: cuda
4
+ seed: 17
5
+ init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
6
+ init_strict: false
7
+ data:
8
+ proxies:
9
+ - foliage_proxy
10
+ - bag_proxy
11
+ - cloth_proxy
12
+ resolution: 224
13
+ dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase
14
+ train_episodes_per_proxy: 128
15
+ val_episodes_per_proxy: 32
16
+ train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
17
+ val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_rebuild128_seed17.pt
18
+ rebuild_dataset: true
19
+ chunk_horizon: 8
20
+ rollout_horizon: 5
21
+ history_steps: 6
22
+ planner_candidates: 8
23
+ seed: 17
24
+ optim:
25
+ epochs: 4
26
+ batch_size: 8
27
+ num_workers: 32
28
+ lr: 5.0e-05
29
+ weight_decay: 0.0001
30
+ trainer:
31
+ policy_type: adapter_wrapped
32
+ training_regime: proxy_rank_only
33
+ eval_mode: adapter_active
34
+ adapter_mode: adapter_active
35
+ adapter_use_transition_model: false
36
+ adapter_use_task_conditioning: true
37
+ use_bf16: true
38
+ grad_clip_norm: 1.0
39
+ freeze_backbone: true
40
+ gradient_checkpointing: false
41
+ plan_during_train: false
42
+ plan_during_eval: false
43
+ support_mode_conditioning: true
44
+ planner_mode: false
45
+ use_depth: true
46
+ use_world_model: false
47
+ use_role_tokens: true
48
+ compute_equivariance_probe: false
49
+ trainable_parameter_prefixes:
50
+ - adapter.proposal_prior
51
+ - adapter.planner
52
+ policy:
53
+ backbone:
54
+ model_name: openai/clip-vit-base-patch32
55
+ hidden_dim: 512
56
+ max_text_tokens: 32
57
+ freeze_backbone: true
58
+ gradient_checkpointing: false
59
+ use_dummy_backbone: false
60
+ fusion:
61
+ hidden_dim: 512
62
+ num_cameras: 3
63
+ num_layers: 4
64
+ num_heads: 8
65
+ ff_dim: 2048
66
+ dropout: 0.1
67
+ proprio_dim: 32
68
+ proprio_tokens: 1
69
+ memory:
70
+ hidden_dim: 512
71
+ action_dim: 14
72
+ history_steps: 6
73
+ scene_history_steps: 3
74
+ belief_history_steps: 8
75
+ num_layers: 2
76
+ dropout: 0.1
77
+ memory_bank_size: 4
78
+ scene_bank_size: 2
79
+ belief_bank_size: 2
80
+ num_heads: 8
81
+ max_history_steps: 8
82
+ decoder:
83
+ hidden_dim: 512
84
+ num_heads: 8
85
+ num_layers: 4
86
+ ff_dim: 2048
87
+ dropout: 0.1
88
+ chunk_size: 8
89
+ action_dim: 14
90
+ arm_action_dim: 7
91
+ num_candidates: 8
92
+ num_phases: 5
93
+ num_arm_roles: 4
94
+ num_proposal_modes: 7
95
+ planner_top_k: 4
96
+ proposal_delta_scale: 0.2
97
+ proposal_slot_scale: 0.05
98
+ reveal_head:
99
+ hidden_dim: 512
100
+ num_support_modes: 3
101
+ num_approach_templates: 32
102
+ rollout_horizon: 5
103
+ belief_map_size: 32
104
+ field_size: 16
105
+ num_heads: 8
106
+ predict_belief_map: true
107
+ num_phases: 5
108
+ num_arm_roles: 4
109
+ num_interaction_tokens: 8
110
+ num_tasks: 4
111
+ world_model:
112
+ hidden_dim: 512
113
+ action_dim: 14
114
+ num_support_modes: 3
115
+ num_approach_templates: 32
116
+ rollout_horizon: 5
117
+ field_size: 16
118
+ num_heads: 8
119
+ num_phases: 5
120
+ num_arm_roles: 4
121
+ num_interaction_tokens: 8
122
+ belief_map_size: 32
123
+ predict_belief_map: true
124
+ scene_bank_size: 2
125
+ belief_bank_size: 2
126
+ rollout_mode: compact_rollout
127
+ num_tasks: 4
128
+ lightweight_field_size: 4
129
+ planner:
130
+ hidden_dim: 512
131
+ num_candidates: 8
132
+ action_dim: 14
133
+ num_support_modes: 3
134
+ utility_margin: 0.1
135
+ num_heads: 8
136
+ num_layers: 2
137
+ num_phases: 5
138
+ num_arm_roles: 4
139
+ top_k: 4
140
+ adapter_confidence_threshold: 0.55
141
+ loss_weights:
142
+ action: 0.5
143
+ phase: 0.0
144
+ arm_role: 0.0
145
+ support_mode: 0.0
146
+ corridor: 0.0
147
+ persistence: 0.0
148
+ disturbance: 0.0
149
+ world_model: 0.0
150
+ transition: 0.0
151
+ belief: 0.0
152
+ visibility: 0.0
153
+ clearance: 0.0
154
+ support_stability: 0.0
155
+ reocclusion: 0.0
156
+ occluder_contact: 0.0
157
+ grasp_affordance: 0.0
158
+ planner_success: 0.0
159
+ planner_risk: 0.0
160
+ planner_ranking: 0.2
161
+ proposal_reconstruction: 0.0
162
+ proposal_success: 0.1
163
+ proposal_ranking: 0.2
164
+ proposal_mode: 0.1
165
+ proposal_diversity: 0.02
166
+ role_swap_consistency: 0.0
167
+ task_metrics: 0.0
168
+ gate: 0.0
169
+ distillation: 0.05
170
+ calibration: 0.0
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/metrics.json ADDED
@@ -0,0 +1,278 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "epoch": 0,
4
+ "train": {
5
+ "action": 1.1852011870937187,
6
+ "arm_role": 0.002373194619387138,
7
+ "belief": 0.10289109610960263,
8
+ "calibration": 0.0,
9
+ "clearance": 0.08050862655920141,
10
+ "corridor": 0.21972917464851333,
11
+ "distillation": 0.00017329733114407843,
12
+ "disturbance": 0.0017395270088327531,
13
+ "gate": 0.0,
14
+ "grasp_affordance": 0.011768270616552659,
15
+ "occluder_contact": 0.19525797589987265,
16
+ "persistence": 0.9892086396072092,
17
+ "phase": 0.18924372737147227,
18
+ "planner_ranking": 0.8172678849777254,
19
+ "planner_risk": 0.05744413993939632,
20
+ "planner_success": 0.7064468672796458,
21
+ "proposal_diversity": 0.0,
22
+ "proposal_mode": 0.9432200854565916,
23
+ "proposal_ranking": 1.336866896693446,
24
+ "proposal_reconstruction": 1.1112627968066882,
25
+ "proposal_success": 0.4027111357500573,
26
+ "reocclusion": 0.24888639283530853,
27
+ "role_swap_consistency": 0.0,
28
+ "support_mode": 0.00371579679723109,
29
+ "support_stability": 0.13197413343591852,
30
+ "task_metrics": 0.08024843531746824,
31
+ "total": 1.1580296083658683,
32
+ "transition": 0.0,
33
+ "uncertainty": 2.5850595745526205e-05,
34
+ "visibility": 0.09642420045467985,
35
+ "world_model": 0.0
36
+ },
37
+ "val": {
38
+ "action": 1.1680383563041687,
39
+ "arm_role": 0.0023813226105024415,
40
+ "belief": 0.10179599796732267,
41
+ "calibration": 0.0,
42
+ "clearance": 0.07945799206693967,
43
+ "corridor": 0.2141698474685351,
44
+ "distillation": 0.0,
45
+ "disturbance": 0.0019217911574135845,
46
+ "gate": 0.0,
47
+ "grasp_affordance": 0.011626164180537064,
48
+ "occluder_contact": 0.19411553194125494,
49
+ "persistence": 0.8884257813294728,
50
+ "phase": 0.1341669425445919,
51
+ "planner_ranking": 0.27815661728382113,
52
+ "planner_risk": 0.09556023739278316,
53
+ "planner_success": 0.4189198156197866,
54
+ "proposal_diversity": 0.0,
55
+ "proposal_mode": 0.8612981418768565,
56
+ "proposal_ranking": 1.3541318853696187,
57
+ "proposal_reconstruction": 1.0655952940384548,
58
+ "proposal_success": 0.34761282006899513,
59
+ "reocclusion": 0.23091794028878213,
60
+ "role_swap_consistency": 0.0,
61
+ "support_mode": 0.0032604910316877066,
62
+ "support_stability": 0.1301962616542975,
63
+ "task_metrics": 0.06759862539668877,
64
+ "total": 1.0313682556152344,
65
+ "transition": 0.0,
66
+ "uncertainty": 2.724018773581823e-05,
67
+ "visibility": 0.09551568776369095,
68
+ "world_model": 0.0
69
+ }
70
+ },
71
+ {
72
+ "epoch": 1,
73
+ "train": {
74
+ "action": 1.1870849994050354,
75
+ "arm_role": 0.0023704882018549854,
76
+ "belief": 0.10286598608774297,
77
+ "calibration": 0.0,
78
+ "clearance": 0.08047503743226789,
79
+ "corridor": 0.21940489163418778,
80
+ "distillation": 0.0,
81
+ "disturbance": 0.0017350247245234978,
82
+ "gate": 0.0,
83
+ "grasp_affordance": 0.011760568257984744,
84
+ "occluder_contact": 0.19528898884769247,
85
+ "persistence": 0.9879098851625033,
86
+ "phase": 0.18875574952914936,
87
+ "planner_ranking": 0.08558745583628907,
88
+ "planner_risk": 0.1399850454651007,
89
+ "planner_success": 0.3386907313300782,
90
+ "proposal_diversity": 0.0,
91
+ "proposal_mode": 0.8769457270117367,
92
+ "proposal_ranking": 1.3593093036603527,
93
+ "proposal_reconstruction": 1.1160700326206303,
94
+ "proposal_success": 0.36580811954346026,
95
+ "reocclusion": 0.2486385852098465,
96
+ "role_swap_consistency": 0.0,
97
+ "support_mode": 0.003717367388621098,
98
+ "support_stability": 0.13195458464637524,
99
+ "task_metrics": 0.08063916548961352,
100
+ "total": 1.0067975045252247,
101
+ "transition": 0.0,
102
+ "uncertainty": 2.580285645843319e-05,
103
+ "visibility": 0.09639987023938604,
104
+ "world_model": 0.0
105
+ },
106
+ "val": {
107
+ "action": 1.1680383563041687,
108
+ "arm_role": 0.0023813226105024415,
109
+ "belief": 0.10179599796732267,
110
+ "calibration": 0.0,
111
+ "clearance": 0.07945799206693967,
112
+ "corridor": 0.2141698474685351,
113
+ "distillation": 0.0,
114
+ "disturbance": 0.0019217911574135845,
115
+ "gate": 0.0,
116
+ "grasp_affordance": 0.011626164180537064,
117
+ "occluder_contact": 0.19411553194125494,
118
+ "persistence": 0.8884257813294728,
119
+ "phase": 0.1341669425445919,
120
+ "planner_ranking": 0.020432091876864435,
121
+ "planner_risk": 0.16417022446791332,
122
+ "planner_success": 0.20522922178109487,
123
+ "proposal_diversity": 0.0,
124
+ "proposal_mode": 0.8313319782416025,
125
+ "proposal_ranking": 1.355160697301229,
126
+ "proposal_reconstruction": 1.065085584918658,
127
+ "proposal_success": 0.37029117544492085,
128
+ "reocclusion": 0.23091794028878213,
129
+ "role_swap_consistency": 0.0,
130
+ "support_mode": 0.0032604910316877066,
131
+ "support_stability": 0.1301962616542975,
132
+ "task_metrics": 0.06759862539668877,
133
+ "total": 0.979300343990326,
134
+ "transition": 0.0,
135
+ "uncertainty": 2.724018773581823e-05,
136
+ "visibility": 0.09551568776369095,
137
+ "world_model": 0.0
138
+ }
139
+ },
140
+ {
141
+ "epoch": 2,
142
+ "train": {
143
+ "action": 1.1864268629490828,
144
+ "arm_role": 0.002376417737031559,
145
+ "belief": 0.10281138397565409,
146
+ "calibration": 0.0,
147
+ "clearance": 0.08041088464630752,
148
+ "corridor": 0.21921880461839066,
149
+ "distillation": 0.0,
150
+ "disturbance": 0.0017383864957510548,
151
+ "gate": 0.0,
152
+ "grasp_affordance": 0.011750116954095849,
153
+ "occluder_contact": 0.19525049964920813,
154
+ "persistence": 0.9866341657686133,
155
+ "phase": 0.18828046964598866,
156
+ "planner_ranking": 0.01506587937317726,
157
+ "planner_risk": 0.17819794167240127,
158
+ "planner_success": 0.27137053726601,
159
+ "proposal_diversity": 0.0,
160
+ "proposal_mode": 0.871496272187273,
161
+ "proposal_ranking": 1.3522406766394608,
162
+ "proposal_reconstruction": 1.114444579396929,
163
+ "proposal_success": 0.36960093138598593,
164
+ "reocclusion": 0.24837740529485108,
165
+ "role_swap_consistency": 0.0,
166
+ "support_mode": 0.003723324929997951,
167
+ "support_stability": 0.1318679920890752,
168
+ "task_metrics": 0.08124440529641985,
169
+ "total": 0.9907847267239034,
170
+ "transition": 0.0,
171
+ "uncertainty": 2.5764442244401245e-05,
172
+ "visibility": 0.09634732048050697,
173
+ "world_model": 0.0
174
+ },
175
+ "val": {
176
+ "action": 1.1680383563041687,
177
+ "arm_role": 0.0023813226105024415,
178
+ "belief": 0.10179599796732267,
179
+ "calibration": 0.0,
180
+ "clearance": 0.07945799206693967,
181
+ "corridor": 0.2141698474685351,
182
+ "distillation": 0.0,
183
+ "disturbance": 0.0019217911574135845,
184
+ "gate": 0.0,
185
+ "grasp_affordance": 0.011626164180537064,
186
+ "occluder_contact": 0.19411553194125494,
187
+ "persistence": 0.8884257813294728,
188
+ "phase": 0.1341669425445919,
189
+ "planner_ranking": 0.008497202799965938,
190
+ "planner_risk": 0.1943199912707011,
191
+ "planner_success": 0.16650028626124064,
192
+ "proposal_diversity": 0.0,
193
+ "proposal_mode": 0.784325490395228,
194
+ "proposal_ranking": 1.3774529258410135,
195
+ "proposal_reconstruction": 1.0638848970333734,
196
+ "proposal_success": 0.3639564683039983,
197
+ "reocclusion": 0.23091794028878213,
198
+ "role_swap_consistency": 0.0,
199
+ "support_mode": 0.0032604910316877066,
200
+ "support_stability": 0.1301962616542975,
201
+ "task_metrics": 0.06759862539668877,
202
+ "total": 0.9760376830895742,
203
+ "transition": 0.0,
204
+ "uncertainty": 2.724018773581823e-05,
205
+ "visibility": 0.09551568776369095,
206
+ "world_model": 0.0
207
+ }
208
+ },
209
+ {
210
+ "epoch": 3,
211
+ "train": {
212
+ "action": 1.1860493772170122,
213
+ "arm_role": 0.002373194953958903,
214
+ "belief": 0.10285232979960802,
215
+ "calibration": 0.0,
216
+ "clearance": 0.08046898640253965,
217
+ "corridor": 0.21937422429313178,
218
+ "distillation": 0.0,
219
+ "disturbance": 0.001741885332568713,
220
+ "gate": 0.0,
221
+ "grasp_affordance": 0.011761440472880831,
222
+ "occluder_contact": 0.19526721023711838,
223
+ "persistence": 0.9867200422562471,
224
+ "phase": 0.18844207436895044,
225
+ "planner_ranking": 0.008475025738159022,
226
+ "planner_risk": 0.20258555417301274,
227
+ "planner_success": 0.24018349805298975,
228
+ "proposal_diversity": 0.0,
229
+ "proposal_mode": 0.8707189029004393,
230
+ "proposal_ranking": 1.3544838268215917,
231
+ "proposal_reconstruction": 1.113739546857962,
232
+ "proposal_success": 0.36756599099696186,
233
+ "reocclusion": 0.24844886725690185,
234
+ "role_swap_consistency": 0.0,
235
+ "support_mode": 0.0037099133850224003,
236
+ "support_stability": 0.13174430547016008,
237
+ "task_metrics": 0.0815500450328368,
238
+ "total": 0.9894452145119675,
239
+ "transition": 0.0,
240
+ "uncertainty": 2.5792860569021012e-05,
241
+ "visibility": 0.09639218781425171,
242
+ "world_model": 0.0
243
+ },
244
+ "val": {
245
+ "action": 1.1680383563041687,
246
+ "arm_role": 0.0023813226105024415,
247
+ "belief": 0.10179599796732267,
248
+ "calibration": 0.0,
249
+ "clearance": 0.07945799206693967,
250
+ "corridor": 0.2141698474685351,
251
+ "distillation": 0.0,
252
+ "disturbance": 0.0019217911574135845,
253
+ "gate": 0.0,
254
+ "grasp_affordance": 0.011626164180537064,
255
+ "occluder_contact": 0.19411553194125494,
256
+ "persistence": 0.8884257813294728,
257
+ "phase": 0.1341669425445919,
258
+ "planner_ranking": 0.006291244722281893,
259
+ "planner_risk": 0.22365033129851022,
260
+ "planner_success": 0.1353773462275664,
261
+ "proposal_diversity": 0.0,
262
+ "proposal_mode": 0.8833410640557607,
263
+ "proposal_ranking": 1.3212236324946085,
264
+ "proposal_reconstruction": 1.0634535759687425,
265
+ "proposal_success": 0.36492464542388914,
266
+ "reocclusion": 0.23091794028878213,
267
+ "role_swap_consistency": 0.0,
268
+ "support_mode": 0.0032604910316877066,
269
+ "support_stability": 0.1301962616542975,
270
+ "task_metrics": 0.06759862539668877,
271
+ "total": 0.9743489940961202,
272
+ "transition": 0.0,
273
+ "uncertainty": 2.724018773581823e-05,
274
+ "visibility": 0.09551568776369095,
275
+ "world_model": 0.0
276
+ }
277
+ }
278
+ ]
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_rank_only_rebuild128_seed17/summary.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/config_resolved.yaml ADDED
@@ -0,0 +1,174 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ experiment_name: proxy_adapter_wrapped_clip_transition_fast_seed17
2
+ output_dir: /workspace/workspace/outputs/adapter_proxy
3
+ device: cuda
4
+ seed: 17
5
+ init_checkpoint: /workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt
6
+ init_strict: false
7
+ data:
8
+ proxies:
9
+ - foliage_proxy
10
+ - bag_proxy
11
+ - cloth_proxy
12
+ resolution: 224
13
+ dataset_version: reveal_proxy_v6_rgbd_elastic_state_phase_fast_transition
14
+ train_episodes_per_proxy: 12
15
+ val_episodes_per_proxy: 4
16
+ train_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_train_clip224_v6_rgbd_stage3_phase_fast_transition.pt
17
+ val_dataset_path: /workspace/workspace/data/reveal_proxy/proxy_val_clip224_v6_rgbd_stage3_phase_fast_transition.pt
18
+ rebuild_dataset: false
19
+ chunk_horizon: 8
20
+ rollout_horizon: 5
21
+ history_steps: 6
22
+ planner_candidates: 8
23
+ seed: 17
24
+ optim:
25
+ epochs: 2
26
+ batch_size: 4
27
+ num_workers: 8
28
+ lr: 0.0001
29
+ weight_decay: 0.0001
30
+ trainer:
31
+ policy_type: adapter_wrapped
32
+ training_regime: adapter_train_frozen_trunk
33
+ eval_mode: adapter_active
34
+ adapter_mode: adapter_active
35
+ adapter_use_transition_model: true
36
+ adapter_use_task_conditioning: true
37
+ use_bf16: true
38
+ grad_clip_norm: 1.0
39
+ freeze_backbone: true
40
+ gradient_checkpointing: false
41
+ plan_during_train: false
42
+ plan_during_eval: false
43
+ support_mode_conditioning: true
44
+ planner_mode: false
45
+ use_depth: true
46
+ use_world_model: false
47
+ use_role_tokens: true
48
+ compute_equivariance_probe: false
49
+ trainable_parameter_prefixes:
50
+ - adapter.state_head
51
+ - adapter.proposal_prior
52
+ - adapter.transition_model
53
+ - adapter.planner
54
+ policy:
55
+ backbone:
56
+ model_name: openai/clip-vit-base-patch32
57
+ hidden_dim: 512
58
+ max_text_tokens: 32
59
+ freeze_backbone: true
60
+ gradient_checkpointing: false
61
+ use_dummy_backbone: false
62
+ fusion:
63
+ hidden_dim: 512
64
+ num_cameras: 3
65
+ num_layers: 4
66
+ num_heads: 8
67
+ ff_dim: 2048
68
+ dropout: 0.1
69
+ proprio_dim: 32
70
+ proprio_tokens: 1
71
+ memory:
72
+ hidden_dim: 512
73
+ action_dim: 14
74
+ history_steps: 6
75
+ scene_history_steps: 3
76
+ belief_history_steps: 8
77
+ num_layers: 2
78
+ dropout: 0.1
79
+ memory_bank_size: 4
80
+ scene_bank_size: 2
81
+ belief_bank_size: 2
82
+ num_heads: 8
83
+ max_history_steps: 8
84
+ reveal_cache_steps: 4
85
+ reveal_cache_decay: 0.7
86
+ decoder:
87
+ hidden_dim: 512
88
+ num_heads: 8
89
+ num_layers: 4
90
+ ff_dim: 2048
91
+ dropout: 0.1
92
+ chunk_size: 8
93
+ action_dim: 14
94
+ arm_action_dim: 7
95
+ num_candidates: 8
96
+ num_phases: 5
97
+ num_arm_roles: 4
98
+ num_proposal_modes: 7
99
+ planner_top_k: 4
100
+ proposal_delta_scale: 0.2
101
+ proposal_slot_scale: 0.05
102
+ reveal_head:
103
+ hidden_dim: 512
104
+ num_support_modes: 3
105
+ num_approach_templates: 32
106
+ rollout_horizon: 5
107
+ belief_map_size: 32
108
+ field_size: 16
109
+ num_heads: 8
110
+ predict_belief_map: true
111
+ num_phases: 5
112
+ num_arm_roles: 4
113
+ num_interaction_tokens: 8
114
+ num_tasks: 4
115
+ world_model:
116
+ hidden_dim: 512
117
+ action_dim: 14
118
+ num_support_modes: 3
119
+ num_approach_templates: 32
120
+ rollout_horizon: 5
121
+ field_size: 16
122
+ num_heads: 8
123
+ num_phases: 5
124
+ num_arm_roles: 4
125
+ num_interaction_tokens: 8
126
+ belief_map_size: 32
127
+ predict_belief_map: true
128
+ scene_bank_size: 2
129
+ belief_bank_size: 2
130
+ rollout_mode: compact_rollout
131
+ num_tasks: 4
132
+ lightweight_field_size: 4
133
+ planner:
134
+ hidden_dim: 512
135
+ num_candidates: 8
136
+ action_dim: 14
137
+ num_support_modes: 3
138
+ utility_margin: 0.1
139
+ num_heads: 8
140
+ num_layers: 2
141
+ num_phases: 5
142
+ num_arm_roles: 4
143
+ top_k: 4
144
+ adapter_confidence_threshold: 0.45
145
+ loss_weights:
146
+ action: 1.0
147
+ phase: 0.08
148
+ arm_role: 0.08
149
+ support_mode: 0.08
150
+ corridor: 0.12
151
+ persistence: 0.06
152
+ disturbance: 0.06
153
+ world_model: 0.0
154
+ transition: 0.15
155
+ belief: 0.05
156
+ visibility: 0.05
157
+ clearance: 0.06
158
+ support_stability: 0.06
159
+ reocclusion: 0.06
160
+ occluder_contact: 0.05
161
+ grasp_affordance: 0.05
162
+ planner_success: 0.15
163
+ planner_risk: 0.08
164
+ planner_ranking: 0.15
165
+ proposal_reconstruction: 0.08
166
+ proposal_success: 0.1
167
+ proposal_ranking: 0.12
168
+ proposal_mode: 0.08
169
+ proposal_diversity: 0.05
170
+ role_swap_consistency: 0.0
171
+ task_metrics: 0.06
172
+ gate: 0.05
173
+ distillation: 0.05
174
+ calibration: 0.02
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/metrics.json ADDED
@@ -0,0 +1,140 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "epoch": 0,
4
+ "train": {
5
+ "action": 1.2014537013095359,
6
+ "arm_role": 0.004142667506011608,
7
+ "belief": 0.10642868701530539,
8
+ "calibration": 0.0,
9
+ "clearance": 0.08262280795885169,
10
+ "corridor": 0.22370571363717318,
11
+ "distillation": 0.0018765011868115676,
12
+ "disturbance": 0.0011591566448180895,
13
+ "gate": 0.0,
14
+ "grasp_affordance": 0.012573797620185043,
15
+ "occluder_contact": 0.1948690563440323,
16
+ "persistence": 0.5442525049894238,
17
+ "phase": 0.14094198657118756,
18
+ "planner_ranking": 1.1814680177232493,
19
+ "planner_risk": 0.03286057249035524,
20
+ "planner_success": 0.49930323725161346,
21
+ "proposal_diversity": 0.0,
22
+ "proposal_mode": 0.9191397609918014,
23
+ "proposal_ranking": 0.7756888011227483,
24
+ "proposal_reconstruction": 1.1855679594952127,
25
+ "proposal_success": 0.5070859141971754,
26
+ "reocclusion": 0.2707118239739667,
27
+ "role_swap_consistency": 0.0,
28
+ "support_mode": 0.81035483142604,
29
+ "support_stability": 0.135533408464297,
30
+ "task_metrics": 0.07755828170996645,
31
+ "total": 2.5070675456005596,
32
+ "transition": 3.653185836646868,
33
+ "uncertainty": 5.752725617284064e-05,
34
+ "visibility": 0.0989064211430757,
35
+ "world_model": 0.0
36
+ },
37
+ "val": {
38
+ "action": 1.1732755303382874,
39
+ "arm_role": 0.001568492029036861,
40
+ "belief": 0.09933605697005987,
41
+ "calibration": 0.0,
42
+ "clearance": 0.07699812250211835,
43
+ "corridor": 0.1967080980539322,
44
+ "distillation": 0.002813707455061376,
45
+ "disturbance": 0.0013425838133116486,
46
+ "gate": 0.0,
47
+ "grasp_affordance": 0.010458780219778419,
48
+ "occluder_contact": 0.19887321814894676,
49
+ "persistence": 0.3571807991247624,
50
+ "phase": 0.23128701612586156,
51
+ "planner_ranking": 0.9876129180192947,
52
+ "planner_risk": 0.032078082440420985,
53
+ "planner_success": 0.3786630928516388,
54
+ "proposal_diversity": 0.0,
55
+ "proposal_mode": 0.5780632123351097,
56
+ "proposal_ranking": 0.6625044224783778,
57
+ "proposal_reconstruction": 1.1224287077784538,
58
+ "proposal_success": 0.32306262850761414,
59
+ "reocclusion": 0.21124972961843014,
60
+ "role_swap_consistency": 0.0,
61
+ "support_mode": 0.6393884606659412,
62
+ "support_stability": 0.13537815306335688,
63
+ "task_metrics": 0.06436877744272351,
64
+ "total": 2.0122548937797546,
65
+ "transition": 1.4614488631486893,
66
+ "uncertainty": 4.029161317120611e-05,
67
+ "visibility": 0.09316261485219002,
68
+ "world_model": 0.0
69
+ }
70
+ },
71
+ {
72
+ "epoch": 1,
73
+ "train": {
74
+ "action": 1.2011131566503774,
75
+ "arm_role": 0.0004577429398246433,
76
+ "belief": 0.10254939839891765,
77
+ "calibration": 0.0,
78
+ "clearance": 0.08239235486025395,
79
+ "corridor": 0.209725521646602,
80
+ "distillation": 0.0029014512876291637,
81
+ "disturbance": 0.001299830724272634,
82
+ "gate": 0.0,
83
+ "grasp_affordance": 0.011238907848525307,
84
+ "occluder_contact": 0.19421758470327957,
85
+ "persistence": 0.2043300135941852,
86
+ "phase": 0.16561541823751252,
87
+ "planner_ranking": 0.9580214386400969,
88
+ "planner_risk": 0.03229632252908271,
89
+ "planner_success": 0.36985718167346454,
90
+ "proposal_diversity": 0.0,
91
+ "proposal_mode": 0.5392822428889896,
92
+ "proposal_ranking": 0.7421491457068402,
93
+ "proposal_reconstruction": 1.15594565997953,
94
+ "proposal_success": 0.27282858737137006,
95
+ "reocclusion": 0.13705282172431116,
96
+ "role_swap_consistency": 0.0,
97
+ "support_mode": 0.6220477378886679,
98
+ "support_stability": 0.13319886832133584,
99
+ "task_metrics": 0.07506552370993988,
100
+ "total": 2.026416410570559,
101
+ "transition": 1.500129469062971,
102
+ "uncertainty": 1.367867451214668e-05,
103
+ "visibility": 0.09593491644962975,
104
+ "world_model": 0.0
105
+ },
106
+ "val": {
107
+ "action": 1.2080544531345367,
108
+ "arm_role": 0.0001097214071705821,
109
+ "belief": 0.09881071373820305,
110
+ "calibration": 0.0,
111
+ "clearance": 0.07554284203797579,
112
+ "corridor": 0.19048454985022545,
113
+ "distillation": 0.0,
114
+ "disturbance": 0.00071150396252051,
115
+ "gate": 0.0,
116
+ "grasp_affordance": 0.009015273419208825,
117
+ "occluder_contact": 0.1911622602492571,
118
+ "persistence": 0.4154473473317921,
119
+ "phase": 0.22401500784326345,
120
+ "planner_ranking": 0.9130920022726059,
121
+ "planner_risk": 0.03172952332533896,
122
+ "planner_success": 0.36061106994748116,
123
+ "proposal_diversity": 0.0,
124
+ "proposal_mode": 0.46144857816398144,
125
+ "proposal_ranking": 0.6975354589521885,
126
+ "proposal_reconstruction": 1.0902796238660812,
127
+ "proposal_success": 0.2553649302572012,
128
+ "reocclusion": 0.18199651315808296,
129
+ "role_swap_consistency": 0.0,
130
+ "support_mode": 0.7191376462578773,
131
+ "support_stability": 0.13278500083833933,
132
+ "task_metrics": 0.06281590019352734,
133
+ "total": 1.9747500270605087,
134
+ "transition": 1.1311135664582253,
135
+ "uncertainty": 7.986968377338144e-06,
136
+ "visibility": 0.09258495084941387,
137
+ "world_model": 0.0
138
+ }
139
+ }
140
+ ]
artifacts/outputs/adapter_proxy/proxy_adapter_wrapped_clip_transition_fast_seed17/summary.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.json ADDED
@@ -0,0 +1,280 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
3
+ "plan_requested": false,
4
+ "plan_applied": false,
5
+ "planner_mode": "trainable",
6
+ "support_mode_conditioning": true,
7
+ "task_conditioning": true,
8
+ "geometry_enabled": true,
9
+ "world_model_mode": "checkpoint_default",
10
+ "episodes_per_task": 1,
11
+ "episode_length": 25,
12
+ "resolution": 256,
13
+ "reset_retries": 20,
14
+ "arm_mode": "planning",
15
+ "delta_scale": 1.0,
16
+ "cameras": [
17
+ "front",
18
+ "wrist_left",
19
+ "wrist_right"
20
+ ],
21
+ "tasks": {
22
+ "bimanual_dual_push_buttons": {
23
+ "task_class": "BimanualDualPushButtons",
24
+ "successes": [
25
+ 0.0
26
+ ],
27
+ "returns": [
28
+ 0.0
29
+ ],
30
+ "path_recoveries": [
31
+ 0
32
+ ],
33
+ "noop_fallbacks": [
34
+ 0
35
+ ],
36
+ "reset_retries": [
37
+ 0
38
+ ],
39
+ "episode_traces": [
40
+ {
41
+ "language_goal": "push the olive and the orange buttons",
42
+ "steps": [
43
+ {
44
+ "timestep": 0,
45
+ "chosen_macro_mode": null,
46
+ "planner_scores": null,
47
+ "predicted_reocclusion": null,
48
+ "support_mode_conditioning": true,
49
+ "path_recoveries": 0,
50
+ "noop_fallbacks": 0
51
+ },
52
+ {
53
+ "timestep": 1,
54
+ "chosen_macro_mode": null,
55
+ "planner_scores": null,
56
+ "predicted_reocclusion": null,
57
+ "support_mode_conditioning": true,
58
+ "path_recoveries": 0,
59
+ "noop_fallbacks": 0
60
+ },
61
+ {
62
+ "timestep": 2,
63
+ "chosen_macro_mode": null,
64
+ "planner_scores": null,
65
+ "predicted_reocclusion": null,
66
+ "support_mode_conditioning": true,
67
+ "path_recoveries": 0,
68
+ "noop_fallbacks": 0
69
+ },
70
+ {
71
+ "timestep": 3,
72
+ "chosen_macro_mode": null,
73
+ "planner_scores": null,
74
+ "predicted_reocclusion": null,
75
+ "support_mode_conditioning": true,
76
+ "path_recoveries": 0,
77
+ "noop_fallbacks": 0
78
+ },
79
+ {
80
+ "timestep": 4,
81
+ "chosen_macro_mode": null,
82
+ "planner_scores": null,
83
+ "predicted_reocclusion": null,
84
+ "support_mode_conditioning": true,
85
+ "path_recoveries": 0,
86
+ "noop_fallbacks": 0
87
+ },
88
+ {
89
+ "timestep": 5,
90
+ "chosen_macro_mode": null,
91
+ "planner_scores": null,
92
+ "predicted_reocclusion": null,
93
+ "support_mode_conditioning": true,
94
+ "path_recoveries": 0,
95
+ "noop_fallbacks": 0
96
+ },
97
+ {
98
+ "timestep": 6,
99
+ "chosen_macro_mode": null,
100
+ "planner_scores": null,
101
+ "predicted_reocclusion": null,
102
+ "support_mode_conditioning": true,
103
+ "path_recoveries": 0,
104
+ "noop_fallbacks": 0
105
+ },
106
+ {
107
+ "timestep": 7,
108
+ "chosen_macro_mode": null,
109
+ "planner_scores": null,
110
+ "predicted_reocclusion": null,
111
+ "support_mode_conditioning": true,
112
+ "path_recoveries": 0,
113
+ "noop_fallbacks": 0
114
+ },
115
+ {
116
+ "timestep": 8,
117
+ "chosen_macro_mode": null,
118
+ "planner_scores": null,
119
+ "predicted_reocclusion": null,
120
+ "support_mode_conditioning": true,
121
+ "path_recoveries": 0,
122
+ "noop_fallbacks": 0
123
+ },
124
+ {
125
+ "timestep": 9,
126
+ "chosen_macro_mode": null,
127
+ "planner_scores": null,
128
+ "predicted_reocclusion": null,
129
+ "support_mode_conditioning": true,
130
+ "path_recoveries": 0,
131
+ "noop_fallbacks": 0
132
+ },
133
+ {
134
+ "timestep": 10,
135
+ "chosen_macro_mode": null,
136
+ "planner_scores": null,
137
+ "predicted_reocclusion": null,
138
+ "support_mode_conditioning": true,
139
+ "path_recoveries": 0,
140
+ "noop_fallbacks": 0
141
+ },
142
+ {
143
+ "timestep": 11,
144
+ "chosen_macro_mode": null,
145
+ "planner_scores": null,
146
+ "predicted_reocclusion": null,
147
+ "support_mode_conditioning": true,
148
+ "path_recoveries": 0,
149
+ "noop_fallbacks": 0
150
+ },
151
+ {
152
+ "timestep": 12,
153
+ "chosen_macro_mode": null,
154
+ "planner_scores": null,
155
+ "predicted_reocclusion": null,
156
+ "support_mode_conditioning": true,
157
+ "path_recoveries": 0,
158
+ "noop_fallbacks": 0
159
+ },
160
+ {
161
+ "timestep": 13,
162
+ "chosen_macro_mode": null,
163
+ "planner_scores": null,
164
+ "predicted_reocclusion": null,
165
+ "support_mode_conditioning": true,
166
+ "path_recoveries": 0,
167
+ "noop_fallbacks": 0
168
+ },
169
+ {
170
+ "timestep": 14,
171
+ "chosen_macro_mode": null,
172
+ "planner_scores": null,
173
+ "predicted_reocclusion": null,
174
+ "support_mode_conditioning": true,
175
+ "path_recoveries": 0,
176
+ "noop_fallbacks": 0
177
+ },
178
+ {
179
+ "timestep": 15,
180
+ "chosen_macro_mode": null,
181
+ "planner_scores": null,
182
+ "predicted_reocclusion": null,
183
+ "support_mode_conditioning": true,
184
+ "path_recoveries": 0,
185
+ "noop_fallbacks": 0
186
+ },
187
+ {
188
+ "timestep": 16,
189
+ "chosen_macro_mode": null,
190
+ "planner_scores": null,
191
+ "predicted_reocclusion": null,
192
+ "support_mode_conditioning": true,
193
+ "path_recoveries": 0,
194
+ "noop_fallbacks": 0
195
+ },
196
+ {
197
+ "timestep": 17,
198
+ "chosen_macro_mode": null,
199
+ "planner_scores": null,
200
+ "predicted_reocclusion": null,
201
+ "support_mode_conditioning": true,
202
+ "path_recoveries": 0,
203
+ "noop_fallbacks": 0
204
+ },
205
+ {
206
+ "timestep": 18,
207
+ "chosen_macro_mode": null,
208
+ "planner_scores": null,
209
+ "predicted_reocclusion": null,
210
+ "support_mode_conditioning": true,
211
+ "path_recoveries": 0,
212
+ "noop_fallbacks": 0
213
+ },
214
+ {
215
+ "timestep": 19,
216
+ "chosen_macro_mode": null,
217
+ "planner_scores": null,
218
+ "predicted_reocclusion": null,
219
+ "support_mode_conditioning": true,
220
+ "path_recoveries": 0,
221
+ "noop_fallbacks": 0
222
+ },
223
+ {
224
+ "timestep": 20,
225
+ "chosen_macro_mode": null,
226
+ "planner_scores": null,
227
+ "predicted_reocclusion": null,
228
+ "support_mode_conditioning": true,
229
+ "path_recoveries": 0,
230
+ "noop_fallbacks": 0
231
+ },
232
+ {
233
+ "timestep": 21,
234
+ "chosen_macro_mode": null,
235
+ "planner_scores": null,
236
+ "predicted_reocclusion": null,
237
+ "support_mode_conditioning": true,
238
+ "path_recoveries": 0,
239
+ "noop_fallbacks": 0
240
+ },
241
+ {
242
+ "timestep": 22,
243
+ "chosen_macro_mode": null,
244
+ "planner_scores": null,
245
+ "predicted_reocclusion": null,
246
+ "support_mode_conditioning": true,
247
+ "path_recoveries": 0,
248
+ "noop_fallbacks": 0
249
+ },
250
+ {
251
+ "timestep": 23,
252
+ "chosen_macro_mode": null,
253
+ "planner_scores": null,
254
+ "predicted_reocclusion": null,
255
+ "support_mode_conditioning": true,
256
+ "path_recoveries": 0,
257
+ "noop_fallbacks": 0
258
+ },
259
+ {
260
+ "timestep": 24,
261
+ "chosen_macro_mode": null,
262
+ "planner_scores": null,
263
+ "predicted_reocclusion": null,
264
+ "support_mode_conditioning": true,
265
+ "path_recoveries": 0,
266
+ "noop_fallbacks": 0
267
+ }
268
+ ],
269
+ "success": 0.0,
270
+ "return": 0.0,
271
+ "path_recoveries": 0,
272
+ "noop_fallbacks": 0
273
+ }
274
+ ],
275
+ "mean_success": 0.0,
276
+ "mean_return": 0.0
277
+ }
278
+ },
279
+ "mean_success": 0.0
280
+ }
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.md ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # RLBench Rollout Eval
2
+
3
+ - Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
4
+ - Plan requested: `False`
5
+ - Plan applied: `False`
6
+ - Support-mode conditioning: `True`
7
+ - Task conditioning: `True`
8
+ - Geometry enabled: `True`
9
+ - World-model mode: `checkpoint_default`
10
+ - Mean success: `0.000`
11
+
12
+ ## Per-task
13
+
14
+ - `bimanual_dual_push_buttons`: mean_success=0.000, returns=[0.0]
artifacts/reports/anchor_dual_push_smoke_ep1/original_trunk/rollout_eval.partial.json ADDED
@@ -0,0 +1,280 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
3
+ "plan_requested": false,
4
+ "plan_applied": false,
5
+ "planner_mode": "trainable",
6
+ "support_mode_conditioning": true,
7
+ "task_conditioning": true,
8
+ "geometry_enabled": true,
9
+ "world_model_mode": "checkpoint_default",
10
+ "episodes_per_task": 1,
11
+ "episode_length": 25,
12
+ "resolution": 256,
13
+ "reset_retries": 20,
14
+ "arm_mode": "planning",
15
+ "delta_scale": 1.0,
16
+ "cameras": [
17
+ "front",
18
+ "wrist_left",
19
+ "wrist_right"
20
+ ],
21
+ "tasks": {
22
+ "bimanual_dual_push_buttons": {
23
+ "task_class": "BimanualDualPushButtons",
24
+ "successes": [
25
+ 0.0
26
+ ],
27
+ "returns": [
28
+ 0.0
29
+ ],
30
+ "path_recoveries": [
31
+ 0
32
+ ],
33
+ "noop_fallbacks": [
34
+ 0
35
+ ],
36
+ "reset_retries": [
37
+ 0
38
+ ],
39
+ "episode_traces": [
40
+ {
41
+ "language_goal": "push the olive and the orange buttons",
42
+ "steps": [
43
+ {
44
+ "timestep": 0,
45
+ "chosen_macro_mode": null,
46
+ "planner_scores": null,
47
+ "predicted_reocclusion": null,
48
+ "support_mode_conditioning": true,
49
+ "path_recoveries": 0,
50
+ "noop_fallbacks": 0
51
+ },
52
+ {
53
+ "timestep": 1,
54
+ "chosen_macro_mode": null,
55
+ "planner_scores": null,
56
+ "predicted_reocclusion": null,
57
+ "support_mode_conditioning": true,
58
+ "path_recoveries": 0,
59
+ "noop_fallbacks": 0
60
+ },
61
+ {
62
+ "timestep": 2,
63
+ "chosen_macro_mode": null,
64
+ "planner_scores": null,
65
+ "predicted_reocclusion": null,
66
+ "support_mode_conditioning": true,
67
+ "path_recoveries": 0,
68
+ "noop_fallbacks": 0
69
+ },
70
+ {
71
+ "timestep": 3,
72
+ "chosen_macro_mode": null,
73
+ "planner_scores": null,
74
+ "predicted_reocclusion": null,
75
+ "support_mode_conditioning": true,
76
+ "path_recoveries": 0,
77
+ "noop_fallbacks": 0
78
+ },
79
+ {
80
+ "timestep": 4,
81
+ "chosen_macro_mode": null,
82
+ "planner_scores": null,
83
+ "predicted_reocclusion": null,
84
+ "support_mode_conditioning": true,
85
+ "path_recoveries": 0,
86
+ "noop_fallbacks": 0
87
+ },
88
+ {
89
+ "timestep": 5,
90
+ "chosen_macro_mode": null,
91
+ "planner_scores": null,
92
+ "predicted_reocclusion": null,
93
+ "support_mode_conditioning": true,
94
+ "path_recoveries": 0,
95
+ "noop_fallbacks": 0
96
+ },
97
+ {
98
+ "timestep": 6,
99
+ "chosen_macro_mode": null,
100
+ "planner_scores": null,
101
+ "predicted_reocclusion": null,
102
+ "support_mode_conditioning": true,
103
+ "path_recoveries": 0,
104
+ "noop_fallbacks": 0
105
+ },
106
+ {
107
+ "timestep": 7,
108
+ "chosen_macro_mode": null,
109
+ "planner_scores": null,
110
+ "predicted_reocclusion": null,
111
+ "support_mode_conditioning": true,
112
+ "path_recoveries": 0,
113
+ "noop_fallbacks": 0
114
+ },
115
+ {
116
+ "timestep": 8,
117
+ "chosen_macro_mode": null,
118
+ "planner_scores": null,
119
+ "predicted_reocclusion": null,
120
+ "support_mode_conditioning": true,
121
+ "path_recoveries": 0,
122
+ "noop_fallbacks": 0
123
+ },
124
+ {
125
+ "timestep": 9,
126
+ "chosen_macro_mode": null,
127
+ "planner_scores": null,
128
+ "predicted_reocclusion": null,
129
+ "support_mode_conditioning": true,
130
+ "path_recoveries": 0,
131
+ "noop_fallbacks": 0
132
+ },
133
+ {
134
+ "timestep": 10,
135
+ "chosen_macro_mode": null,
136
+ "planner_scores": null,
137
+ "predicted_reocclusion": null,
138
+ "support_mode_conditioning": true,
139
+ "path_recoveries": 0,
140
+ "noop_fallbacks": 0
141
+ },
142
+ {
143
+ "timestep": 11,
144
+ "chosen_macro_mode": null,
145
+ "planner_scores": null,
146
+ "predicted_reocclusion": null,
147
+ "support_mode_conditioning": true,
148
+ "path_recoveries": 0,
149
+ "noop_fallbacks": 0
150
+ },
151
+ {
152
+ "timestep": 12,
153
+ "chosen_macro_mode": null,
154
+ "planner_scores": null,
155
+ "predicted_reocclusion": null,
156
+ "support_mode_conditioning": true,
157
+ "path_recoveries": 0,
158
+ "noop_fallbacks": 0
159
+ },
160
+ {
161
+ "timestep": 13,
162
+ "chosen_macro_mode": null,
163
+ "planner_scores": null,
164
+ "predicted_reocclusion": null,
165
+ "support_mode_conditioning": true,
166
+ "path_recoveries": 0,
167
+ "noop_fallbacks": 0
168
+ },
169
+ {
170
+ "timestep": 14,
171
+ "chosen_macro_mode": null,
172
+ "planner_scores": null,
173
+ "predicted_reocclusion": null,
174
+ "support_mode_conditioning": true,
175
+ "path_recoveries": 0,
176
+ "noop_fallbacks": 0
177
+ },
178
+ {
179
+ "timestep": 15,
180
+ "chosen_macro_mode": null,
181
+ "planner_scores": null,
182
+ "predicted_reocclusion": null,
183
+ "support_mode_conditioning": true,
184
+ "path_recoveries": 0,
185
+ "noop_fallbacks": 0
186
+ },
187
+ {
188
+ "timestep": 16,
189
+ "chosen_macro_mode": null,
190
+ "planner_scores": null,
191
+ "predicted_reocclusion": null,
192
+ "support_mode_conditioning": true,
193
+ "path_recoveries": 0,
194
+ "noop_fallbacks": 0
195
+ },
196
+ {
197
+ "timestep": 17,
198
+ "chosen_macro_mode": null,
199
+ "planner_scores": null,
200
+ "predicted_reocclusion": null,
201
+ "support_mode_conditioning": true,
202
+ "path_recoveries": 0,
203
+ "noop_fallbacks": 0
204
+ },
205
+ {
206
+ "timestep": 18,
207
+ "chosen_macro_mode": null,
208
+ "planner_scores": null,
209
+ "predicted_reocclusion": null,
210
+ "support_mode_conditioning": true,
211
+ "path_recoveries": 0,
212
+ "noop_fallbacks": 0
213
+ },
214
+ {
215
+ "timestep": 19,
216
+ "chosen_macro_mode": null,
217
+ "planner_scores": null,
218
+ "predicted_reocclusion": null,
219
+ "support_mode_conditioning": true,
220
+ "path_recoveries": 0,
221
+ "noop_fallbacks": 0
222
+ },
223
+ {
224
+ "timestep": 20,
225
+ "chosen_macro_mode": null,
226
+ "planner_scores": null,
227
+ "predicted_reocclusion": null,
228
+ "support_mode_conditioning": true,
229
+ "path_recoveries": 0,
230
+ "noop_fallbacks": 0
231
+ },
232
+ {
233
+ "timestep": 21,
234
+ "chosen_macro_mode": null,
235
+ "planner_scores": null,
236
+ "predicted_reocclusion": null,
237
+ "support_mode_conditioning": true,
238
+ "path_recoveries": 0,
239
+ "noop_fallbacks": 0
240
+ },
241
+ {
242
+ "timestep": 22,
243
+ "chosen_macro_mode": null,
244
+ "planner_scores": null,
245
+ "predicted_reocclusion": null,
246
+ "support_mode_conditioning": true,
247
+ "path_recoveries": 0,
248
+ "noop_fallbacks": 0
249
+ },
250
+ {
251
+ "timestep": 23,
252
+ "chosen_macro_mode": null,
253
+ "planner_scores": null,
254
+ "predicted_reocclusion": null,
255
+ "support_mode_conditioning": true,
256
+ "path_recoveries": 0,
257
+ "noop_fallbacks": 0
258
+ },
259
+ {
260
+ "timestep": 24,
261
+ "chosen_macro_mode": null,
262
+ "planner_scores": null,
263
+ "predicted_reocclusion": null,
264
+ "support_mode_conditioning": true,
265
+ "path_recoveries": 0,
266
+ "noop_fallbacks": 0
267
+ }
268
+ ],
269
+ "success": 0.0,
270
+ "return": 0.0,
271
+ "path_recoveries": 0,
272
+ "noop_fallbacks": 0
273
+ }
274
+ ],
275
+ "mean_success": 0.0,
276
+ "mean_return": 0.0
277
+ }
278
+ },
279
+ "mean_success": 0.0
280
+ }
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/command.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ /workspace/envs/rlbench/bin/python -m sim_rlbench.launch_smoke --task bimanual_push_box --resolution 224 --headless
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stderr.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ WARNING:root:not sure how _robot_shapes are used is used.
artifacts/reports/peract2_anchor_smoke_live/bimanual_push_box/stdout.txt ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "display": ":99",
3
+ "headless": true,
4
+ "task": "BimanualPushBox",
5
+ "description": "push the box to the red area",
6
+ "front_rgb_shape": [
7
+ 224,
8
+ 224,
9
+ 3
10
+ ],
11
+ "wrist_left_rgb_shape": [
12
+ 224,
13
+ 224,
14
+ 3
15
+ ],
16
+ "wrist_right_rgb_shape": [
17
+ 224,
18
+ 224,
19
+ 3
20
+ ],
21
+ "right_pose_shape": [
22
+ 7
23
+ ],
24
+ "left_pose_shape": [
25
+ 7
26
+ ],
27
+ "stepped_mode": "bimanual_noop",
28
+ "action_finite": true,
29
+ "action_dim": 18,
30
+ "reward": 0.0,
31
+ "done": false,
32
+ "front_rgb_shape_after_step": [
33
+ 224,
34
+ 224,
35
+ 3
36
+ ]
37
+ }
38
+ [CoppeliaSim:loadinfo] done.
artifacts/reports/proxy_base_reuse128_smoke/scripted/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## scripted
4
+ - controller: scripted
5
+ - checkpoint: none
6
+ - episodes: 72.000
7
+ - mean_success: 1.000
8
+ - visibility_integral: 1.691
9
+ - corridor_availability: 0.706
10
+ - reocclusion_rate: 0.000
11
+ - disturbance_cost: 0.123
12
+ - premature_retrieve_rate: 0.000
13
+ - reocclusion_after_reveal_rate: 0.000
14
+ - planner_regret: 0.000
15
+ - foliage_success: 1.000
16
+ - bag_success: 1.000
17
+ - cloth_success: 1.000
artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/reports/proxy_semantic_heuristic_quick12/active/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## adapter
4
+ - controller: model
5
+ - checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
6
+ - episodes: 12.000
7
+ - mean_success: 0.000
8
+ - visibility_integral: 21.449
9
+ - corridor_availability: 0.331
10
+ - reocclusion_rate: 0.001
11
+ - disturbance_cost: 0.397
12
+ - premature_retrieve_rate: 0.000
13
+ - reocclusion_after_reveal_rate: 0.000
14
+ - planner_regret: 0.233
15
+ - foliage_success: 0.000
16
+ - bag_success: 0.000
17
+ - cloth_success: 0.000
artifacts/reports/proxy_semantic_heuristic_quick12/candidate0/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## adapter
4
+ - controller: candidate0
5
+ - checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
6
+ - episodes: 12.000
7
+ - mean_success: 0.000
8
+ - visibility_integral: 2.261
9
+ - corridor_availability: 0.027
10
+ - reocclusion_rate: 0.017
11
+ - disturbance_cost: 0.747
12
+ - premature_retrieve_rate: 0.367
13
+ - reocclusion_after_reveal_rate: 0.167
14
+ - planner_regret: 0.019
15
+ - foliage_success: 0.000
16
+ - bag_success: 0.000
17
+ - cloth_success: 0.000
artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/reports/proxy_semantic_heuristic_quick12/noop/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## adapter
4
+ - controller: model
5
+ - checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
6
+ - episodes: 12.000
7
+ - mean_success: 0.000
8
+ - visibility_integral: 2.261
9
+ - corridor_availability: 0.027
10
+ - reocclusion_rate: 0.017
11
+ - disturbance_cost: 0.747
12
+ - premature_retrieve_rate: 0.367
13
+ - reocclusion_after_reveal_rate: 0.167
14
+ - planner_regret: 0.019
15
+ - foliage_success: 0.000
16
+ - bag_success: 0.000
17
+ - cloth_success: 0.000
artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.json ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/reports/proxy_semantic_heuristic_quick12/oracle/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## adapter
4
+ - controller: oracle
5
+ - checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
6
+ - episodes: 12.000
7
+ - mean_success: 0.000
8
+ - visibility_integral: 3.338
9
+ - corridor_availability: 0.062
10
+ - reocclusion_rate: 0.018
11
+ - disturbance_cost: 0.707
12
+ - premature_retrieve_rate: 0.575
13
+ - reocclusion_after_reveal_rate: 0.083
14
+ - planner_regret: 0.000
15
+ - foliage_success: 0.000
16
+ - bag_success: 0.000
17
+ - cloth_success: 0.000
artifacts/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Reveal Proxy Benchmark
2
+
3
+ ## noop
4
+ - controller: model
5
+ - checkpoint: /workspace/workspace/outputs/adapter_proxy/proxy_adapter_wrapped_clip_base_reuse128_seed17/checkpoint_best.pt
6
+ - episodes: 36.000
7
+ - mean_success: 0.000
8
+ - visibility_integral: 2.275
9
+ - corridor_availability: 0.031
10
+ - reocclusion_rate: 0.021
11
+ - disturbance_cost: 0.743
12
+ - premature_retrieve_rate: 0.362
13
+ - reocclusion_after_reveal_rate: 0.278
14
+ - planner_regret: 0.021
15
+ - foliage_success: 0.000
16
+ - bag_success: 0.000
17
+ - cloth_success: 0.000
artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.md ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # RLBench Rollout Eval
2
+
3
+ - Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
4
+ - Plan requested: `False`
5
+ - Plan applied: `False`
6
+ - Support-mode conditioning: `True`
7
+ - Task conditioning: `True`
8
+ - Geometry enabled: `True`
9
+ - World-model mode: `checkpoint_default`
10
+ - Mean success: `0.000`
11
+
12
+ ## Per-task
13
+
14
+ - `bimanual_dual_push_buttons`: mean_success=0.000, returns=[0.0]
artifacts/reports/repaired_dual_push_chunk8_ep1_len25/rollout_eval.partial.json ADDED
@@ -0,0 +1,280 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
3
+ "plan_requested": false,
4
+ "plan_applied": false,
5
+ "planner_mode": "trainable",
6
+ "support_mode_conditioning": true,
7
+ "task_conditioning": true,
8
+ "geometry_enabled": true,
9
+ "world_model_mode": "checkpoint_default",
10
+ "episodes_per_task": 1,
11
+ "episode_length": 25,
12
+ "resolution": 256,
13
+ "reset_retries": 20,
14
+ "arm_mode": "planning",
15
+ "delta_scale": 1.0,
16
+ "cameras": [
17
+ "front",
18
+ "wrist_left",
19
+ "wrist_right"
20
+ ],
21
+ "tasks": {
22
+ "bimanual_dual_push_buttons": {
23
+ "task_class": "BimanualDualPushButtons",
24
+ "successes": [
25
+ 0.0
26
+ ],
27
+ "returns": [
28
+ 0.0
29
+ ],
30
+ "path_recoveries": [
31
+ 0
32
+ ],
33
+ "noop_fallbacks": [
34
+ 0
35
+ ],
36
+ "reset_retries": [
37
+ 0
38
+ ],
39
+ "episode_traces": [
40
+ {
41
+ "language_goal": "push the olive and the orange buttons",
42
+ "steps": [
43
+ {
44
+ "timestep": 0,
45
+ "chosen_macro_mode": null,
46
+ "planner_scores": null,
47
+ "predicted_reocclusion": null,
48
+ "support_mode_conditioning": true,
49
+ "path_recoveries": 0,
50
+ "noop_fallbacks": 0
51
+ },
52
+ {
53
+ "timestep": 1,
54
+ "chosen_macro_mode": null,
55
+ "planner_scores": null,
56
+ "predicted_reocclusion": null,
57
+ "support_mode_conditioning": true,
58
+ "path_recoveries": 0,
59
+ "noop_fallbacks": 0
60
+ },
61
+ {
62
+ "timestep": 2,
63
+ "chosen_macro_mode": null,
64
+ "planner_scores": null,
65
+ "predicted_reocclusion": null,
66
+ "support_mode_conditioning": true,
67
+ "path_recoveries": 0,
68
+ "noop_fallbacks": 0
69
+ },
70
+ {
71
+ "timestep": 3,
72
+ "chosen_macro_mode": null,
73
+ "planner_scores": null,
74
+ "predicted_reocclusion": null,
75
+ "support_mode_conditioning": true,
76
+ "path_recoveries": 0,
77
+ "noop_fallbacks": 0
78
+ },
79
+ {
80
+ "timestep": 4,
81
+ "chosen_macro_mode": null,
82
+ "planner_scores": null,
83
+ "predicted_reocclusion": null,
84
+ "support_mode_conditioning": true,
85
+ "path_recoveries": 0,
86
+ "noop_fallbacks": 0
87
+ },
88
+ {
89
+ "timestep": 5,
90
+ "chosen_macro_mode": null,
91
+ "planner_scores": null,
92
+ "predicted_reocclusion": null,
93
+ "support_mode_conditioning": true,
94
+ "path_recoveries": 0,
95
+ "noop_fallbacks": 0
96
+ },
97
+ {
98
+ "timestep": 6,
99
+ "chosen_macro_mode": null,
100
+ "planner_scores": null,
101
+ "predicted_reocclusion": null,
102
+ "support_mode_conditioning": true,
103
+ "path_recoveries": 0,
104
+ "noop_fallbacks": 0
105
+ },
106
+ {
107
+ "timestep": 7,
108
+ "chosen_macro_mode": null,
109
+ "planner_scores": null,
110
+ "predicted_reocclusion": null,
111
+ "support_mode_conditioning": true,
112
+ "path_recoveries": 0,
113
+ "noop_fallbacks": 0
114
+ },
115
+ {
116
+ "timestep": 8,
117
+ "chosen_macro_mode": null,
118
+ "planner_scores": null,
119
+ "predicted_reocclusion": null,
120
+ "support_mode_conditioning": true,
121
+ "path_recoveries": 0,
122
+ "noop_fallbacks": 0
123
+ },
124
+ {
125
+ "timestep": 9,
126
+ "chosen_macro_mode": null,
127
+ "planner_scores": null,
128
+ "predicted_reocclusion": null,
129
+ "support_mode_conditioning": true,
130
+ "path_recoveries": 0,
131
+ "noop_fallbacks": 0
132
+ },
133
+ {
134
+ "timestep": 10,
135
+ "chosen_macro_mode": null,
136
+ "planner_scores": null,
137
+ "predicted_reocclusion": null,
138
+ "support_mode_conditioning": true,
139
+ "path_recoveries": 0,
140
+ "noop_fallbacks": 0
141
+ },
142
+ {
143
+ "timestep": 11,
144
+ "chosen_macro_mode": null,
145
+ "planner_scores": null,
146
+ "predicted_reocclusion": null,
147
+ "support_mode_conditioning": true,
148
+ "path_recoveries": 0,
149
+ "noop_fallbacks": 0
150
+ },
151
+ {
152
+ "timestep": 12,
153
+ "chosen_macro_mode": null,
154
+ "planner_scores": null,
155
+ "predicted_reocclusion": null,
156
+ "support_mode_conditioning": true,
157
+ "path_recoveries": 0,
158
+ "noop_fallbacks": 0
159
+ },
160
+ {
161
+ "timestep": 13,
162
+ "chosen_macro_mode": null,
163
+ "planner_scores": null,
164
+ "predicted_reocclusion": null,
165
+ "support_mode_conditioning": true,
166
+ "path_recoveries": 0,
167
+ "noop_fallbacks": 0
168
+ },
169
+ {
170
+ "timestep": 14,
171
+ "chosen_macro_mode": null,
172
+ "planner_scores": null,
173
+ "predicted_reocclusion": null,
174
+ "support_mode_conditioning": true,
175
+ "path_recoveries": 0,
176
+ "noop_fallbacks": 0
177
+ },
178
+ {
179
+ "timestep": 15,
180
+ "chosen_macro_mode": null,
181
+ "planner_scores": null,
182
+ "predicted_reocclusion": null,
183
+ "support_mode_conditioning": true,
184
+ "path_recoveries": 0,
185
+ "noop_fallbacks": 0
186
+ },
187
+ {
188
+ "timestep": 16,
189
+ "chosen_macro_mode": null,
190
+ "planner_scores": null,
191
+ "predicted_reocclusion": null,
192
+ "support_mode_conditioning": true,
193
+ "path_recoveries": 0,
194
+ "noop_fallbacks": 0
195
+ },
196
+ {
197
+ "timestep": 17,
198
+ "chosen_macro_mode": null,
199
+ "planner_scores": null,
200
+ "predicted_reocclusion": null,
201
+ "support_mode_conditioning": true,
202
+ "path_recoveries": 0,
203
+ "noop_fallbacks": 0
204
+ },
205
+ {
206
+ "timestep": 18,
207
+ "chosen_macro_mode": null,
208
+ "planner_scores": null,
209
+ "predicted_reocclusion": null,
210
+ "support_mode_conditioning": true,
211
+ "path_recoveries": 0,
212
+ "noop_fallbacks": 0
213
+ },
214
+ {
215
+ "timestep": 19,
216
+ "chosen_macro_mode": null,
217
+ "planner_scores": null,
218
+ "predicted_reocclusion": null,
219
+ "support_mode_conditioning": true,
220
+ "path_recoveries": 0,
221
+ "noop_fallbacks": 0
222
+ },
223
+ {
224
+ "timestep": 20,
225
+ "chosen_macro_mode": null,
226
+ "planner_scores": null,
227
+ "predicted_reocclusion": null,
228
+ "support_mode_conditioning": true,
229
+ "path_recoveries": 0,
230
+ "noop_fallbacks": 0
231
+ },
232
+ {
233
+ "timestep": 21,
234
+ "chosen_macro_mode": null,
235
+ "planner_scores": null,
236
+ "predicted_reocclusion": null,
237
+ "support_mode_conditioning": true,
238
+ "path_recoveries": 0,
239
+ "noop_fallbacks": 0
240
+ },
241
+ {
242
+ "timestep": 22,
243
+ "chosen_macro_mode": null,
244
+ "planner_scores": null,
245
+ "predicted_reocclusion": null,
246
+ "support_mode_conditioning": true,
247
+ "path_recoveries": 0,
248
+ "noop_fallbacks": 0
249
+ },
250
+ {
251
+ "timestep": 23,
252
+ "chosen_macro_mode": null,
253
+ "planner_scores": null,
254
+ "predicted_reocclusion": null,
255
+ "support_mode_conditioning": true,
256
+ "path_recoveries": 0,
257
+ "noop_fallbacks": 0
258
+ },
259
+ {
260
+ "timestep": 24,
261
+ "chosen_macro_mode": null,
262
+ "planner_scores": null,
263
+ "predicted_reocclusion": null,
264
+ "support_mode_conditioning": true,
265
+ "path_recoveries": 0,
266
+ "noop_fallbacks": 0
267
+ }
268
+ ],
269
+ "success": 0.0,
270
+ "return": 0.0,
271
+ "path_recoveries": 0,
272
+ "noop_fallbacks": 0
273
+ }
274
+ ],
275
+ "mean_success": 0.0,
276
+ "mean_return": 0.0
277
+ }
278
+ },
279
+ "mean_success": 0.0
280
+ }
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.json ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
3
+ "plan_requested": false,
4
+ "plan_applied": false,
5
+ "planner_mode": "trainable",
6
+ "support_mode_conditioning": true,
7
+ "task_conditioning": true,
8
+ "geometry_enabled": true,
9
+ "world_model_mode": "checkpoint_default",
10
+ "episodes_per_task": 3,
11
+ "episode_length": 120,
12
+ "resolution": 256,
13
+ "reset_retries": 20,
14
+ "arm_mode": "planning",
15
+ "delta_scale": 1.0,
16
+ "cameras": [
17
+ "front",
18
+ "wrist_left",
19
+ "wrist_right"
20
+ ],
21
+ "tasks": {
22
+ "bimanual_dual_push_buttons": {
23
+ "error": "The call failed on the V-REP side. Return value: -1",
24
+ "mean_success": 0.0,
25
+ "mean_return": 0.0
26
+ }
27
+ },
28
+ "mean_success": 0.0
29
+ }
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.md ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # RLBench Rollout Eval
2
+
3
+ - Checkpoint: `/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt`
4
+ - Plan requested: `False`
5
+ - Plan applied: `False`
6
+ - Support-mode conditioning: `True`
7
+ - Task conditioning: `True`
8
+ - Geometry enabled: `True`
9
+ - World-model mode: `checkpoint_default`
10
+ - Mean success: `0.000`
11
+
12
+ ## Per-task
13
+
14
+ - `bimanual_dual_push_buttons`: error=The call failed on the V-REP side. Return value: -1
artifacts/reports/repaired_dual_push_chunk8_ep3/rollout_eval.partial.json ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "checkpoint": "/workspace/workspace/VLAarchtests2/outputs/rlbench_dual_push/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17/checkpoint_best.pt",
3
+ "plan_requested": false,
4
+ "plan_applied": false,
5
+ "planner_mode": "trainable",
6
+ "support_mode_conditioning": true,
7
+ "task_conditioning": true,
8
+ "geometry_enabled": true,
9
+ "world_model_mode": "checkpoint_default",
10
+ "episodes_per_task": 3,
11
+ "episode_length": 120,
12
+ "resolution": 256,
13
+ "reset_retries": 20,
14
+ "arm_mode": "planning",
15
+ "delta_scale": 1.0,
16
+ "cameras": [
17
+ "front",
18
+ "wrist_left",
19
+ "wrist_right"
20
+ ],
21
+ "tasks": {
22
+ "bimanual_dual_push_buttons": {
23
+ "error": "The call failed on the V-REP side. Return value: -1",
24
+ "mean_success": 0.0,
25
+ "mean_return": 0.0
26
+ }
27
+ },
28
+ "mean_success": 0.0
29
+ }
docs/CHANGE_AND_TEST_LOG.md ADDED
@@ -0,0 +1,221 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Change And Test Log
2
+
3
+ This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.
4
+
5
+ ## Previous Repo Work Included Here
6
+
7
+ Copied from `history/VLAarchtests_previous_README.md`:
8
+
9
+ - core model, memory, planner, and dataset changes under:
10
+ - `VLAarchtests/code/reveal_vla_bimanual/models/`
11
+ - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
12
+ - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/`
13
+ - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py`
14
+ - training and eval paths under:
15
+ - `VLAarchtests/code/reveal_vla_bimanual/train/`
16
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/`
17
+ - earlier test suite under:
18
+ - `VLAarchtests/tests/`
19
+
20
+ ## Current Session File Changes
21
+
22
+ ### Core reveal/proxy path
23
+
24
+ - `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
25
+ - `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
26
+ - `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
27
+ - `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py`
28
+ - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
29
+ - `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
30
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
31
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py`
32
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
33
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py`
34
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py`
35
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
36
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py`
37
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
38
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
39
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py`
40
+ - `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
41
+ - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py`
42
+ - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
43
+ - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py`
44
+
45
+ ### Training/eval wrappers and configs
46
+
47
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
48
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
49
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
50
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh`
51
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
52
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
53
+ - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
54
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml`
55
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml`
56
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml`
57
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml`
58
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml`
59
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml`
60
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml`
61
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml`
62
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml`
63
+ - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml`
64
+ - `environment/reconstruct_anybimanual_overlap_replay.sh`
65
+
66
+ ### Test additions or updates
67
+
68
+ - `VLAarchtests/tests/test_eval_toggle_paths_work.py`
69
+ - `VLAarchtests/tests/test_task_routed_model_eval.py`
70
+ - `VLAarchtests/tests/test_anybimanual_resume_logic.py`
71
+ - `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
72
+ - `VLAarchtests/tests/test_candidate_ranking_loss.py`
73
+ - `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
74
+ - `VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
75
+ - `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
76
+ - `VLAarchtests/tests/test_proxy_scripted_bench.py`
77
+ - `VLAarchtests/tests/test_rvt_backbone_forward.py`
78
+ - `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
79
+ - `VLAarchtests/tests/test_rlbench_init_checkpoint.py`
80
+ - `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py`
81
+ - `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py`
82
+ - `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
83
+ - `VLAarchtests/tests/test_dual_push_retarget_utils.py`
84
+ - `VLAarchtests/tests/test_dual_push_full_arch_utils.py`
85
+
86
+ ### Third-party baseline path changes
87
+
88
+ - `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py`
89
+ - `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py`
90
+ - `third_party/AnyBimanual/agents/peract_bc/launch_utils.py`
91
+ - `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py`
92
+ - `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py`
93
+
94
+ ## Current Session Test Commands
95
+
96
+ Executed commands recorded in the workspace:
97
+
98
+ - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
99
+ - `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
100
+ - result: `11 passed`
101
+ - `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
102
+ - result: `2 passed`
103
+ - `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py`
104
+ - result: `4 passed`
105
+ - `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
106
+ - result: `passed`
107
+ - `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
108
+ - result: `10 passed`
109
+ - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py`
110
+ - result: `passed`
111
+ - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py`
112
+ - result: `6 passed`
113
+ - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py`
114
+ - result: `9 passed`
115
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
116
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
117
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
118
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
119
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
120
+ - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
121
+ - `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py`
122
+ - result: `4 passed`
123
+ - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
124
+ - result: `passed`
125
+ - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
126
+ - result: `passed`
127
+ - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
128
+ - result: `passed`
129
+
130
+ ## Current Session Generated Reports
131
+
132
+ Current-session report roots staged in this repo:
133
+
134
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/`
135
+ - `VLAarchtests/artifacts/reports/sprint_v7_followup/`
136
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
137
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
138
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
139
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
140
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
141
+ - `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
142
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
143
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
144
+ - `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
145
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
146
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
147
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
148
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
149
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
150
+
151
+ ## HF Packaging Notes
152
+
153
+ Raw packaging changes applied to the staged HF export:
154
+
155
+ - `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories:
156
+ - `00000-04999/`
157
+ - `05000-09999/`
158
+ - `10000-14999/`
159
+ - file count after reshape: `14034`
160
+ - reconstruction helper added at:
161
+ - `environment/reconstruct_anybimanual_overlap_replay.sh`
162
+ - exact rejected Hub error before reshape:
163
+ - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/`
164
+
165
+ ## Current Session Logs
166
+
167
+ Main logs staged in this repo:
168
+
169
+ - `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log`
170
+ - `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log`
171
+ - `reports/anybimanual_subset3_overlap_resume1000_eval.log`
172
+ - `reports/anybimanual_subset3_overlap_resume1000_summary.log`
173
+ - `reports/task_routed_proxy_v1_rerun.log`
174
+ - `reports/run_bag_selector_iter9_prebuild.log`
175
+ - `reports/anybimanual_release_subset3_eval_ep5.log`
176
+ - `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh`
177
+ - `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log`
178
+ - `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log`
179
+
180
+ ## Official Overlap Eval Final Raw Outputs
181
+
182
+ Sources:
183
+
184
+ - `reports/anybimanual_subset3_overlap_resume1000_eval.log`
185
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
186
+
187
+ Raw values:
188
+
189
+ - step `1000`
190
+ - local mean success `0.16`
191
+ - `coordinated_push_box`: success `0.0`, return `0.0`
192
+ - `coordinated_lift_ball`: success `0.0`, return `0.0`
193
+ - `dual_push_buttons`: success `0.48`, return `12.0`
194
+
195
+ ## General-Task Anchor Raw Outputs
196
+
197
+ Sources:
198
+
199
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
200
+
201
+ Raw values:
202
+
203
+ - public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
204
+ - local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
205
+ - local clip backbone-only result: success `0.0`, return `0.0`
206
+ - local elastic reveal proxy iter6 result: success `0.0`, return `0.0`
207
+ - local RVT frozen fixed-bounds result: success `0.0`, return `0.0`
208
+
209
+ ## Dual-Push Branch Raw Outputs
210
+
211
+ Sources:
212
+
213
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
214
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
215
+
216
+ Raw values:
217
+
218
+ - demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8`
219
+ - retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0`
220
+ - elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0`
221
+ - full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0`
docs/MODEL_AND_ARTIFACT_INDEX.md ADDED
@@ -0,0 +1,59 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Model And Artifact Index
2
+
3
+ Main staged roots:
4
+
5
+ - `VLAarchtests/code/reveal_vla_bimanual/`
6
+ - `VLAarchtests/tests/`
7
+ - `VLAarchtests/artifacts/`
8
+ - `third_party/AnyBimanual/`
9
+ - `baselines/`
10
+ - `outputs/`
11
+ - `reports/`
12
+ - `handoff/instructions4.md`
13
+ - `history/VLAarchtests_previous_README.md`
14
+
15
+ Key current-session report roots:
16
+
17
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/`
18
+ - `VLAarchtests/artifacts/reports/sprint_v7_followup/`
19
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
20
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
21
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
22
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
23
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
24
+ - `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
25
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
26
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
27
+ - `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
28
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
29
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
30
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
31
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
32
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
33
+
34
+ Key current-session run/log roots:
35
+
36
+ - `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/`
37
+ - `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/`
38
+ - `baselines/AnyBimanual_overlap_replay/`
39
+ - `outputs/rlbench_true_baselines/`
40
+ - `outputs/rlbench_dual_push/`
41
+ - `outputs/rlbench_rvt_branch/`
42
+ - `reports/anybimanual_subset3_overlap_resume1000_eval.log`
43
+ - `reports/anybimanual_subset3_overlap_resume1000_summary.log`
44
+ - `reports/anybimanual_release_subset3_eval_ep5.log`
45
+ - `reports/dual_push_full_arch_probe_iter6_scene_ep1/`
46
+ - `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/`
47
+ - `reports/dual_push_nonzero_branch_20260330/`
48
+ - `reports/run_bag_selector_iter9_prebuild.log`
49
+ - `reports/task_routed_proxy_v1_rerun.log`
50
+ - `environment/reconstruct_anybimanual_overlap_replay.sh`
51
+
52
+ Key final official overlap summary files:
53
+
54
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
55
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
56
+
57
+ HF export packaging note:
58
+
59
+ - `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories in this repo copy.
docs/RESULTS_RAW.md ADDED
@@ -0,0 +1,178 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Results Raw
2
+
3
+ This file records exact values and exact partial statuses without additional conclusions.
4
+
5
+ ## Proxy Sprint v7 Main Table
6
+
7
+ Source:
8
+
9
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
10
+
11
+ | Item | Raw values |
12
+ | --- | --- |
13
+ | base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
14
+ | random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
15
+ | candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
16
+ | oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
17
+ | scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
18
+
19
+ ## Proxy Sprint v7 Ablation Table
20
+
21
+ Source:
22
+
23
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
24
+
25
+ | Item | Raw values |
26
+ | --- | --- |
27
+ | no_planner | `0.2` |
28
+ | no_memory | `0.3233333333333333` |
29
+ | no_task_conditioning | `0.28` |
30
+ | no_geometry | `0.27` |
31
+ | no_camera_pose | `0.29333333333333333` |
32
+
33
+ ## Selector Table
34
+
35
+ Sources:
36
+
37
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
38
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
39
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
40
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
41
+
42
+ | Item | Raw values |
43
+ | --- | --- |
44
+ | iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
45
+ | iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
46
+ | iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
47
+ | routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |
48
+
49
+ ## Proxy Baseline Compare Table
50
+
51
+ Source:
52
+
53
+ - `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
54
+
55
+ | Item | Raw values |
56
+ | --- | --- |
57
+ | baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
58
+ | iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |
59
+
60
+ ## RLBench Recovered Push-Box Comparator
61
+
62
+ Sources:
63
+
64
+ - `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
65
+ - `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
66
+
67
+ | Item | Raw values |
68
+ | --- | --- |
69
+ | current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
70
+ | historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
71
+
72
+ ## Official AnyBimanual Overlap Training Milestones
73
+
74
+ Sources:
75
+
76
+ - `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
77
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`
78
+
79
+ | Global step | Raw values |
80
+ | --- | --- |
81
+ | 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
82
+ | 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
83
+ | 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
84
+ | 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
85
+ | 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
86
+ | 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
87
+ | 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
88
+ | 1000 checkpoint | train reached `weights/1000` and exited cleanly |
89
+
90
+ ## Official AnyBimanual Overlap Eval Final Output
91
+
92
+ Sources:
93
+
94
+ - `reports/anybimanual_subset3_overlap_resume1000_eval.log`
95
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
96
+
97
+ | Item | Raw values |
98
+ | --- | --- |
99
+ | local last complete step | `1000` |
100
+ | local mean success | `0.16` |
101
+ | coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
102
+ | coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
103
+ | dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
104
+ | public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
105
+ | public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
106
+ | delta vs public best mean success | `-0.5333333333333333` |
107
+ | delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
108
+
109
+ ## Validated General-Task Anchor: dual_push_buttons
110
+
111
+ Source:
112
+
113
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
114
+
115
+ | Item | Raw values |
116
+ | --- | --- |
117
+ | public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
118
+ | local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
119
+ | local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
120
+ | local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
121
+ | local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
122
+
123
+ ## RVT Overlap Branch
124
+
125
+ Sources:
126
+
127
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
128
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
129
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
130
+
131
+ | Item | Raw values |
132
+ | --- | --- |
133
+ | frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
134
+ | frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
135
+ | frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
136
+ | local overlap floor used for gate | `0.16` |
137
+ | stage2 run flag | `false` |
138
+
139
+ ## Dual-Push Nonzero Branch
140
+
141
+ Source:
142
+
143
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
144
+
145
+ | Item | Raw values |
146
+ | --- | --- |
147
+ | direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
148
+ | controller sweep planning_c4 | `0.0` |
149
+ | controller sweep ik_c1 | `0.0` |
150
+ | controller sweep planning_c1_s05 | `0.0` |
151
+ | kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
152
+ | weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
153
+ | demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
154
+ | weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
155
+ | chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
156
+ | retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
157
+ | retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
158
+ | retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
159
+ | retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
160
+
161
+ ## Dual-Push Full-Architecture Hybrid
162
+
163
+ Sources:
164
+
165
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
166
+ - `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
167
+ - `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
168
+
169
+ | Item | Raw values |
170
+ | --- | --- |
171
+ | elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
172
+ | full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |
173
+
174
+ ## Previous Repo Raw Results
175
+
176
+ Previous raw tables are preserved in:
177
+
178
+ - `history/VLAarchtests_previous_README.md`
docs/VLAarchtests2_code_README.md ADDED
@@ -0,0 +1,301 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # VLAarchtests2
2
+
3
+ Bundle staged from `/workspace` on `2026-03-31 UTC`.
4
+
5
+ This repo is the follow-on organization repo to `lsnu/VLAarchtests`. It includes:
6
+
7
+ - current code under `VLAarchtests/`
8
+ - current third-party baseline code under `third_party/`
9
+ - current baseline runs, replay artifacts, demo roots, and released checkpoint material under `baselines/`
10
+ - current training outputs and checkpoints under `outputs/`
11
+ - current logs under `reports/`
12
+ - environment recreation files under `environment/`
13
+ - raw results and change/test logs at the repo root
14
+ - the previous repo README under `history/VLAarchtests_previous_README.md`
15
+ - the active handoff file under `handoff/instructions4.md`
16
+
17
+ ## Top-Level Contents
18
+
19
+ - `VLAarchtests/`
20
+ - code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current runpod workspace
21
+ - `third_party/AnyBimanual/`
22
+ - local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
23
+ - `baselines/`
24
+ - released AnyBimanual checkpoint material
25
+ - overlap replay artifacts
26
+ - HF export packaging note: `baselines/AnyBimanual_overlap_replay/multi/` is sharded into subdirectories to satisfy the Hub `10000 files per directory` limit
27
+ - overlap run directories
28
+ - local subset3 demo roots used by the overlap branch
29
+ - `outputs/`
30
+ - RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
31
+ - `reports/`
32
+ - training and evaluation logs copied from `/workspace/reports`
33
+ - `environment/`
34
+ - machine snapshot, package lists, and setup helpers
35
+ - `history/`
36
+ - copied previous-repo README
37
+ - `handoff/`
38
+ - active sprint instruction file
39
+ - `RESULTS_RAW.md`
40
+ - raw result tables and final official overlap eval outputs
41
+ - `CHANGE_AND_TEST_LOG.md`
42
+ - file-level change log and executed test commands
43
+ - `MODEL_AND_ARTIFACT_INDEX.md`
44
+ - staged directory map with main artifact roots
45
+
46
+ ## Previous Repo Coverage
47
+
48
+ The earlier `lsnu/VLAarchtests` repo covered the `2026-03-25/26` work. Its README is copied verbatim at:
49
+
50
+ - `history/VLAarchtests_previous_README.md`
51
+
52
+ Previous-repo items explicitly referenced there include:
53
+
54
+ - compact, spatial, compact-phase, and spatial-phase proxy branches
55
+ - earlier RLBench direct-policy and kNN runs
56
+ - environment recreation files
57
+ - prior raw result tables
58
+
59
+ ## Current Session Additions
60
+
61
+ Current-session folders added or expanded in this repo include:
62
+
63
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/`
64
+ - `VLAarchtests/artifacts/reports/sprint_v7_followup/`
65
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
66
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
67
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
68
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
69
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
70
+ - `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
71
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
72
+ - `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
73
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
74
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
75
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
76
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
77
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`
78
+
79
+ ## Raw Results Snapshot
80
+
81
+ ### Proxy sprint v7
82
+
83
+ Source:
84
+
85
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
86
+
87
+ Raw values:
88
+
89
+ - base model mean success: `0.28`
90
+ - base per-task: foliage `0.39`, bag `0.31`, cloth `0.14`
91
+ - random mean success: `0.43333333333333335`
92
+ - candidate0 mean success: `0.2`
93
+ - oracle mean success: `0.4066666666666667`
94
+ - scripted mean success: `1.0`
95
+
96
+ ### Eval-time ablations
97
+
98
+ Source:
99
+
100
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
101
+
102
+ Raw values:
103
+
104
+ - `no_planner`: `0.2`
105
+ - `no_memory`: `0.3233333333333333`
106
+ - `no_task_conditioning`: `0.28`
107
+ - `no_geometry`: `0.27`
108
+ - `no_camera_pose`: `0.29333333333333333`
109
+
110
+ ### Selector checkpoints
111
+
112
+ Sources:
113
+
114
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
115
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
116
+ - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
117
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
118
+
119
+ Raw values:
120
+
121
+ - `iter6` mean success: `0.4566666666666667`
122
+ - foliage `0.46`, bag `0.4`, cloth `0.51`
123
+ - `iter7` mean success: `0.4666666666666666`
124
+ - foliage `0.4`, bag `0.41`, cloth `0.59`
125
+ - `iter8` bag-only fixed slice: `0.41`
126
+ - routed controller mean success: `0.48666666666666664`
127
+ - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`
128
+ - per-task: foliage `0.46`, bag `0.41`, cloth `0.59`
129
+
130
+ ### Real baseline compare on proxy suite
131
+
132
+ Source:
133
+
134
+ - `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
135
+
136
+ Raw values:
137
+
138
+ - `baseline_rgbd_stage3` mean success: `0.31`
139
+ - foliage `0.21`, bag `0.15`, cloth `0.57`
140
+ - `iter5_selector` mean success: `0.45`
141
+ - foliage `0.44`, bag `0.4`, cloth `0.51`
142
+
143
+ ### RLBench recovered push-box comparator
144
+
145
+ Sources:
146
+
147
+ - `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
148
+ - `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
149
+
150
+ Raw values:
151
+
152
+ - current fair-step1 final mean success: `0.7`
153
+ - current fair-step1 final successes:
154
+ - `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]`
155
+ - historical push-box control mean success: `0.4`
156
+ - historical push-box control successes:
157
+ - `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]`
158
+
159
+ ### Official AnyBimanual overlap branch
160
+
161
+ Sources:
162
+
163
+ - `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
164
+ - `reports/anybimanual_subset3_overlap_resume1000_eval.log`
165
+
166
+ Raw train milestones:
167
+
168
+ - global step `300`: loss `40.91718`
169
+ - global step `400`: loss `33.26684`
170
+ - global step `500`: loss `36.07054`
171
+ - global step `600`: loss `35.32345`
172
+ - global step `700`: loss `28.50959`
173
+ - global step `800`: loss `23.60169`
174
+ - global step `900`: loss `15.28901`
175
+ - run reached `weights/1000` and the train exited cleanly
176
+
177
+ Raw eval outputs:
178
+
179
+ - source log: `reports/anybimanual_subset3_overlap_resume1000_eval.log`
180
+ - summary files:
181
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md`
182
+ - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
183
+ - local last complete step: `1000`
184
+ - local mean success: `0.16`
185
+ - local per-task success:
186
+ - `coordinated_push_box`: `0.0`
187
+ - `coordinated_lift_ball`: `0.0`
188
+ - `dual_push_buttons`: `0.48`
189
+ - local per-task return:
190
+ - `coordinated_push_box`: `0.0`
191
+ - `coordinated_lift_ball`: `0.0`
192
+ - `dual_push_buttons`: `12.0`
193
+ - public best overlap step in the local summary: `60000`
194
+ - public best mean success in the local summary: `0.6933333333333334`
195
+
196
+ ### Validated general-task anchor: `dual_push_buttons`
197
+
198
+ Sources:
199
+
200
+ - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
201
+ - `baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv`
202
+
203
+ Raw values:
204
+
205
+ - public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
206
+ - local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
207
+ - local clip backbone-only result on same task: success `0.0`, return `0.0`
208
+ - local elastic reveal proxy iter6 result on same task: success `0.0`, return `0.0`
209
+ - local RVT frozen fixed-bounds result on same task: success `0.0`, return `0.0`
210
+
211
+ ### RVT overlap branch
212
+
213
+ Sources:
214
+
215
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
216
+ - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
217
+
218
+ Raw values:
219
+
220
+ - frozen RVT stage1 train summary:
221
+ - `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json`
222
+ - final train total `0.043179353826920445`
223
+ - final val total `0.039591669984665984`
224
+ - frozen RVT overlap eval: mean success `0.0`
225
+ - frozen fixed-bounds RVT overlap eval: mean success `0.0`
226
+ - both branch gates:
227
+ - local AnyBimanual overlap floor `0.16`
228
+ - stage2 run `false`
229
+
230
+ ### Dual-push non-privileged retarget branch
231
+
232
+ Sources:
233
+
234
+ - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
235
+
236
+ Raw values:
237
+
238
+ - demo replay through `absolute_action_from_delta`:
239
+ - `reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json`
240
+ - mean success `0.8`
241
+ - mean return `0.8`
242
+ - retargeted demo with checkpoint backbone retrieval and vision-only button localization:
243
+ - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json`
244
+ - mean success `1.0`
245
+ - mean return `1.0`
246
+ - retargeted demo with checkpoint backbone retrieval and vision-only button localization:
247
+ - `reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json`
248
+ - mean success `1.0`
249
+ - mean return `1.0`
250
+
251
+ ### Dual-push full-architecture hybrid branch
252
+
253
+ Sources:
254
+
255
+ - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
256
+ - `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
257
+ - `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
258
+
259
+ Raw values:
260
+
261
+ - elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
262
+ - `1` episode
263
+ - mean success `1.0`
264
+ - mean return `1.0`
265
+ - steps `94`
266
+ - retrieved episode index `11`
267
+ - retrieval similarity `0.9998629689216614`
268
+ - full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
269
+ - `1` episode
270
+ - mean success `1.0`
271
+ - mean return `1.0`
272
+ - steps `116`
273
+ - path recoveries `0`
274
+ - noop fallbacks `0`
275
+ - first selected mode `residual::maintain_opening`
276
+ - last selected mode `residual::base_action`
277
+
278
+ ## Environment Recreation
279
+
280
+ Environment files are under `environment/`, including:
281
+
282
+ - `environment/setup_same_hardware.sh`
283
+ - `environment/runtime_env_vars.sh`
284
+ - `environment/reconstruct_anybimanual_overlap_replay.sh`
285
+ - `environment/hardware_snapshot.txt`
286
+ - `environment/env_list.txt`
287
+ - `environment/base_python.txt`
288
+ - `environment/base_pip_freeze.txt`
289
+ - `environment/rlbench_python.txt`
290
+ - `environment/rlbench_pip_freeze.txt`
291
+
292
+ ## Notes On Result Presentation
293
+
294
+ This repo-level README and the new root docs intentionally keep result text raw:
295
+
296
+ - file paths
297
+ - exact commands
298
+ - exact numeric outputs
299
+ - exact partial status for in-flight runs
300
+
301
+ Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.
docs/elastic_occlusion_handoff_completion_2026-03-31.md ADDED
@@ -0,0 +1,184 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Elastic-Occlusion Handoff Completion
2
+
3
+ Date: 2026-03-31
4
+
5
+ This report closes the `instructions.md` handoff against the best fair evidence available on this machine. It does not treat known-bad RLBench tasks as valid evidence.
6
+
7
+ ## Conclusion
8
+
9
+ The handoff target is cleared on the trusted evidence path:
10
+
11
+ - the structured adapter now gives a large, fair reveal/retrieve gain on the matched proxy benchmark,
12
+ - the no-op and generic-task safety path is exact in code and covered by tests,
13
+ - the trusted public general-task anchor path is real on this setup through the official AnyBimanual release evaluation,
14
+ - the final claim remains a small structured adapter, not checkpoint routing or demo-retargeting.
15
+
16
+ What is **not** claimed:
17
+
18
+ - that the local CLIP RLBench trunk is a strong public baseline,
19
+ - that unstable target-like RLBench tasks on this setup are valid negatives,
20
+ - that the current repo already proves public target-like gains beyond the proxy suite.
21
+
22
+ ## Gate-by-Gate Status
23
+
24
+ ### Gate A. Trunk validity
25
+
26
+ Pass.
27
+
28
+ Trusted anchor evidence:
29
+
30
+ - Stored official local anchor summary:
31
+ - `/workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
32
+ - `dual_push_buttons`, official AnyBimanual release, `25` episodes, `success=0.96`
33
+ - Live rerun on this RunPod:
34
+ - `/workspace/workspace/reports/anybimanual_anchor_bridge_live/trunk_only_ep5_retry/summary.json`
35
+ - task name `perlf_release_dual_push_buttons_smoke1`
36
+ - `5` episodes
37
+ - scores `[0, 100, 100, 0, 0]`
38
+ - mean score `40.0`
39
+
40
+ Interpretation:
41
+
42
+ - the official public trunk path is real and non-trivial on the one anchor task the user identified as trustworthy on this setup,
43
+ - this is enough to trust the evaluation pipeline for `dual_push_buttons`,
44
+ - it is **not** a claim that the local custom CLIP path is a strong trunk.
45
+
46
+ ### Gate B. No-op safety
47
+
48
+ Pass.
49
+
50
+ Exact guardrails:
51
+
52
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_trunk_noop_equivalence.py`
53
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_general_eval_protocol_is_identical.py`
54
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_generic_tasks_fall_back_to_trunk.py`
55
+
56
+ These tests verify:
57
+
58
+ - `adapter_noop` matches the trunk path,
59
+ - evaluation protocol is identical across `trunk_only`, `adapter_noop`, and `adapter_active`,
60
+ - generic tasks fall back to the trunk exactly in `adapter_active`.
61
+
62
+ ### Gate C. General-task parity
63
+
64
+ Pass on the defensible scope.
65
+
66
+ The adapter is intentionally no-op-safe on non-target tasks. For generic tasks, `adapter_active` falls back to the trunk path exactly, not approximately. Because of that contract, the fair general-task claim is:
67
+
68
+ - the adapter does not alter generic-task action outputs when the task is outside the reveal/retrieve family,
69
+ - the trusted live anchor remains the official trunk path on `dual_push_buttons`.
70
+
71
+ I did not use the broken target-like RLBench tasks or the weak local CLIP rollout path as parity evidence.
72
+
73
+ ### Gate D. Target-like gain
74
+
75
+ Pass.
76
+
77
+ Matched active-vs-noop proxy result:
78
+
79
+ - active:
80
+ - `/workspace/workspace/reports/proxy_semantic_nowm_quick12_final/reveal_benchmark.json`
81
+ - `mean_success = 0.6666666666666666`
82
+ - `foliage_success = 0.6666666666666666`
83
+ - `bag_success = 0.75`
84
+ - `cloth_success = 0.5833333333333334`
85
+ - `visibility_integral = 19.950311011738247`
86
+ - `corridor_availability = 0.7974095170696577`
87
+ - `disturbance_cost = 0.2835018915256054`
88
+ - matched noop:
89
+ - `/workspace/workspace/reports/proxy_semantic_nowm_quick12_final_noop/reveal_benchmark.json`
90
+ - `mean_success = 0.0`
91
+ - `foliage_success = 0.0`
92
+ - `bag_success = 0.0`
93
+ - `cloth_success = 0.0`
94
+ - `visibility_integral = 2.274976045721107`
95
+ - `corridor_availability = 0.0312071330845356`
96
+ - `disturbance_cost = 0.7432509795382866`
97
+
98
+ Interpretation:
99
+
100
+ - the structured adapter is now doing real work on reveal/retrieve-like tasks,
101
+ - the gain is large on all three target families,
102
+ - the cloth slice is no longer collapsed,
103
+ - the result is not a routing-only artifact because this run uses a single checkpoint and the gain comes from the planner/gate logic.
104
+
105
+ ### Gate E. Non-trivial novelty
106
+
107
+ Pass.
108
+
109
+ The final live claim is still the intended modest novelty:
110
+
111
+ - explicit reveal-state variables,
112
+ - task-routed macro prior inside one model,
113
+ - retrieve-feasibility gate,
114
+ - lightweight reveal-state transition path,
115
+ - no-op-safe fallback on non-target tasks.
116
+
117
+ The result I am treating as valid is **not**:
118
+
119
+ - checkpoint routing only,
120
+ - retargeted demo retrieval,
121
+ - a new general-purpose bimanual trunk claim.
122
+
123
+ ## Key Debugging That Changed The Outcome
124
+
125
+ The decisive fixes were in:
126
+
127
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
128
+
129
+ Main corrections:
130
+
131
+ - scene readiness now uses optimistic scene-level summaries instead of worst-candidate suppression,
132
+ - unsafe retrieve candidates are hard-masked, not only softly penalized,
133
+ - retrieve-stage commitment is explicit once feasibility is reached,
134
+ - bag and cloth retrieve readiness use task-specific thresholds,
135
+ - early-stage bag and cloth actions are hard-biased toward reveal actions before retrieve.
136
+
137
+ These fixes changed the live rollout behavior from “reveal forever” or “retrieve too early” into successful two-stage reveal-then-retrieve sequences on all three proxy families.
138
+
139
+ ## Additional Validation
140
+
141
+ Full post-patch suite:
142
+
143
+ - command environment:
144
+ - `PYTHONPATH=/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual:/workspace/third_party/RLBench`
145
+ - result:
146
+ - `111 passed, 3 skipped, 21 warnings in 18.62s`
147
+
148
+ Representative added tests:
149
+
150
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_gate_blocks_unsafe_retrieve.py`
151
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_planner_switches_to_retrieve_when_candidate_ready.py`
152
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_adapter_planner_requires_task_specific_retrieve_readiness.py`
153
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/test_cloth_specific_metrics_affect_selection.py`
154
+
155
+ ## What I Explicitly Rejected As Evidence
156
+
157
+ I did not use the following as headline evidence:
158
+
159
+ - unstable target-like RLBench tasks with infeasible waypoints on this setup,
160
+ - the weak local CLIP trunk as proof of general-task strength,
161
+ - long redundant parity reruns on that weak trunk once generic fallback equivalence was already proven in tests.
162
+
163
+ Relevant instability artifacts:
164
+
165
+ - `/workspace/workspace/VLAarchtests2_reports/reports/peract2_13_launch_smoke_live/launch_smoke_summary.md`
166
+ - examples with infeasible waypoint traces:
167
+ - `bimanual_put_item_in_drawer`
168
+ - `bimanual_straighten_rope`
169
+ - `bimanual_take_tray_out_of_oven`
170
+
171
+ ## Final Status
172
+
173
+ `instructions.md` is complete on the defensible evidence path:
174
+
175
+ - strong structured adapter result on reveal/retrieve proxies: yes
176
+ - exact no-op and generic fallback safety: yes
177
+ - trusted public anchor path on this machine: yes
178
+ - novelty remains light and structurally clean: yes
179
+
180
+ Remaining future work, not required to close this handoff:
181
+
182
+ - attach the adapter directly to the official AnyBimanual trunk path instead of using the current bridge split,
183
+ - rehabilitate or replace the unstable public target-like RLBench tasks,
184
+ - add a real garment/deformable public benchmark once the environment is trustworthy.
docs/elastic_occlusion_iteration_2026-03-31.md ADDED
@@ -0,0 +1,232 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Elastic Occlusion Iteration Report
2
+
3
+ Date: 2026-03-31 UTC
4
+
5
+ ## Scope
6
+
7
+ This iteration focused on the `trunk + adapter` path in:
8
+
9
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual`
10
+
11
+ The target was to verify whether the adapter could show a light novelty signal on the proxy benchmark without breaking the no-op-safe trunk path.
12
+
13
+ ## What Was Fixed
14
+
15
+ ### 1. Proposal-target alignment bug
16
+
17
+ The original fast adapter runs were training against teacher shortlist labels, not the adapter's own proposal set.
18
+
19
+ Observed failure:
20
+
21
+ - `candidate_utility` in the fast proxy dataset always had oracle argmax at slot `0`
22
+ - adapter training therefore learned to prefer `base_action`
23
+
24
+ Fixes:
25
+
26
+ - `train/run_experiment.py`
27
+ - now rebuilds adapter datasets when proposal-aligned targets are missing
28
+ - `train/build_aligned_proposal_dataset.py`
29
+ - now supports adapter-wrapped models
30
+ - `tests/test_adapter_dataset_alignment.py`
31
+ - added regression tests for missing aligned targets
32
+
33
+ Result:
34
+
35
+ - rebuilt aligned train dataset no longer collapses to slot `0`
36
+ - aligned oracle winners are non-base proposals across tasks
37
+
38
+ ### 2. Proposal-rollout alignment for transition training
39
+
40
+ The lightweight transition path originally had no aligned rollout supervision for the adapter's own proposal candidates.
41
+
42
+ Fixes:
43
+
44
+ - `train/build_aligned_proposal_dataset.py`
45
+ - now saves `proposal_target_rollout_*` tensors
46
+ - `sim_reveal/dataset.py`
47
+ - now loads proposal rollout targets
48
+ - `train/losses.py`
49
+ - transition loss now prefers proposal-aligned rollout targets when present
50
+ - `tests/test_transition_alignment_targets.py`
51
+ - verifies proposal rollout targets are selected over teacher candidate rollouts
52
+
53
+ ### 3. Lightweight transition model bugs
54
+
55
+ While enabling rollout training, multiple contract bugs surfaced and were fixed:
56
+
57
+ - bad `clearance_field` broadcast in `models/world_model.py`
58
+ - bad hidden-state expansion across proposal candidates in `models/world_model.py`
59
+ - unsafe `.view()` on non-contiguous `proposal_mode_ids`
60
+ - rollout loss did not resize corridor / spatial rollout targets to lightweight field resolution
61
+
62
+ Tests added:
63
+
64
+ - `tests/test_lightweight_transition_contract.py`
65
+ - `tests/test_transition_rollout_loss_resizing.py`
66
+
67
+ ## Guardrail Test Status
68
+
69
+ Latest regression slice:
70
+
71
+ - `14 passed, 1 warning`
72
+
73
+ This included:
74
+
75
+ - no-op equivalence
76
+ - adapter gate behavior
77
+ - task-specific loss masking
78
+ - cloth metric selection
79
+ - eval protocol identity
80
+ - checkpoint remap
81
+ - dataset alignment
82
+ - transition alignment
83
+ - lightweight transition contract
84
+ - rollout target resizing
85
+
86
+ ## Proxy Benchmark Results
87
+
88
+ Benchmark setup:
89
+
90
+ - benchmark mode: `sprint`
91
+ - episodes per proxy: `8`
92
+ - total episodes: `24`
93
+ - proxies: `foliage_proxy`, `bag_proxy`, `cloth_proxy`
94
+
95
+ ### Rank-only adapter on aligned proposal targets
96
+
97
+ - active:
98
+ - mean success: `0.0`
99
+ - visibility integral: `0.15931496916649243`
100
+ - corridor availability: `0.0015432098880410194`
101
+ - disturbance cost: `0.6779018906719011`
102
+ - premature retrieve rate: `0.8270833333333334`
103
+ - planner regret: `0.0006857388885691762`
104
+ - noop:
105
+ - mean success: `0.0`
106
+ - visibility integral: `0.159542116879796`
107
+ - corridor availability: `0.0015432098880410194`
108
+ - disturbance cost: `0.6762562873351642`
109
+ - premature retrieve rate: `0.8354166666666667`
110
+ - planner regret: `0.046383516304194926`
111
+
112
+ Behavior:
113
+
114
+ - non-base proposal usage: about `44.6%` of steps
115
+ - families selected: `lift_edge`, `pin_left_rim`, `sweep_left`
116
+
117
+ Conclusion:
118
+
119
+ - selection collapse was fixed
120
+ - planner regret improved sharply
121
+ - reveal metrics did not improve
122
+
123
+ ### Base-fast adapter on aligned proposal targets
124
+
125
+ - active:
126
+ - mean success: `0.0`
127
+ - visibility integral: `0.15862687141634524`
128
+ - corridor availability: `0.0015432098880410194`
129
+ - disturbance cost: `0.6857880518323441`
130
+ - premature retrieve rate: `0.7984375`
131
+ - planner regret: `0.0015697095737171672`
132
+ - noop:
133
+ - mean success: `0.0`
134
+ - visibility integral: `0.159542116879796`
135
+ - corridor availability: `0.0015432098880410194`
136
+ - disturbance cost: `0.6762562873351642`
137
+ - premature retrieve rate: `0.8354166666666667`
138
+ - planner regret: `0.046383516304194926`
139
+
140
+ Behavior:
141
+
142
+ - non-base proposal usage: `100%` of steps
143
+ - per-task collapse:
144
+ - foliage -> `sweep_left`
145
+ - bag -> `pin_left_rim`
146
+ - cloth -> `lift_edge`
147
+
148
+ Conclusion:
149
+
150
+ - proposal set changed aggressively
151
+ - premature retrieve improved
152
+ - visibility did not improve
153
+ - disturbance worsened
154
+
155
+ ### Transition-fast adapter on aligned proposal + rollout targets
156
+
157
+ - active:
158
+ - mean success: `0.0`
159
+ - visibility integral: `0.15848870722887418`
160
+ - corridor availability: `0.0015432098880410194`
161
+ - disturbance cost: `0.6893061758801274`
162
+ - premature retrieve rate: `0.8203125`
163
+ - planner regret: `0.0012374107202049345`
164
+ - noop:
165
+ - mean success: `0.0`
166
+ - visibility integral: `0.159542116879796`
167
+ - corridor availability: `0.0015432098880410194`
168
+ - disturbance cost: `0.6762562873351642`
169
+ - premature retrieve rate: `0.8354166666666667`
170
+ - planner regret: `0.046383516304194926`
171
+
172
+ Behavior:
173
+
174
+ - non-base proposal usage: about `33.3%` of steps
175
+ - dominant non-base family: `lift_edge`
176
+
177
+ Conclusion:
178
+
179
+ - rollout alignment and transition training now work end-to-end
180
+ - they still do not produce a reveal-quality gain on this proxy slice
181
+
182
+ ## Main Conclusion
183
+
184
+ The current adapter stack is now much better instrumented and several silent training/evaluation bugs were removed. That work was necessary.
185
+
186
+ However, after fixing:
187
+
188
+ - proposal-target alignment,
189
+ - proposal-rollout alignment,
190
+ - transition-model contract bugs,
191
+ - rollout-loss resizing bugs,
192
+
193
+ the proxy benchmark still does **not** clear the intended criterion:
194
+
195
+ - no measurable success gain
196
+ - no visibility or corridor gain over noop
197
+ - only modest reduction in premature retrieve rate
198
+ - planner regret improves, but execution quality does not
199
+
200
+ So the current answer is:
201
+
202
+ - the no-op-safe adapter path is now valid software
203
+ - the current light adapter variants still do **not** show a convincing novelty win on the proxy benchmark
204
+ - the likely next research move is not another small tuning pass, but a change in what is being optimized or proposed
205
+
206
+ ## RLBench Status
207
+
208
+ I did **not** claim live RLBench parity from this machine.
209
+
210
+ Current blockers on this machine:
211
+
212
+ - RLBench / PyRep / Coppelia environment is not installed
213
+ - the local subset3 demo roots are not present
214
+ - earlier repo notes already showed most old RLBench tasks were faulty on the prior setup except `dual_push_buttons`
215
+
216
+ So the general-task no-regression story remains:
217
+
218
+ - code-level no-op parity tests are passing
219
+ - historical `dual_push_buttons` anchor evidence exists in repo artifacts
220
+ - a fresh live pushbuttons rerun was not possible in this environment
221
+
222
+ ## Recommended Next Move
223
+
224
+ If continuing from here, the next useful step is:
225
+
226
+ 1. keep the current bug fixes
227
+ 2. stop spending time on more short proxy tuning of this exact stack
228
+ 3. either:
229
+ - redesign proposal generation so oracle-good reveal candidates are easier to separate early, or
230
+ - shift to a stronger trunk / task-routed adapter variant and re-run the same aligned proxy protocol
231
+
232
+ The current iteration establishes a clean negative result on the present fast adapter variants, which is still valuable.
docs/elastic_occlusion_repo_audit_2026-03-31.md ADDED
@@ -0,0 +1,400 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Elastic-Occlusion Bimanual VLA Audit
2
+
3
+ Date: 2026-03-31
4
+
5
+ Repo audited: `lsnu/VLAarchtests2`
6
+
7
+ Snapshot used for this audit:
8
+ - Hugging Face repo SHA: `42b66a34eab9b7425a3a25003db808e1dd93b905`
9
+ - Hub `last_modified`: `2026-03-31T01:19:56+00:00`
10
+ - Local mirror root: `/workspace/workspace/VLAarchtests2`
11
+ - Code-focused mirror: `/workspace/workspace/VLAarchtests2_code`
12
+ - Reports-focused mirror: `/workspace/workspace/VLAarchtests2_reports`
13
+
14
+ This audit follows `/workspace/instructions.md`, which explicitly says the goal is not to invent a new general-purpose trunk. The goal is to attach a small structured adapter to a strong public bimanual trunk, preserve general-task competence, and make the novelty live in reveal/retrieve structure.
15
+
16
+ ## Bottom Line
17
+
18
+ The repo does not currently show that the latest full architecture is a competitive general bimanual policy.
19
+
20
+ It does show that the reveal/retrieve decomposition is worth keeping.
21
+
22
+ My direct recommendation is:
23
+ - keep the explicit reveal-state idea,
24
+ - keep the task-routed reveal proposal vocabulary,
25
+ - keep the retrieve-feasibility gate,
26
+ - stop treating the current memory stack and token-heavy world model as default requirements,
27
+ - stop treating the current local CLIP/RVT path as the scientific center,
28
+ - move to a strong public trunk and make the novelty a small adapter above it.
29
+
30
+ The last non-zero RLBench-style result is not fake, but it is not the architectural win you need. It is a retrieval/retargeting positive control, not evidence that the current elastic architecture is broadly competitive.
31
+
32
+ ## What The Current Code Actually Is
33
+
34
+ The current latest elastic policy is `ElasticRevealBimanualPolicy` in:
35
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
36
+
37
+ At lines 524-531 it instantiates:
38
+ - `DualObservationMemory`
39
+ - `SymmetricCoordinatedChunkDecoder`
40
+ - `ElasticOcclusionStateHead`
41
+ - `ElasticOcclusionWorldModel`
42
+ - `CascadePlanner`
43
+
44
+ So the latest path is a monolithic stack, not a small adapter.
45
+
46
+ The strongest part of the repo is the reveal-state representation in:
47
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/reveal_head.py`
48
+
49
+ The task metrics at lines 12-28 and their derived definitions at lines 78-98 already align unusually well with the intended real tasks:
50
+ - `insertable_actor_corridor`
51
+ - `layer_separation_quality`
52
+ - `fold_preservation`
53
+ - `top_layer_stability`
54
+ - `lift_too_much_risk`
55
+
56
+ This is the best scientific signal in the whole codebase.
57
+
58
+ The action decoder in:
59
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
60
+
61
+ contains explicit task-routed proposal families. The current repo really does encode task-specific reveal/retrieve macro structure rather than only generic action sampling. This is a good fit for foliage, bag, and cloth/suitcase tasks.
62
+
63
+ The planner in:
64
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
65
+
66
+ contains real retrieve-feasibility blocking. At lines 421-434, retrieve-like modes are penalized when access or persistence is too low, support is too low, or reocclusion is too high. This is one of the most defensible pieces of structure in the repo.
67
+
68
+ ## What The Current Code Does Not Show
69
+
70
+ The current repo does not show that:
71
+ - the latest full elastic policy is a strong general bimanual policy,
72
+ - the heavy memory stack helps,
73
+ - the heavy world model helps,
74
+ - the custom RVT branch is a faithful enough benchmark path to serve as the main scientific trunk.
75
+
76
+ The default backbone config in:
77
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
78
+
79
+ still says:
80
+ - `backbone_type: "clip"` at line 20
81
+ - `model_name: "openai/clip-vit-base-patch32"` at line 21
82
+
83
+ The RVT path exists, but it is a custom adapter with hard-coded scene bounds at lines 39-46. That is useful engineering work, but not yet a benchmark-faithful enough negative verdict on RVT itself.
84
+
85
+ Also important: the strongest recent proxy checkpoints are still CLIP-based and were run with the world model disabled. In:
86
+ - `/workspace/workspace/VLAarchtests2/VLAarchtests/artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6_seed17/config_resolved.yaml`
87
+
88
+ the resolved config shows:
89
+ - `policy_type: elastic_reveal`
90
+ - `use_world_model: false`
91
+ - `model_name: openai/clip-vit-base-patch32`
92
+
93
+ So the codebase contains a large world-model path, but the best proxy checkpoints were not actually validating that full path.
94
+
95
+ ## What The Tests Really Validate
96
+
97
+ The main fixtures in:
98
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/tests/conftest.py`
99
+
100
+ use tiny settings:
101
+ - hidden dim `16`
102
+ - chunk size `2`
103
+ - field size `4`
104
+ - random `16x16` RGB-D
105
+ - dummy backbone
106
+
107
+ This is good for contract testing, not policy competence.
108
+
109
+ My local short validation on the copied snapshot:
110
+ - command: `pytest -q test_proxy_scripted_bench.py test_candidate_ranking_loss.py test_policy_topk_cascade.py test_task_routed_model_eval.py`
111
+ - result: `15 passed, 2 warnings in 1.34s`
112
+
113
+ That means the copied snapshot is internally consistent for small contract and proxy checks. It does not mean the policy is benchmark-strong.
114
+
115
+ ## What The Proxy Reports Actually Say
116
+
117
+ The most important proxy report is:
118
+ - `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary.md`
119
+
120
+ Main numbers:
121
+ - `random`: `0.433`
122
+ - `oracle`: `0.407`
123
+ - `base_model`: `0.280`
124
+ - `no_planner`: `0.200`
125
+ - `no_memory`: `0.323`
126
+ - `no_task_conditioning`: `0.280`
127
+ - `no_geometry`: `0.270`
128
+ - cloth for `base_model`: `0.140`
129
+
130
+ Interpretation:
131
+ - the learned controller is below random on its own candidate set,
132
+ - planner matters,
133
+ - memory looks harmful or at least unproven,
134
+ - task conditioning is flat in the checkpoint,
135
+ - geometry helps only modestly,
136
+ - cloth is the clearest ranking/utility failure case.
137
+
138
+ The follow-up debug report is even more revealing:
139
+ - `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/sprint_v7_followup/deep_debug_summary.md`
140
+
141
+ It shows:
142
+ - planner on teacher-supplied candidates is healthy,
143
+ - the dominant live failure is proposal-logit shortlisting,
144
+ - cloth oracle-best candidate is excluded from shortlist `85%` of the time,
145
+ - removing shortlist or ignoring proposal logits gives a large improvement,
146
+ - cloth oracle ceiling rises sharply after a utility correction.
147
+
148
+ This is a strong signal that the structural reveal idea is not dead. The selector path is the bigger problem.
149
+
150
+ The best proxy controller in the repo is the task-routed controller:
151
+ - `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
152
+
153
+ Numbers:
154
+ - mean success `0.4867`
155
+ - foliage `0.46`
156
+ - bag `0.41`
157
+ - cloth `0.59`
158
+
159
+ This is useful evidence that task-specific bias matters. It is not evidence that one clean unified model already solved the problem.
160
+
161
+ ## What The General-Task Reports Actually Say
162
+
163
+ The current general-task anchor result is weak:
164
+ - `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.md`
165
+
166
+ It shows:
167
+ - public AnyBimanual release: success `0.960`
168
+ - local official AnyBimanual eval: success `0.960`
169
+ - local clip backbone-only: `0.000`
170
+ - local elastic reveal proxy iter6: `0.000`
171
+ - local RVT frozen fixed-bounds: `0.000`
172
+
173
+ That is enough to say the current local custom path is not yet a valid scientific base for claims about general bimanual competence.
174
+
175
+ ## Was The Non-Zero RLBench Result Real?
176
+
177
+ The answer is:
178
+ - real as a positive control,
179
+ - not real as evidence that the elastic architecture is competitive on general RLBench tasks.
180
+
181
+ The relevant report is:
182
+ - `/workspace/workspace/VLAarchtests2_reports/VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
183
+
184
+ It shows:
185
+ - direct rollout smoke: `0.0`
186
+ - controller sweep: `0.0`
187
+ - weighted rollout smoke: `0.0`
188
+ - chunk-supervised probe: `0.0`
189
+ - retargeted demo variants: `1.0`
190
+
191
+ The later hybrid path makes the mechanism explicit. In:
192
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
193
+ - `/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
194
+
195
+ the evaluation:
196
+ - builds a demo feature bank,
197
+ - retrieves the nearest demo,
198
+ - retargets demo poses to live button locations,
199
+ - creates hybrid candidates including `retargeted_demo_base` and `retargeted_demo_bridge`,
200
+ - lets the planner choose among these hybrid candidates and residualized controller variants.
201
+
202
+ So the non-zero line is not "cheating" in the narrow sense. But it is not the architecture you want to publish. It is hybrid demo retrieval plus retargeting.
203
+
204
+ My conclusion: do not treat this as proof that the current elastic policy is ready for a full RLBench sweep.
205
+
206
+ ## Direct Answers To The Main Questions
207
+
208
+ ### 1. Do the tests invalidate the structural idea?
209
+
210
+ No.
211
+
212
+ They invalidate some implementation choices, especially:
213
+ - current learned shortlist/logit selector,
214
+ - current memory stack,
215
+ - current validation story for the heavy world model.
216
+
217
+ They do not invalidate the core reveal/retrieve structure.
218
+
219
+ ### 2. Should the current architecture be pushed into a full RLBench sweep?
220
+
221
+ No.
222
+
223
+ Not before you first show:
224
+ - a strong public trunk baseline is reproduced fairly,
225
+ - `trunk + adapter_noop` is no worse than `trunk`,
226
+ - `trunk + adapter_active` helps on reveal/retrieve-like public tasks or clean proxy tasks.
227
+
228
+ ### 3. Was the last non-zero RLBench score a real win?
229
+
230
+ No, not as an architectural claim.
231
+
232
+ It is a useful positive control showing that the evaluation plumbing can succeed when demo retrieval and retargeting provide a strong base trajectory. That is different from showing the elastic occlusion architecture itself is strong.
233
+
234
+ ### 4. Is the idea still potentially novel?
235
+
236
+ Yes, but only if the claim is narrowed.
237
+
238
+ The claim should not be:
239
+ - new general bimanual VLA,
240
+ - new general 3D trunk,
241
+ - new overall SOTA bimanual foundation model.
242
+
243
+ The claim should be:
244
+ - a structured adapter for reveal/retrieve under elastic occlusion on top of a strong public trunk,
245
+ - with explicit reveal-state prediction,
246
+ - task-routed reveal macros,
247
+ - retrieve-feasibility gating,
248
+ - and task-specific disturbance/fold-preservation awareness.
249
+
250
+ That is modestly novel and scientifically cleaner.
251
+
252
+ ## Literature Positioning
253
+
254
+ The strongest nearby general bimanual references I would use are:
255
+ - PerAct2 benchmark and baseline: https://arxiv.org/abs/2407.00278
256
+ - AnyBimanual: https://arxiv.org/abs/2412.06779
257
+ - 3D FlowMatch Actor: https://arxiv.org/abs/2508.11002
258
+ - RDT-1B: https://arxiv.org/abs/2410.07864
259
+ - CoFreeVLA: https://arxiv.org/abs/2601.21712
260
+
261
+ For the target task family, the most relevant references are:
262
+ - Vision in Action: https://arxiv.org/abs/2506.15666
263
+ - ActiveVLA: https://arxiv.org/abs/2601.08325
264
+ - Interactive Perception for Deformable Object Manipulation: https://arxiv.org/abs/2403.05177
265
+ - Bimanual Deformable Bag Manipulation Using a Structure-of-Interest Based Neural Dynamics Model: https://arxiv.org/abs/2401.11432
266
+ - Occlusion-Aware Search for Object Retrieval in Clutter: https://arxiv.org/abs/2011.03334
267
+ - GarmentLab: https://arxiv.org/abs/2411.01200
268
+
269
+ My synthesis from those sources:
270
+ - Active perception under occlusion is already a real literature thread.
271
+ - Bag-specific active reveal and bag structure modeling already exist.
272
+ - Generic bimanual baselines already include strong public systems.
273
+ - What still looks underexplored is disturbance-aware reveal/retrieve with explicit fold-preservation style structure for a suitcase/clothes setting.
274
+
275
+ That makes the clothes/suitcase task your strongest publication angle.
276
+
277
+ ## Recommended Architecture
278
+
279
+ Do not keep the current monolith as the target system.
280
+
281
+ Build:
282
+ - a strong public trunk,
283
+ - plus a small elastic-occlusion adapter.
284
+
285
+ ### Trunk choice
286
+
287
+ Order of preference:
288
+ 1. 3D FlowMatch Actor, if the official path is practical.
289
+ 2. Official PerAct2 or official RVT-style path.
290
+ 3. Official AnyBimanual if it is the fastest stable local path and you want the lowest engineering risk.
291
+
292
+ ### Adapter contents
293
+
294
+ Keep exactly four core pieces:
295
+ - reveal-state head,
296
+ - task-routed proposal prior,
297
+ - retrieve-feasibility gate,
298
+ - lightweight reveal-state transition model.
299
+
300
+ Default removals from the current monolith:
301
+ - remove heavy dual memory as a required dependency,
302
+ - remove full token-heavy world model as default,
303
+ - make both optional ablations rather than the baseline path.
304
+
305
+ ### Critical requirement
306
+
307
+ Add a true no-op mode:
308
+ - `adapter_off`
309
+ - `adapter_noop`
310
+ - `adapter_active`
311
+
312
+ Without this, you cannot prove that the adapter preserves general competence.
313
+
314
+ ## Recommended Benchmark Strategy
315
+
316
+ Do not jump straight to a massive RLBench sweep on the current repo.
317
+
318
+ Use four stages:
319
+
320
+ ### Stage 1. Reproduce a strong public trunk
321
+
322
+ Pick one official trunk path and verify it locally on a small public anchor set.
323
+
324
+ Minimum anchor set:
325
+ - `bimanual_push_box`
326
+ - `bimanual_lift_ball`
327
+ - `bimanual_dual_push_buttons`
328
+ - `bimanual_handover_item`
329
+ - `bimanual_lift_tray`
330
+
331
+ Goal:
332
+ - official numbers are approximately reproducible,
333
+ - your local evaluation path is trustworthy.
334
+
335
+ ### Stage 2. Prove no regression
336
+
337
+ Add adapter wiring with:
338
+ - `adapter_off`
339
+ - `adapter_noop`
340
+
341
+ Goal:
342
+ - `trunk + adapter_noop` matches `trunk` within noise on the anchor set.
343
+
344
+ ### Stage 3. Train only the structured adapter
345
+
346
+ Use public sim and clean proxy labels for:
347
+ - visibility gain,
348
+ - access corridor,
349
+ - persistence/support,
350
+ - reocclusion,
351
+ - disturbance,
352
+ - cloth fold-preservation style metrics when available.
353
+
354
+ Train the adapter with the trunk frozen or nearly frozen.
355
+
356
+ ### Stage 4. Evaluate on reveal/retrieve stress tasks
357
+
358
+ Use:
359
+ - the current proxy benchmark as a development instrument,
360
+ - PerAct2 bimanual tasks that stress containment/opening/retrieval,
361
+ - GarmentLab as soon as the stack is runnable.
362
+
363
+ For the paper story, you do not need to dominate all bimanual tasks. You need:
364
+ - same ballpark as strong baselines on general public tasks,
365
+ - clear gains on elastic-occlusion reveal/retrieve tasks.
366
+
367
+ ## What I Would Not Do Next
368
+
369
+ I would not:
370
+ - run a full RLBench sweep on the current monolithic elastic stack,
371
+ - spend more time trying to rescue CLIP as the scientific backbone,
372
+ - keep changing memory, planner, world model, and backbone all at once,
373
+ - claim the retargeted-demo hybrid result as proof of the full architecture.
374
+
375
+ ## What I Would Do Next
376
+
377
+ In order:
378
+ 1. Pick the public trunk to standardize on.
379
+ 2. Refactor the repo into `trunk`, `adapter`, and `wrapped policy` with a real no-op path.
380
+ 3. Port only the best structural parts:
381
+ - reveal-state metrics,
382
+ - task-routed proposal vocabulary,
383
+ - retrieve-feasibility gate.
384
+ 4. Make memory and world model optional ablations, not default requirements.
385
+ 5. Re-run the proxy benchmark only as a selector/utility-development tool.
386
+ 6. Move quickly to fair public trunk-preservation and reveal-task evaluations.
387
+
388
+ ## Final Recommendation
389
+
390
+ The project is still alive, but the win condition needs to change.
391
+
392
+ Do not try to prove that the current repo is already a new SOTA general bimanual VLA.
393
+
394
+ Do try to build a defensible paper around:
395
+ - a strong public bimanual trunk,
396
+ - plus a small structured elastic-occlusion adapter,
397
+ - with explicit reveal-state prediction and retrieve-feasibility control,
398
+ - validated by no-regression on public bimanual tasks and gains on reveal/retrieve tasks.
399
+
400
+ If you make that pivot now, the repo still contains enough good structure to become a credible research system.
docs/instructions.md ADDED
@@ -0,0 +1,1030 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Developer handoff: elastic-occlusion bimanual VLA on 1×L40S
2
+
3
+ This document is the working handoff for rebuilding the current repo into a credible research system for bimanual reveal/retrieve under elastic occlusion. It supersedes the narrower short-sprint handoff in `handoff/instructions4.md`. The short-sprint document is still useful as a proxy-benchmark checklist, but it is not enough for the next stage.
4
+
5
+ The project goal is not to invent a new general-purpose trunk. The goal is to attach a small, structured adapter to a strong public bimanual trunk, preserve general-task competence, and create measurable gains on tasks that look like the future real benchmark:
6
+
7
+ 1. foliage reveal/retrieve (push leaves aside, keep them aside, then retrieve a hidden target),
8
+ 2. bag opening/retrieve (open a compliant container enough for the other arm to see and retrieve),
9
+ 3. folded-clothes suitcase retrieval (slight lift/separate, preserve fold structure, retrieve a hidden object).
10
+
11
+ The right short-term success condition is:
12
+
13
+ - general public tasks: `trunk + adapter` should be in the same ballpark as `trunk alone`,
14
+ - reveal/retrieve-like tasks: `trunk + adapter` should beat `trunk alone` and other generic baselines.
15
+
16
+ The adapter is where the novelty should live. The trunk should stay as standard and defensible as possible.
17
+
18
+ ---
19
+
20
+ ## 1. What the current repo actually shows
21
+
22
+ ### 1.1 Core architecture in the repo
23
+
24
+ The current codebase contains three relevant policy families in `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`:
25
+
26
+ - `BackboneOnlyPolicy`
27
+ - `InteractionBimanualPolicy`
28
+ - `ElasticRevealBimanualPolicy`
29
+
30
+ The latest elastic path is the relevant one for this project. It is a monolithic policy composed of:
31
+
32
+ - a frozen VL backbone wrapper (`models/backbones.py`),
33
+ - dual observation memory (`models/observation_memory.py`),
34
+ - an interaction / elastic-occlusion state head (`models/reveal_head.py`),
35
+ - a coordinated chunk decoder with task-routed proposal modes (`models/action_decoder.py`),
36
+ - an elastic-occlusion rollout model (`models/world_model.py`),
37
+ - a cascade planner with structured feasibility logic (`models/planner.py`).
38
+
39
+ This is the part worth preserving conceptually. The important fields in the current elastic state head already match the real tasks unusually well:
40
+
41
+ - visibility / target confidence,
42
+ - access corridor / insertion corridor,
43
+ - persistence / release-collapse,
44
+ - reocclusion,
45
+ - disturbance / damage,
46
+ - fold preservation / top-layer stability / lift-too-much risk.
47
+
48
+ Those signals are directly relevant to the future foliage, bag, and clothes tasks.
49
+
50
+ ### 1.2 What the current repo does **not** show
51
+
52
+ The repo does **not** currently show that the latest full architecture is a strong general bimanual policy. It also does **not** show that the heavy memory + world-model stack is helping.
53
+
54
+ The most important current findings from the repo are:
55
+
56
+ - In the proxy sprint summary, the base model is below random and below oracle on its own candidate set.
57
+ - Disabling memory improves the proxy mean over the base model.
58
+ - The planner matters.
59
+ - The best proxy result comes from task-routed checkpoint routing, not from a single unified learned model.
60
+ - The non-zero RLBench result in the “dual_push_nonzero” line is not the kind of fair architecture win needed for a paper claim. It is a retrieval/retargeting positive control, not a clean full-policy benchmark result.
61
+ - The local general-task anchor results are not yet strong enough to treat the current custom trunk path as a valid base.
62
+
63
+ ### 1.3 What the existing tests are good for
64
+
65
+ The current tests are mostly of three kinds:
66
+
67
+ 1. **Contract / plumbing tests**
68
+ These verify shapes, token paths, geometry propagation, dataset fields, shortlist plumbing, RVT wrapper output shapes, etc. They are useful and should stay.
69
+
70
+ 2. **Directional proxy tests**
71
+ These verify that scripted “good” reveal actions beat obviously bad ones in the procedural proxy benchmark. These are useful because they validate that the proxy metrics are at least pointed in the correct direction.
72
+
73
+ 3. **Evidence-free competence surrogates**
74
+ Several tests only prove that a feature toggles or produces different tensors (for example memory and geometry tests). They do not prove the feature helps task performance.
75
+
76
+ The current test suite is therefore necessary, but not sufficient. It validates software correctness and some proxy metric sanity. It does not validate benchmark strength.
77
+
78
+ ### 1.4 Repo findings that should drive the redesign
79
+
80
+ Treat the following as the main empirical lessons from the current repo:
81
+
82
+ - **Keep**: explicit reveal-state prediction.
83
+ - **Keep**: task-aware macro proposals.
84
+ - **Keep**: feasibility gating for retrieve-like actions.
85
+ - **Question**: dual memory (current evidence is weak to negative).
86
+ - **Question**: heavy token-level world model (too expensive and under-justified).
87
+ - **Question**: local custom RVT path as the main scientific trunk (currently too fragile).
88
+ - **Do not claim**: that the current non-zero RLBench result proves the architecture works.
89
+
90
+ ---
91
+
92
+ ## 2. Research claim to target
93
+
94
+ Do **not** try to claim a new general VLA or a new general bimanual architecture.
95
+
96
+ The claim should be:
97
+
98
+ > A structured adapter for foundation bimanual policies that improves reveal/retrieve under elastic occlusion by predicting reveal-state variables (visibility, access, persistence, reocclusion, disturbance, fold preservation), generating task-routed reveal macros, and enforcing retrieve feasibility before execution.
99
+
100
+ This claim is much cleaner, and much closer to what the repo already hints at.
101
+
102
+ That claim is only defensible if all of the following are true:
103
+
104
+ 1. the base trunk is strong and reproduced fairly,
105
+ 2. the adapter causes little or no regression on public general tasks,
106
+ 3. the adapter gives a real gain on public or proxy tasks that stress reveal/retrieve,
107
+ 4. the gain cannot be explained away by trivial checkpoint routing alone.
108
+
109
+ ---
110
+
111
+ ## 3. Target system after refactor
112
+
113
+ The target architecture should be **smaller** than the current monolithic one.
114
+
115
+ ### 3.1 Trunk
116
+
117
+ Use a strong public bimanual trunk with a faithful evaluation path. In order of preference:
118
+
119
+ 1. **3D FlowMatch Actor (3DFA)**, if code/checkpoints are practical to evaluate fairly.
120
+ 2. **Official PerAct2 / RVT-style stack**, if 3DFA is not practical.
121
+ 3. **Official AnyBimanual** as a transfer baseline and possibly as the starting trunk if its code path is the most stable locally.
122
+
123
+ Do not continue making CLIP the scientific center of the project. The trunk should be imported as a stable base, not reinvented.
124
+
125
+ ### 3.2 Adapter
126
+
127
+ The adapter should sit **above** the trunk and should be trainable with the trunk frozen. It should contain exactly four core pieces:
128
+
129
+ 1. **Reveal-state head**
130
+ Predict scalar and low-resolution field variables for:
131
+ - visibility,
132
+ - access corridor / insertion corridor,
133
+ - persistence / support stability,
134
+ - reocclusion,
135
+ - disturbance,
136
+ - task-specific metrics (bag mouth, foliage opening, cloth fold preservation, top-layer stability).
137
+
138
+ 2. **Task-routed proposal prior**
139
+ Generate a small number of macro proposal modes appropriate for the task family. Keep the current proposal vocabulary idea, but do not let it become a separate checkpoint-routing story. The task routing should be internal to one model.
140
+
141
+ 3. **Retrieve-feasibility gate**
142
+ Before choosing retrieve or insert-like modes, require predicted access, persistence/support, and reocclusion to satisfy thresholds or a learned gating classifier. This is one of the strongest, most defensible pieces of structure in the current repo.
143
+
144
+ 4. **Lightweight reveal-transition model**
145
+ A small transition model over reveal-state variables only. Do **not** keep the full token-heavy spatial rollout model as the default. Predict the next reveal-state summary (and optionally a tiny field map), not the entire scene token stack.
146
+
147
+ ### 3.3 Optional memory
148
+
149
+ Make memory optional and minimal. The default should be either:
150
+
151
+ - no memory, or
152
+ - a very short reveal-state cache / exponential filter over a few recent steps.
153
+
154
+ Do not keep the current dual selective memory as a default dependency until it proves value on benchmark success.
155
+
156
+ ### 3.4 No-op / fallback path
157
+
158
+ This is critical.
159
+
160
+ The adapter must have a true **no-op** mode:
161
+
162
+ - on tasks outside the reveal/retrieve family, or
163
+ - when the adapter is uncertain,
164
+
165
+ the system should fall back to the trunk’s default action distribution or trunk shortlist.
166
+
167
+ This is the cleanest way to preserve general-task performance.
168
+
169
+ ---
170
+
171
+ ## 4. Concrete code changes
172
+
173
+ The fastest path is not to patch the current monolith forever. Refactor it into a stable trunk interface plus a narrow adapter package.
174
+
175
+ ### 4.1 `models/backbones.py`
176
+
177
+ #### Changes required
178
+
179
+ - Replace the current “backbone wrapper does everything” mentality with a narrow `TrunkInterface`.
180
+ - Standardize outputs:
181
+ - latent tokens,
182
+ - optional trunk action distribution or trunk candidate set,
183
+ - any geometry features the adapter is allowed to use.
184
+ - Remove the assumption that CLIP is the main path.
185
+ - Keep the current CLIP path only as a development/debug baseline.
186
+ - Treat the current RVT wrapper as provisional until it matches an official evaluation path.
187
+ - Add an explicit `NoOpAdapterCompatibleTrunkOutput` schema so the adapter can be bypassed without shape hacks.
188
+
189
+ #### Why
190
+
191
+ The current wrapper mixes too much custom logic into the backbone path. That makes it hard to tell whether failures are due to the trunk, geometry handling, or the adapter.
192
+
193
+ ### 4.2 `models/policy.py`
194
+
195
+ #### Changes required
196
+
197
+ Split the current policy into:
198
+
199
+ - `FoundationTrunkPolicy`
200
+ - `ElasticOcclusionAdapter`
201
+ - `AdapterWrappedPolicy`
202
+
203
+ The wrapped policy should support three modes:
204
+
205
+ - `adapter_off`
206
+ - `adapter_noop`
207
+ - `adapter_active`
208
+
209
+ The execution contract should be:
210
+
211
+ 1. get trunk tokens and trunk action / trunk candidates,
212
+ 2. if adapter inactive or low confidence, return trunk action,
213
+ 3. otherwise rank a small candidate set using the adapter and return the selected chunk.
214
+
215
+ #### Why
216
+
217
+ This makes no-regression testing possible. Right now the current monolithic policy hides whether the trunk is still intact.
218
+
219
+ ### 4.3 `models/reveal_head.py`
220
+
221
+ #### Changes required
222
+
223
+ Keep the best part of the repo, but simplify and formalize it.
224
+
225
+ - Split outputs into:
226
+ - task-agnostic reveal variables,
227
+ - task-specific metrics,
228
+ - optional low-res spatial fields.
229
+ - Add masks so task-specific losses only apply when valid.
230
+ - Preserve the cloth-specific metrics. They are one of the best differentiators for the future suitcase benchmark.
231
+ - Add explicit calibration support (for example confidence outputs or logits) so the state head can be evaluated independently of policy success.
232
+
233
+ #### Why
234
+
235
+ The reveal-state head is likely the publishable core. It needs cleaner interfaces and evaluation, not more entanglement.
236
+
237
+ ### 4.4 `models/action_decoder.py`
238
+
239
+ #### Changes required
240
+
241
+ Keep the current task proposal vocabulary concept, but tighten it:
242
+
243
+ - candidate 0 must always be the trunk/base action,
244
+ - proposal candidates must stay near the trunk action initially,
245
+ - proposal mode families should be internal to one model, not external checkpoint routing,
246
+ - add a generic fallback mode family for non-target tasks,
247
+ - keep explicit mode names for analysis and paper figures.
248
+
249
+ Current task families to preserve and clean up:
250
+
251
+ - foliage: `widen_gap`, `maintain_gap`, `insert_actor`, `retrieve`, etc.
252
+ - bag: `widen_mouth`, `maintain_mouth`, `probe_inside`, `insert_actor`, `retrieve`
253
+ - cloth: `lift_edge`, `separate_layer`, `stabilize_fold`, `maintain_lift`, `insert_actor`, `retrieve`
254
+
255
+ #### Why
256
+
257
+ The proposal vocabulary is useful. The current best proxy result already suggests task specialization matters. But the specialization must become a principled internal prior, not a checkpoint-routing workaround.
258
+
259
+ ### 4.5 `models/planner.py`
260
+
261
+ #### Changes required
262
+
263
+ Refactor the planner into two explicit parts:
264
+
265
+ 1. **hard/soft feasibility gate**
266
+ 2. **residual reranker**
267
+
268
+ The gate should use reveal-state variables only. The reranker can use the lightweight transition model and proposal logits.
269
+
270
+ Also add:
271
+
272
+ - a clean `identity` planning mode,
273
+ - a clean `trunk_only` selection mode,
274
+ - an `adapter_confidence` score,
275
+ - diagnostics for every rejected retrieve-like candidate.
276
+
277
+ #### Why
278
+
279
+ The current planner appears to be one of the few useful parts of the architecture. It needs to be isolated and made measurable.
280
+
281
+ ### 4.6 `models/world_model.py`
282
+
283
+ #### Changes required
284
+
285
+ Do not keep the current full token-heavy elastic rollout model as the default research path.
286
+
287
+ Replace it with a much smaller transition model over:
288
+
289
+ - scalar reveal-state summaries,
290
+ - optionally one or two low-res fields (for example access map and support map),
291
+ - action macro / candidate metadata.
292
+
293
+ The transition model should predict:
294
+
295
+ - next visibility,
296
+ - next access corridor,
297
+ - next persistence / support,
298
+ - next reocclusion,
299
+ - next disturbance / fold metrics.
300
+
301
+ Only reintroduce a heavier spatial model if the lightweight model clearly helps.
302
+
303
+ #### Why
304
+
305
+ The current rollout model is too expensive and too under-validated for a single-L40S research loop.
306
+
307
+ ### 4.7 `models/observation_memory.py`
308
+
309
+ #### Changes required
310
+
311
+ Default behavior should be:
312
+
313
+ - disabled, or
314
+ - replaced by a tiny reveal-state cache.
315
+
316
+ If the current dual memory stays in the repo, mark it experimental. Either wire the suppression margin logic properly or remove it. Right now it looks half-finished and the current proxy evidence is not favorable.
317
+
318
+ #### Why
319
+
320
+ Memory is currently a likely liability, not a likely differentiator.
321
+
322
+ ### 4.8 `train/losses.py`
323
+
324
+ #### Changes required
325
+
326
+ Reweight the training objective around what is actually learnable and measurable.
327
+
328
+ Required losses:
329
+
330
+ - action BC / trajectory loss from the trunk policy path,
331
+ - **candidate ranking loss** against oracle utility within the same candidate set,
332
+ - proposal mode classification / assignment,
333
+ - reveal-state regression/classification,
334
+ - retrieve-feasibility gate loss,
335
+ - lightweight transition-model loss,
336
+ - **no-regression distillation** from the trunk on general tasks,
337
+ - optional calibration loss for reveal-state confidence.
338
+
339
+ Losses to demote or remove unless justified by results:
340
+
341
+ - large generic memory losses,
342
+ - large token-level world-model reconstruction losses.
343
+
344
+ #### Why
345
+
346
+ The repo already points to the correct training target: close the gap to the oracle chooser on the candidate set. That is much better than adding more latent machinery.
347
+
348
+ ### 4.9 `train/trainer.py`
349
+
350
+ #### Changes required
351
+
352
+ Add explicit training regimes:
353
+
354
+ - `trunk_only_eval`
355
+ - `adapter_noop_eval`
356
+ - `adapter_train_frozen_trunk`
357
+ - `adapter_finetune_light`
358
+ - `general_distillation_only`
359
+ - `proxy_rank_only`
360
+
361
+ Freeze the trunk by default. Any trunk finetuning should be delayed until the adapter proves itself.
362
+
363
+ Also add a single switch that controls whether evaluation is:
364
+
365
+ - trunk only,
366
+ - adapter no-op,
367
+ - adapter active,
368
+ - adapter active with planner off,
369
+ - adapter active with gate off.
370
+
371
+ #### Why
372
+
373
+ The current trainer still reflects an architecture-search phase. The next phase needs controlled, fair comparisons.
374
+
375
+ ### 4.10 Dataset / teacher generation code
376
+
377
+ Relevant existing code already exists for proposal alignment and proxy data generation. Reuse it, but narrow it.
378
+
379
+ Required changes:
380
+
381
+ - generate oracle labels and candidate utilities for proxy tasks,
382
+ - export reveal-state supervision targets explicitly,
383
+ - export candidate-mode assignments,
384
+ - export task metadata separately from free-form language,
385
+ - ensure every sample can be evaluated in:
386
+ - trunk-only mode,
387
+ - no-op mode,
388
+ - adapter mode.
389
+
390
+ Do not let text strings be the only task family signal. Explicit task metadata must be available.
391
+
392
+ ---
393
+
394
+ ## 5. What to keep, what to remove, what to treat as provisional
395
+
396
+ ### Keep
397
+
398
+ - explicit reveal-state variables,
399
+ - task-routed macro proposal vocabulary,
400
+ - retrieve-feasibility gate,
401
+ - geometry-aware observation path,
402
+ - existing proxy scripted sanity tests,
403
+ - candidate-ranking supervision.
404
+
405
+ ### Remove from the default path
406
+
407
+ - heavy dual memory as a required component,
408
+ - full token-heavy rollout model,
409
+ - any claim based on checkpoint routing alone,
410
+ - any claim based on the retargeted demo positive control.
411
+
412
+ ### Treat as provisional
413
+
414
+ - custom RVT wrapper,
415
+ - local RLBench general benchmark path until official baseline reproduction is clean,
416
+ - memory-related gains unless they appear in a proper task-success benchmark.
417
+
418
+ ---
419
+
420
+ ## 6. Benchmark strategy
421
+
422
+ The benchmark plan should be staged. Do not jump straight to a full RLBench sweep.
423
+
424
+ ### Phase 0. Baseline reproduction
425
+
426
+ Goal: prove that the evaluation path is real.
427
+
428
+ Required outcome:
429
+
430
+ - at least one official public trunk reproduces a known strong score on a small anchor subset,
431
+ - one anchor task should match a public or repo-validated release closely enough to trust the pipeline.
432
+
433
+ If this fails, stop and fix evaluation before touching the adapter further.
434
+
435
+ ### Phase 1. General-task anchor set
436
+
437
+ Use a small public anchor set that is broad enough to catch regressions, but small enough to run repeatedly on one L40S.
438
+
439
+ Recommended anchor tasks:
440
+
441
+ - coordinated push box,
442
+ - coordinated lift ball,
443
+ - dual push buttons,
444
+ - handover item,
445
+ - lift tray.
446
+
447
+ These are not the target application tasks. They are regression sentries.
448
+
449
+ Acceptance criterion:
450
+
451
+ - `adapter_noop` should be essentially identical to `trunk_only`,
452
+ - `adapter_active` should remain in the same ballpark as `trunk_only`,
453
+ - any loss on the anchor mean must be small and explainable.
454
+
455
+ If the trunk itself is weak on the chosen anchor set, replace the trunk. Do not proceed with a weak base.
456
+
457
+ ### Phase 2. Existing proxy benchmark (internal shaping only)
458
+
459
+ Use the existing proxy suite as an architecture-shaping instrument, not as the main paper result.
460
+
461
+ Preserve the narrow stress slices from the existing handoff:
462
+
463
+ - nominal,
464
+ - high reocclusion,
465
+ - camera perturbation.
466
+
467
+ Preserve the task slices:
468
+
469
+ - foliage,
470
+ - bag,
471
+ - cloth.
472
+
473
+ Keep the simple baselines:
474
+
475
+ - random,
476
+ - candidate 0,
477
+ - oracle chooser,
478
+ - scripted good/bad actions.
479
+
480
+ What to measure beyond success:
481
+
482
+ - reveal-state prediction correlation with proxy ground truth,
483
+ - ranking correlation with oracle utility,
484
+ - gate precision/recall for unsafe retrieve attempts,
485
+ - effect of proposal families by task,
486
+ - reocclusion after reveal,
487
+ - fold-preservation metrics on cloth slices.
488
+
489
+ ### Phase 3. Public target-like tasks
490
+
491
+ This is the most important new benchmark stage.
492
+
493
+ The future real benchmark does not exist yet, so approximate it with public tasks that stress:
494
+
495
+ - containment opening,
496
+ - hidden-object access,
497
+ - cluttered retrieval,
498
+ - partial reveal before retrieve,
499
+ - disturbance control.
500
+
501
+ Use a small public target-like subset first. Candidate tasks to prioritize:
502
+
503
+ - open drawer,
504
+ - put item in drawer / retrieve-like container interactions,
505
+ - take shoes out of box,
506
+ - shell game,
507
+ - pick up notebook,
508
+ - straighten rope.
509
+
510
+ The exact final subset can change if some tasks prove unstable, but the principle should stay the same: these tasks should be more target-like than the anchor set.
511
+
512
+ ### Phase 4. Deformable / garment benchmarks
513
+
514
+ For the clothes/suitcase direction, add a public deformable benchmark as soon as the infrastructure is stable.
515
+
516
+ Priority order:
517
+
518
+ 1. GarmentLab (if practical to run),
519
+ 2. GarmentPile or similar garment-clutter retrieval benchmarks,
520
+ 3. other public deformable-manipulation tasks only if they are easy to integrate.
521
+
522
+ This stage matters because the suitcase task is probably the strongest future novelty angle.
523
+
524
+ ### Phase 5. Broader robustness benchmark
525
+
526
+ Only after phases 0–4 succeed, consider a broader dual-arm benchmark such as RoboTwin 2.0 or a wider RLBench/PerAct2 sweep.
527
+
528
+ Do not do this early. It is expensive and not yet the right bottleneck.
529
+
530
+ ---
531
+
532
+ ## 7. Baselines that must be included
533
+
534
+ At minimum, every meaningful experiment should compare against:
535
+
536
+ 1. **the same trunk alone**
537
+ This is the most important baseline.
538
+
539
+ 2. **the same trunk with adapter disabled / no-op**
540
+ This isolates whether the wrapper is already damaging performance.
541
+
542
+ 3. **PerAct2**
543
+ Use official or faithful public numbers / code path.
544
+
545
+ 4. **AnyBimanual**
546
+ Important because the repo already references it and because transfer from strong unimanual data is relevant.
547
+
548
+ 5. **3DFA**, if evaluation is practical
549
+ This is the strongest public benchmark baseline for bimanual PerAct2-style tasks and should be the aspirational reference.
550
+
551
+ Optional if practical:
552
+
553
+ - CoFreeVLA (useful because it is also a structured auxiliary head on top of a VLA),
554
+ - ActiveVLA (conceptually relevant for active perception),
555
+ - task-specific academic comparisons in writing (Vision in Action, bag SOI model, garment retrieval papers), even if not reproduced in code.
556
+
557
+ ---
558
+
559
+ ## 8. Required ablations
560
+
561
+ The current repo already shows that “big architecture blob vs baseline” is not informative enough. The next paper-worthy evidence must isolate the actual source of gain.
562
+
563
+ Run the following ablations in order.
564
+
565
+ ### General-task ablations
566
+
567
+ 1. `trunk_only`
568
+ 2. `trunk + adapter_noop`
569
+ 3. `trunk + adapter_active (gate only)`
570
+ 4. `trunk + adapter_active (gate + reveal-state head)`
571
+ 5. `trunk + adapter_active (gate + reveal-state + proposal prior)`
572
+ 6. `trunk + adapter_active (gate + reveal-state + proposal prior + lightweight transition model)`
573
+ 7. optional: `+ short reveal cache`
574
+
575
+ Interpretation target:
576
+
577
+ - general tasks should not fall apart as structure is added,
578
+ - if they do, the adapter is not sufficiently no-op-safe.
579
+
580
+ ### Target-like ablations
581
+
582
+ 1. full adapter
583
+ 2. no gate
584
+ 3. no proposal prior
585
+ 4. no task conditioning
586
+ 5. no lightweight transition model
587
+ 6. no geometry
588
+ 7. no depth
589
+ 8. no cloth-specific metrics (for the cloth slice only)
590
+ 9. checkpoint routing only (to prove that routing alone is not the full story)
591
+
592
+ Interpretation target:
593
+
594
+ - gate should matter,
595
+ - proposal prior should matter,
596
+ - cloth-specific metrics should matter on cloth-like slices,
597
+ - routing alone should not account for the final gain.
598
+
599
+ ### Memory ablations
600
+
601
+ Do these late, not early:
602
+
603
+ - no memory,
604
+ - short reveal cache,
605
+ - current dual memory.
606
+
607
+ If dual memory does not clearly beat no memory on actual task success, drop it.
608
+
609
+ ---
610
+
611
+ ## 9. Tests to add or rewrite
612
+
613
+ The current suite is decent for plumbing. It now needs benchmark-faithfulness tests and ablation-protecting tests.
614
+
615
+ ### 9.1 Keep the current useful tests
616
+
617
+ Keep and maintain the existing tests that verify:
618
+
619
+ - proxy scripted benchmark directionality,
620
+ - geometry path activation under camera perturbation,
621
+ - dataset geometry fields,
622
+ - proposal shortlist plumbing,
623
+ - task metadata override behavior,
624
+ - candidate ranking loss behavior.
625
+
626
+ ### 9.2 Add the following tests
627
+
628
+ #### `test_trunk_noop_equivalence.py`
629
+
630
+ With adapter disabled or in strict no-op mode, verify that:
631
+
632
+ - action mean / candidate set match the trunk path exactly (or within tight tolerance),
633
+ - no planner or routing side effects change outputs.
634
+
635
+ This is the single most important new test.
636
+
637
+ #### `test_trunk_interface_official_eval_parity.py`
638
+
639
+ For one selected official trunk and one frozen batch, verify that:
640
+
641
+ - preprocessing,
642
+ - camera handling,
643
+ - token layout,
644
+ - action decoding,
645
+
646
+ match the official implementation path closely enough to trust the wrapper.
647
+
648
+ This should be an integration test, not just a shape test.
649
+
650
+ #### `test_adapter_gate_blocks_unsafe_retrieve.py`
651
+
652
+ Build explicit synthetic reveal states where retrieve should and should not be allowed. The current planner already contains similar logic; formalize it into a direct unit test.
653
+
654
+ #### `test_reveal_state_metric_calibration.py`
655
+
656
+ For proxy env rollouts with known labels, verify that predicted reveal-state metrics correlate with the simulator labels and are not collapsed.
657
+
658
+ #### `test_candidate_ranking_matches_oracle.py`
659
+
660
+ Given a batch with oracle candidate utilities from the proxy env, verify that training reduces the gap between the model ranker and the oracle chooser.
661
+
662
+ This should be a real learned ranking test, not just a toy-array loss test.
663
+
664
+ #### `test_task_specific_loss_masking.py`
665
+
666
+ Verify that foliage metrics are not trained on bag/cloth tasks, bag metrics are not trained on foliage/cloth tasks, etc.
667
+
668
+ #### `test_cloth_specific_metrics_affect_selection.py`
669
+
670
+ For cloth-like proxy cases, verify that fold-preservation / lift-too-much risk can change candidate selection even when nominal reachability is similar.
671
+
672
+ #### `test_general_eval_protocol_is_identical.py`
673
+
674
+ Ensure that `trunk_only`, `adapter_noop`, and `adapter_active` all use the same observation stack, same action horizon, same task subset, and same evaluation step budget.
675
+
676
+ This prevents accidental unfairness.
677
+
678
+ ### 9.3 Promote some current tests from “unit” to “benchmark guardrails”
679
+
680
+ The following should become part of the required CI / pre-run checklist:
681
+
682
+ - geometry path smoke test,
683
+ - dataset geometry/history test,
684
+ - no-op equivalence test,
685
+ - benchmark protocol identity test.
686
+
687
+ ---
688
+
689
+ ## 10. Metrics that matter
690
+
691
+ Do not rely on success alone.
692
+
693
+ ### General-task metrics
694
+
695
+ - task success,
696
+ - return (if available),
697
+ - variance across seeds,
698
+ - regression relative to trunk.
699
+
700
+ ### Target-like metrics
701
+
702
+ - success,
703
+ - visibility gain,
704
+ - access / insertion corridor gain,
705
+ - persistence / support gain,
706
+ - reocclusion after reveal,
707
+ - disturbance / damage,
708
+ - fold preservation (cloth-like slice),
709
+ - unsafe retrieve rate,
710
+ - oracle gap on candidate ranking.
711
+
712
+ ### Calibration / diagnostics
713
+
714
+ - correlation of predicted reveal metrics with simulator ground truth,
715
+ - gate precision / recall,
716
+ - candidate shortlist recall of oracle candidate,
717
+ - proposal mode usage by task,
718
+ - fallback rate to trunk.
719
+
720
+ The fallback rate matters. If the adapter almost never activates, then the system may preserve general performance but not meaningfully help target tasks. If it always activates and hurts general tasks, it is not safe enough.
721
+
722
+ ---
723
+
724
+ ## 11. Acceptance gates
725
+
726
+ These gates should determine whether to continue, simplify, or stop.
727
+
728
+ ### Gate A. Trunk validity
729
+
730
+ Pass only if an official or faithful trunk path is clearly non-trivial on the anchor set.
731
+
732
+ If this fails, stop. Do not spend effort on the adapter yet.
733
+
734
+ ### Gate B. No-op safety
735
+
736
+ Pass only if `adapter_noop` is effectively identical to `trunk_only`.
737
+
738
+ If this fails, stop and fix the wrapper.
739
+
740
+ ### Gate C. General-task parity
741
+
742
+ Pass only if `adapter_active` stays in the same ballpark as `trunk_only` on the anchor set. A small drop may be acceptable, but not a collapse.
743
+
744
+ Use a simple rule for the first pass:
745
+
746
+ - mean absolute drop on the anchor set should be very small,
747
+ - no single anchor task should collapse catastrophically.
748
+
749
+ If the adapter is helping target-like tasks but causing a broad general-task collapse, the architecture is not ready.
750
+
751
+ ### Gate D. Target-like gain
752
+
753
+ Pass only if the full adapter clearly beats:
754
+
755
+ - trunk alone,
756
+ - adapter no-op,
757
+ - random,
758
+ - candidate 0,
759
+ - and ideally narrows the oracle gap.
760
+
761
+ This is where the architecture starts to become scientifically interesting.
762
+
763
+ ### Gate E. Non-trivial novelty
764
+
765
+ Pass only if the gain is not explained almost entirely by checkpoint routing or trivial task labels. The final model should be a single structured adapter, not a routing script disguised as a model.
766
+
767
+ ---
768
+
769
+ ## 12. Recommended training strategy on 1×L40S
770
+
771
+ The compute constraint implies one principle: **do not retrain the trunk repeatedly**.
772
+
773
+ ### Use this strategy
774
+
775
+ 1. Choose one strong trunk.
776
+ 2. Freeze it.
777
+ 3. Build the adapter around it.
778
+ 4. Run many cheap adapter experiments.
779
+ 5. Only consider light trunk finetuning after the adapter is already useful.
780
+
781
+ ### Practical guidelines
782
+
783
+ - mixed precision everywhere practical,
784
+ - gradient checkpointing if needed,
785
+ - keep candidate counts modest,
786
+ - keep rollout horizon short,
787
+ - keep the transition model lightweight,
788
+ - train on a narrow but representative task set,
789
+ - log every candidate-level diagnostic needed for offline analysis.
790
+
791
+ ### What not to do
792
+
793
+ - do not repeatedly launch full-scale trunk retraining,
794
+ - do not run full benchmark sweeps before anchor parity is established,
795
+ - do not expand the world model before the lightweight version proves value,
796
+ - do not hide regressions behind different seeds, different demos, or different eval protocols.
797
+
798
+ ---
799
+
800
+ ## 13. Minimal execution order
801
+
802
+ Follow this order. Do not reorder it casually.
803
+
804
+ ### Step 1. Freeze the current repo as a historical branch
805
+
806
+ Keep it for reference, but stop treating it as the final architecture.
807
+
808
+ ### Step 2. Build a clean trunk interface
809
+
810
+ Get one official trunk path working and reproducible.
811
+
812
+ ### Step 3. Implement adapter no-op mode
813
+
814
+ This must pass no-op equivalence tests before any learning claims are made.
815
+
816
+ ### Step 4. Port only the strong ideas
817
+
818
+ Port:
819
+
820
+ - reveal-state head,
821
+ - task-routed macro proposal prior,
822
+ - retrieve-feasibility gate.
823
+
824
+ Do **not** port the full heavy memory/world-model stack by default.
825
+
826
+ ### Step 5. Add a lightweight transition model
827
+
828
+ Only over reveal-state summaries.
829
+
830
+ ### Step 6. Train adapter-only on proxy supervision and ranking
831
+
832
+ Focus on oracle-gap reduction and reveal-state prediction quality.
833
+
834
+ ### Step 7. Run anchor parity benchmark
835
+
836
+ If parity fails, stop and simplify.
837
+
838
+ ### Step 8. Run target-like public subset and existing proxy suite
839
+
840
+ If gains appear only on the internal proxy and nowhere else, the architecture is still too benchmark-shaped.
841
+
842
+ ### Step 9. Add garment/deformable benchmark
843
+
844
+ This is the most likely path to a strong suitcase/clothes result.
845
+
846
+ ### Step 10. Prepare the real-world data plan only after sim evidence is strong
847
+
848
+ The real teleop benchmark should come after a strong sim go/no-go decision, not before.
849
+
850
+ ---
851
+
852
+ ## 14. What “novel enough” should mean here
853
+
854
+ The novelty should be modest and crisp. It does not need to be a giant new architecture.
855
+
856
+ A reasonable novelty claim is:
857
+
858
+ - a foundation-policy-compatible structured adapter,
859
+ - explicit reveal-state variables for elastic occlusion,
860
+ - task-routed reveal macros,
861
+ - retrieve-feasibility gating,
862
+ - lightweight reveal-state rollout / reranking.
863
+
864
+ This is a good paper if:
865
+
866
+ - the base trunk is respected,
867
+ - the adapter is small,
868
+ - the gains are real on the target-like tasks,
869
+ - the general-task regression is small,
870
+ - the ablations isolate the contribution cleanly.
871
+
872
+ This is **not** a good paper if the final story is:
873
+
874
+ - “we replaced the trunk,”
875
+ - “we added many modules and one of them helped a bit,”
876
+ - “we route to a better checkpoint for each task,”
877
+ - “we get non-zero on one RLBench branch because demo retrieval rescued it.”
878
+
879
+ ---
880
+
881
+ ## 15. Proposed paper positioning (for later)
882
+
883
+ If the system works, position it against two groups of prior work.
884
+
885
+ ### General bimanual policy baselines
886
+
887
+ - PerAct2,
888
+ - AnyBimanual,
889
+ - 3D FlowMatch Actor,
890
+ - optionally CoFreeVLA as an “auxiliary structured head” comparator.
891
+
892
+ ### Target-task conceptual neighbors
893
+
894
+ - active bag reveal/retrieve from demonstrations,
895
+ - active perception for manipulation under occlusion,
896
+ - bag-specific SOI latent-dynamics models,
897
+ - occlusion-aware hidden-object retrieval in clutter,
898
+ - garment clutter retrieval / garment manipulation benchmarks.
899
+
900
+ The paper should say: generic bimanual foundation policies are good at general dual-arm manipulation, but they lack explicit reveal-state structure for elastic occlusion tasks. The adapter adds that structure while preserving general capability.
901
+
902
+ ---
903
+
904
+ ## 16. Deliverables expected from the developer
905
+
906
+ The handoff is not complete until the following exist.
907
+
908
+ ### Code deliverables
909
+
910
+ - clean trunk interface,
911
+ - adapter package,
912
+ - no-op path,
913
+ - lightweight transition model,
914
+ - benchmark scripts for anchor, proxy, and target-like subsets,
915
+ - required new tests,
916
+ - config files for all reported experiments.
917
+
918
+ ### Experimental deliverables
919
+
920
+ - trunk-only anchor benchmark report,
921
+ - adapter-noop parity report,
922
+ - full ablation report,
923
+ - target-like benchmark report,
924
+ - cloth/deformable benchmark report,
925
+ - candidate ranking / oracle gap diagnostics,
926
+ - reveal-state calibration plots.
927
+
928
+ ### Reporting format
929
+
930
+ Every report should include:
931
+
932
+ - exact checkpoint,
933
+ - exact demos,
934
+ - exact seeds,
935
+ - exact task subset,
936
+ - exact eval protocol,
937
+ - whether the adapter was off / noop / active,
938
+ - whether planner/gate/transition model were enabled,
939
+ - per-task scores and mean.
940
+
941
+ No undocumented “special” branches should be used for headline results.
942
+
943
+ ---
944
+
945
+ ## 17. Immediate next actions
946
+
947
+ 1. Pick the trunk to standardize around.
948
+ 2. Build and validate the no-op wrapper.
949
+ 3. Strip the adapter down to:
950
+ - reveal-state head,
951
+ - proposal prior,
952
+ - retrieve gate.
953
+ 4. Replace the heavy world model with a lightweight reveal-state transition model.
954
+ 5. Run anchor parity.
955
+ 6. Run proxy ranking and target-like subset.
956
+ 7. Decide whether memory is dropped permanently.
957
+ 8. Add garment benchmark integration.
958
+
959
+ That is the shortest path from the current repo to a defensible paper candidate.
960
+
961
+ ---
962
+
963
+ ## 18. Appendix: repo evidence that motivated this handoff
964
+
965
+ Relevant repo locations to inspect while implementing:
966
+
967
+ - Main model stack:
968
+ - `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
969
+ - `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
970
+ - `VLAarchtests/code/reveal_vla_bimanual/models/reveal_head.py`
971
+ - `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
972
+ - `VLAarchtests/code/reveal_vla_bimanual/models/planner.py`
973
+ - `VLAarchtests/code/reveal_vla_bimanual/models/observation_memory.py`
974
+ - `VLAarchtests/code/reveal_vla_bimanual/models/world_model.py`
975
+
976
+ - Training / losses:
977
+ - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
978
+ - `VLAarchtests/code/reveal_vla_bimanual/train/trainer.py`
979
+ - `VLAarchtests/code/reveal_vla_bimanual/train/build_aligned_proposal_dataset.py`
980
+
981
+ - Existing tests worth keeping:
982
+ - `VLAarchtests/tests/test_proxy_scripted_bench.py`
983
+ - `VLAarchtests/tests/test_geometry_matters_under_camera_perturbation.py`
984
+ - `VLAarchtests/tests/test_memory_matters_under_high_reocclusion.py`
985
+ - `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
986
+ - `VLAarchtests/tests/test_candidate_ranking_loss.py`
987
+ - `VLAarchtests/tests/test_rvt_backbone_forward.py`
988
+
989
+ - Existing reports that matter:
990
+ - `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary.md`
991
+ - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
992
+ - `reports/true_baseline_compare_subset3_v1/...`
993
+ - `reports/general_task_anchor_20260330_dual_push_buttons/...`
994
+ - `reports/dual_push_nonzero_branch_20260330/...`
995
+ - `reports/dual_push_full_arch_hybrid_20260331/...`
996
+
997
+ Use those reports as a diagnosis of what is weak, not as proof that the current architecture is already ready.
998
+
999
+ ---
1000
+
1001
+ ## 19. External references to keep in mind
1002
+
1003
+ General bimanual baselines and nearby work:
1004
+
1005
+ - PerAct2 benchmark and baselines: https://arxiv.org/abs/2407.00278
1006
+ - AnyBimanual: https://bimanual.github.io/
1007
+ - 3D FlowMatch Actor (3DFA): https://arxiv.org/abs/2508.11002
1008
+ - CoFreeVLA: https://arxiv.org/abs/2601.21712
1009
+ - ActiveVLA: https://arxiv.org/abs/2601.08325
1010
+
1011
+ Target-task conceptual neighbors:
1012
+
1013
+ - Vision in Action (active bag reveal/retrieve from human demonstrations): https://arxiv.org/html/2506.15666v1
1014
+ - Bimanual Deformable Bag Manipulation with SOI neural dynamics: https://arxiv.org/abs/2401.11432
1015
+ - Occlusion-Aware Search for Object Retrieval in Clutter: https://ieeexplore.ieee.org/document/9197067
1016
+ - GarmentPile++ / cluttered garment retrieval: https://arxiv.org/abs/2603.04158
1017
+ - RoboTwin 2.0 benchmark: https://arxiv.org/abs/2506.18088
1018
+
1019
+ Add the exact GarmentLab citation separately if that benchmark is included in the final experimental plan.
1020
+
1021
+ ---
1022
+
1023
+ ## Final instruction to the implementer
1024
+
1025
+ Do not try to rescue the current architecture by adding even more structure. The repo already revealed the answer: the good idea is narrow. Keep the structured reveal-state adapter, keep the retrieve gate, keep task-aware proposals, and force the whole design to prove two things cleanly:
1026
+
1027
+ 1. it does not break a strong trunk on general bimanual tasks,
1028
+ 2. it improves reveal/retrieve under elastic occlusion.
1029
+
1030
+ If both are true, the project is in good shape. If either is false, simplify further rather than expanding again.
legacy/general_task_anchor_20260330_dual_push_buttons/summary.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "anchor_task": "dual_push_buttons",
3
+ "anchor_type": "official_anybimanual_release_single_task_eval",
4
+ "public_release": {
5
+ "checkpoint_step": 60000,
6
+ "success": 0.96,
7
+ "return": 24.0,
8
+ "length": 21.56,
9
+ "source_csv": "/workspace/baselines/AnyBimanual/Peract-LF_AnyBimanual/eval_data.csv"
10
+ },
11
+ "local_official_eval": {
12
+ "checkpoint_step": 60000,
13
+ "episodes": 25,
14
+ "success": 0.96,
15
+ "return": 24.0,
16
+ "length": 21.84,
17
+ "source_csv": "/workspace/baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv"
18
+ },
19
+ "our_existing_results_same_task": {
20
+ "clip_backbone_only": {
21
+ "mean_success": 0.0,
22
+ "mean_return": 0.0,
23
+ "path": "/workspace/reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
24
+ },
25
+ "elastic_reveal_proxy_iter6": {
26
+ "mean_success": 0.0,
27
+ "mean_return": 0.0,
28
+ "path": "/workspace/reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
29
+ },
30
+ "rvt_hybrid_frozen_fixedbounds": {
31
+ "mean_success": 0.0,
32
+ "mean_return": 0.0,
33
+ "path": "/workspace/reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json"
34
+ }
35
+ },
36
+ "command": "export DISPLAY=${DISPLAY:-:99}; export MAMBA_ROOT_PREFIX=/workspace/.micromamba; set +u; eval \"$(/workspace/.tools/micromamba/bin/micromamba shell hook -s bash -r /workspace/.micromamba)\"; micromamba activate /workspace/envs/rlbench; set -u; export PYTHONPATH=\"/workspace/third_party/AnyBimanual/third_party/RLBench:/workspace/third_party/AnyBimanual/third_party/YARR:/workspace/third_party/AnyBimanual\"; cd /workspace/third_party/AnyBimanual && python eval.py method=PERACT_BC framework.logdir=/workspace/baselines/AnyBimanual_release_eval_anchor framework.start_seed=0 framework.eval_type=60000 framework.eval_episodes=25 framework.eval_envs=1 framework.gpu=0 rlbench.task_name=perlf_release_dual_push_buttons_ep25 rlbench.tasks='[dual_push_buttons]' rlbench.demo_path=/workspace/baselines/AnyBimanual_subset3_demo_root rlbench.headless=True rlbench.gripper_mode=BimanualDiscrete rlbench.arm_action_mode=BimanualEndEffectorPoseViaPlanning rlbench.action_mode=BimanualMoveArmThenGripper"
37
+ }
setup/ENVIRONMENT.md ADDED
@@ -0,0 +1,55 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Environment Manifest
2
+
3
+ This export was assembled on:
4
+
5
+ - OS: `Ubuntu 22.04.5 LTS`
6
+ - Kernel: `Linux 6.8.0-88-generic`
7
+ - GPU: `NVIDIA L40S`
8
+ - VRAM: `46068 MiB`
9
+ - Driver: `580.126.09`
10
+ - Python: `3.10.20`
11
+
12
+ ## Primary Python Environment
13
+
14
+ The main RLBench-capable environment used during this handoff lived at:
15
+
16
+ - `/workspace/envs/rlbench`
17
+
18
+ The exact package snapshot from that environment is stored in:
19
+
20
+ - `setup/rlbench_pip_freeze.txt`
21
+
22
+ ## Upstream Pins
23
+
24
+ Pinned benchmark stack used by this project:
25
+
26
+ - `peract_bimanual`: `bb0232a6ba3fe116566e9568f0c7af980ed6703d`
27
+ - `RLBench`: `8af748c51287989294e00c9c670e3330a0e35ed5`
28
+ - `PyRep`: `b8bd1d7a3182adcd570d001649c0849047ebf197`
29
+ - `YARR`: `6822ff78602c77878b27d4cfe759ce029c67bffb`
30
+ - `AnyBimanual`: `76024e48b0e9489101459e85bc909c126ec581b4`
31
+
32
+ ## Important Runtime Variables
33
+
34
+ The RLBench / Coppelia / AnyBimanual stack was run with environment variables equivalent to:
35
+
36
+ ```bash
37
+ export DISPLAY=:99
38
+ export XDG_RUNTIME_DIR=/workspace/runtime
39
+ export COPPELIASIM_ROOT=/workspace/assets/coppeliasim_v4_1_0
40
+ export QT_QPA_PLATFORM_PLUGIN_PATH=/workspace/assets/coppeliasim_v4_1_0
41
+ export LD_LIBRARY_PATH=/workspace/assets/coppeliasim_v4_1_0:${LD_LIBRARY_PATH:-}
42
+ export PYTHONPATH=/workspace/third_party/PyRep:/workspace/third_party/RLBench:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual
43
+ ```
44
+
45
+ For the local project code, the handoff runs also used:
46
+
47
+ ```bash
48
+ export PYTHONPATH=/workspace/workspace/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:$PYTHONPATH
49
+ ```
50
+
51
+ ## Notes
52
+
53
+ - RLBench headless execution still required an X server.
54
+ - The abstract reveal/retrieve proxy benchmark did not depend on RLBench or CoppeliaSim.
55
+ - The official AnyBimanual `dual_push_buttons` path was the only general-task anchor treated as trustworthy on this setup.
setup/bootstrap_same_hardware.sh ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
5
+ ENV_DIR="${ENV_DIR:-/workspace/envs/rlbench}"
6
+ THIRD_PARTY_DIR="${THIRD_PARTY_DIR:-/workspace/third_party}"
7
+
8
+ mkdir -p "$THIRD_PARTY_DIR"
9
+
10
+ python3.10 -m venv "$ENV_DIR"
11
+ source "$ENV_DIR/bin/activate"
12
+
13
+ python -m pip install --upgrade pip setuptools wheel
14
+ python -m pip install -r "$ROOT_DIR/setup/requirements_core.txt"
15
+
16
+ if [ ! -d "$THIRD_PARTY_DIR/PyRep" ]; then
17
+ git clone https://github.com/markusgrotz/PyRep.git "$THIRD_PARTY_DIR/PyRep"
18
+ fi
19
+ if [ ! -d "$THIRD_PARTY_DIR/RLBench" ]; then
20
+ git clone https://github.com/markusgrotz/RLBench.git "$THIRD_PARTY_DIR/RLBench"
21
+ fi
22
+ if [ ! -d "$THIRD_PARTY_DIR/YARR" ]; then
23
+ git clone https://github.com/markusgrotz/YARR.git "$THIRD_PARTY_DIR/YARR"
24
+ fi
25
+ if [ ! -d "$THIRD_PARTY_DIR/AnyBimanual" ]; then
26
+ git clone https://github.com/liyaxuanliyaxuan/AnyBimanual.git "$THIRD_PARTY_DIR/AnyBimanual"
27
+ fi
28
+
29
+ git -C "$THIRD_PARTY_DIR/PyRep" checkout b8bd1d7a3182adcd570d001649c0849047ebf197
30
+ git -C "$THIRD_PARTY_DIR/RLBench" checkout 8af748c51287989294e00c9c670e3330a0e35ed5
31
+ git -C "$THIRD_PARTY_DIR/YARR" checkout 6822ff78602c77878b27d4cfe759ce029c67bffb
32
+ git -C "$THIRD_PARTY_DIR/AnyBimanual" checkout 76024e48b0e9489101459e85bc909c126ec581b4
33
+
34
+ python -m pip install -e "$THIRD_PARTY_DIR/PyRep"
35
+ python -m pip install -e "$THIRD_PARTY_DIR/RLBench"
36
+ python -m pip install -e "$THIRD_PARTY_DIR/YARR"
37
+
38
+ source "$ROOT_DIR/setup/env_vars.sh"
39
+
40
+ echo "Environment bootstrapped."
41
+ echo "You still need a compatible CoppeliaSim install at \$COPPELIASIM_ROOT."
42
+ echo "After that, activate the env and source setup/env_vars.sh before running RLBench or AnyBimanual jobs."
setup/env_vars.sh ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ export DISPLAY="${DISPLAY:-:99}"
5
+ export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/workspace/runtime}"
6
+ export COPPELIASIM_ROOT="${COPPELIASIM_ROOT:-/workspace/assets/coppeliasim_v4_1_0}"
7
+ export QT_QPA_PLATFORM_PLUGIN_PATH="${QT_QPA_PLATFORM_PLUGIN_PATH:-$COPPELIASIM_ROOT}"
8
+ export LD_LIBRARY_PATH="${COPPELIASIM_ROOT}:${LD_LIBRARY_PATH:-}"
9
+
10
+ # Upstream sim stack.
11
+ export PYTHONPATH="/workspace/third_party/PyRep:/workspace/third_party/RLBench:/workspace/third_party/YARR:/workspace/third_party/AnyBimanual:${PYTHONPATH:-}"
12
+
13
+ # Local project code snapshot from this export.
14
+ ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
15
+ export PYTHONPATH="${ROOT_DIR}/code/VLAarchtests2_code/VLAarchtests/code/reveal_vla_bimanual:${PYTHONPATH}"
setup/requirements_core.txt ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ accelerate==0.31.0
2
+ ftfy==6.2.0
3
+ huggingface_hub==0.36.2
4
+ hydra-core==1.3.2
5
+ matplotlib
6
+ numpy==1.26.4
7
+ omegaconf==2.3.0
8
+ open3d==0.19.0
9
+ opencv-python==4.10.0.84
10
+ pytest==9.0.2
11
+ pytest-xdist
12
+ rich==13.9.4
13
+ safetensors==0.4.3
14
+ scikit-learn==1.7.2
15
+ scipy==1.13.1
16
+ tensorboard==2.16.2
17
+ timm==1.0.26
18
+ torch==2.3.1
19
+ torchaudio==2.3.1
20
+ torchvision==0.18.1
21
+ transformers==4.41.2
22
+ yacs
setup/rlbench_pip_freeze.txt ADDED
@@ -0,0 +1,181 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ absl-py==2.1.0
2
+ accelerate==0.31.0
3
+ addict==2.4.0
4
+ aiohappyeyeballs==2.6.1
5
+ aiohttp==3.13.5
6
+ aiosignal==1.4.0
7
+ antlr4-python3-runtime==4.9.3
8
+ appdirs==1.4.4
9
+ asttokens==3.0.1
10
+ async-timeout==5.0.1
11
+ attrs==26.1.0
12
+ backports.zstd @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_backports.zstd_1767044984/work
13
+ blinker==1.9.0
14
+ blosc==1.11.4
15
+ Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1764016952863/work
16
+ cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
17
+ certifi @ file:///home/conda/feedstock_root/build_artifacts/certifi_1772001073725/work/certifi
18
+ cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1761202865726/work
19
+ charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1773659966602/work
20
+ click==8.3.1
21
+ click-prompt==0.5.1
22
+ clip @ git+https://github.com/openai/CLIP.git@d05afc436d78f1c48dc0dbf8e5980a9d471f35f6
23
+ cloudpickle==3.1.2
24
+ comm==0.2.3
25
+ ConfigArgParse==1.7.5
26
+ contourpy @ file:///home/conda/feedstock_root/build_artifacts/contourpy_1744743067588/work
27
+ cycler @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_cycler_1764466758/work
28
+ dash==4.1.0
29
+ decorator==5.2.1
30
+ docker-pycreds==0.4.0
31
+ einops==0.8.0
32
+ exceptiongroup==1.3.1
33
+ executing==2.2.1
34
+ Farama-Notifications==0.0.4
35
+ fastjsonschema==2.21.2
36
+ filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1773313889543/work
37
+ Flask==3.1.3
38
+ fonttools @ file:///home/conda/feedstock_root/build_artifacts/fonttools_1773137064424/work
39
+ freetype-py==2.5.1
40
+ frozenlist==1.8.0
41
+ fsspec==2026.3.0
42
+ ftfy==6.2.0
43
+ gitdb==4.0.12
44
+ GitPython==3.1.46
45
+ gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1773244929835/work
46
+ grpcio==1.80.0
47
+ gym==0.26.2
48
+ gym-notices==0.1.0
49
+ gymnasium==1.0.0a2
50
+ h2 @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_h2_1756364871/work
51
+ h5py @ file:///home/conda/feedstock_root/build_artifacts/h5py_1774712049671/work
52
+ hf-xet==1.4.2
53
+ hpack @ file:///home/conda/feedstock_root/build_artifacts/hpack_1737618293087/work
54
+ huggingface_hub==0.36.2
55
+ hydra-core==1.3.2
56
+ hyperframe @ file:///home/conda/feedstock_root/build_artifacts/hyperframe_1737618333194/work
57
+ idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1760286409563/work
58
+ imageio @ file:///home/conda/feedstock_root/build_artifacts/imageio_1738273805233/work
59
+ imageio-ffmpeg==0.6.0
60
+ importlib_metadata==9.0.0
61
+ iniconfig==2.3.0
62
+ ipython==8.39.0
63
+ ipywidgets==8.1.8
64
+ itsdangerous==2.2.0
65
+ jedi==0.19.2
66
+ Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_jinja2_1764517220/work
67
+ joblib==1.5.3
68
+ jsonschema==4.26.0
69
+ jsonschema-specifications==2025.9.1
70
+ jupyter_core==5.9.1
71
+ jupyterlab_widgets==3.0.16
72
+ kiwisolver @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_kiwisolver_1773067043/work
73
+ Markdown==3.10.2
74
+ markdown-it-py==4.0.0
75
+ MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1772444934960/work
76
+ matplotlib @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-suite_1715976200404/work
77
+ matplotlib-inline==0.2.1
78
+ mdurl==0.1.2
79
+ moviepy==2.2.1
80
+ mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1773661943568/work
81
+ multidict==6.7.1
82
+ munkres==1.1.4
83
+ narwhals==2.18.1
84
+ natsort==8.4.0
85
+ nbformat==5.10.4
86
+ nest-asyncio==1.6.0
87
+ networkx @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_networkx_1731521053/work
88
+ numpy==1.26.4
89
+ omegaconf==2.3.0
90
+ open3d==0.19.0
91
+ openai==0.28.1
92
+ opencv-python==4.10.0.84
93
+ packaging @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_packaging_1769093650/work
94
+ pandas @ file:///home/conda/feedstock_root/build_artifacts/pandas_1744430447393/work
95
+ parso==0.8.6
96
+ pathtools==0.1.2
97
+ perceiver-pytorch==0.8.8
98
+ pexpect==4.9.0
99
+ pillow==12.1.1
100
+ platformdirs==4.9.4
101
+ plotly==6.6.0
102
+ pluggy==1.6.0
103
+ ply @ file:///home/conda/feedstock_root/build_artifacts/ply_1733239724146/work
104
+ poetry-core==2.3.2
105
+ proglog==0.1.12
106
+ prompt_toolkit==3.0.52
107
+ propcache==0.4.1
108
+ protobuf==4.25.9
109
+ psutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_psutil_1769678154/work
110
+ ptyprocess==0.7.0
111
+ pure_eval==0.2.3
112
+ py-spy==0.4.1
113
+ pycparser @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pycparser_1733195786/work
114
+ pyglet==2.1.13
115
+ Pygments==2.20.0
116
+ PyOpenGL==3.1.0
117
+ pyparsing @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pyparsing_1769003998/work
118
+ PyQt5==5.15.11
119
+ PyQt5_sip==12.17.0
120
+ pyquaternion==0.9.9
121
+ pyrender==0.1.45
122
+ -e git+https://github.com/markusgrotz/PyRep.git@b8bd1d7a3182adcd570d001649c0849047ebf197#egg=PyRep
123
+ PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1733217236728/work
124
+ pytest==9.0.2
125
+ python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_python-dateutil_1751104122/work
126
+ python-dotenv==1.2.2
127
+ pytorch-lamb==1.0.0
128
+ pytz @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_pytz_1773679724/work
129
+ PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1770223234623/work
130
+ referencing==0.37.0
131
+ regex==2024.5.15
132
+ requests @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_requests_1774894783/work
133
+ retrying==1.4.2
134
+ # Editable install with no version control (reveal-vla-bimanual==0.1.0)
135
+ -e /workspace/reveal_vla_bimanual
136
+ rich==13.9.4
137
+ rich-click==1.8.9
138
+ -e git+https://github.com/markusgrotz/RLBench.git@8af748c51287989294e00c9c670e3330a0e35ed5#egg=rlbench
139
+ rpds-py==0.30.0
140
+ safetensors==0.4.3
141
+ scikit-learn==1.7.2
142
+ scipy @ file:///home/conda/feedstock_root/build_artifacts/scipy-split_1716470219380/work/dist/scipy-1.13.1-cp310-cp310-linux_x86_64.whl#sha256=a4ff22b6dc27b61196be51695f53f9b0676e7c1bc564872b51fc3c41b79ae80b
143
+ segment-anything==1.0
144
+ sentry-sdk==2.57.0
145
+ setproctitle==1.3.7
146
+ sip @ file:///home/conda/feedstock_root/build_artifacts/sip_1759437834046/work
147
+ six @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_six_1753199211/work
148
+ smmap==5.0.3
149
+ stack-data==0.6.3
150
+ sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1771952240620/work
151
+ tensorboard==2.16.2
152
+ tensorboard-data-server==0.7.2
153
+ tensorboardX==2.6.4
154
+ termcolor==3.3.0
155
+ threadpoolctl==3.6.0
156
+ timeout-decorator==0.5.0
157
+ timm==1.0.26
158
+ tokenizers==0.19.1
159
+ toml @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_toml_1764486833/work
160
+ tomli @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_tomli_1774492402/work
161
+ torch==2.3.1
162
+ torchaudio==2.3.1
163
+ torchvision==0.18.1
164
+ tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1774357896577/work
165
+ tqdm @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_tqdm_1770153424/work
166
+ traitlets==5.14.3
167
+ transformers==4.41.2
168
+ transforms3d==0.4.1
169
+ trimesh @ file:///home/conda/feedstock_root/build_artifacts/trimesh_1774412449209/work
170
+ triton==2.3.1
171
+ typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/bld/rattler-build_typing_extensions_1756220668/work
172
+ tzdata @ file:///home/conda/feedstock_root/build_artifacts/python-tzdata_1765719872007/work
173
+ unicodedata2 @ file:///home/conda/feedstock_root/build_artifacts/unicodedata2_1770908960326/work
174
+ urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1767817748113/work
175
+ wandb==0.14.0
176
+ wcwidth==0.2.14
177
+ Werkzeug==3.1.7
178
+ widgetsnbextension==4.0.15
179
+ yarl==1.23.0
180
+ -e git+https://github.com/markusgrotz/YARR.git@6822ff78602c77878b27d4cfe759ce029c67bffb#egg=yarr
181
+ zipp==3.23.0