## Public Benchmark Progress

Date: 2026-04-01 UTC

### Confirmed Real Public Benchmark Result

- Public occlusion proxy: `ManiSkill PickClutterYCB-v1`
- Strongest adapter-specific result so far:
  - summary: `/workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json`
  - `trunk_only_ft = 0.04`
  - `adapter_noop = 0.04`
  - `adapter_active_ft = 0.62`
  - `delta_active_vs_trunk = +0.58`
  - `95% CI = [0.44, 0.72]`
  - `intervention_rate = 1.0`
  - `non_base_selection_rate = 1.0`
- Interpretation:
  - this is real adapter-specific sign of life on a public occlusion benchmark
  - the gain is not coming from a stronger shared trunk, because `adapter_noop` stays flat

### BEHAVIOR Bag Proxy Investigation

Target public task family:

- official BEHAVIOR grocery-store bag/container retrieval proxy
- primary candidate: `paying_for_purchases`
- stricter but currently unusable candidate: `buy_basic_garden_tools`

Environment used:

- BEHAVIOR assets: `/workspace/workspace/BEHAVIOR-1K`
- venv used for probes: `/workspace/envs/behavior`

Findings:

- `buy_basic_garden_tools` is blocked by official scene-task geometry:
  - repeated failure on `ontop ['rake.n.03_1', 'grocery_shelf.n.01_1']`
  - even with whitelist attempts, the sampler never found a valid shelf placement
- `paying_for_purchases` is much healthier:
  - `grocery_store_convenience`, `grocery_store_cafe`, and `grocery_store_asian` all load
  - object scope binds the real task objects:
    - `shopping_basket.n.01_1`
    - `money.n.01_1`
    - `checkout.n.03_1`
    - `floor.n.01_1`
- Root sampler bug:
  - official online sampling fails on the floor / agent chain
  - without patching, the blocking warning is:
    - `Room type [grocery_store] ... floor.n.01_1: , checkout.n.03_1: grocery_store_0`
  - after removing the agent-on-floor condition from the sampler pipeline, the next blocker is:
    - `ontop ['shopping_basket.n.01_1', 'floor.n.01_1'] False`
- Critical state-probe result:
  - even when object bindings exist, the sampled movable objects remain parked at their far-away import positions
  - observed example on `grocery_store_asian`:
    - basket position near `[120, 120, -80]`
    - money position near `[115, 115, -85]`
    - apples position near `[110, 110, -90]` and `[105, 105, -95]`
    - `money inside basket = False`
    - `apple1 inside basket = False`
    - `apple2 inside basket = False`
- Conclusion:
  - as of 2026-04-01, the BEHAVIOR bag proxy is not yet a usable fair evaluation track in this workspace
  - the public task objects bind, but the online sampler does not materialize a valid initial scene for training or evaluation

### Garment / Cloth Proxy Status

- GarmentLab repo cloned:
  - `/workspace/workspace/GarmentLab`
- Immediate constraint:
  - the repo expects Isaac Sim 4.0.0 plus external Google Drive assets
- Current status:
  - code inspected only
  - no runnable public cloth benchmark execution completed yet in this workspace

### Next Public Proxy Candidates

Given the BEHAVIOR blocker, the next-lightest public candidates already available locally are:

- `OpenCabinetDrawer-v1`
  - public ManiSkill task
  - good container reveal / access proxy
- `PutEggplantInBasketScene-v1`
  - public ManiSkill bridge-dataset task
  - public basket / container interaction proxy
- `PutSpoonOnTableClothInScene-v1`
  - public ManiSkill bridge-dataset cloth interaction proxy

### Immediate Recommendation

- Keep the confirmed `PickClutterYCB-v1` result as the anchor public success case.
- Do not spend more time on BEHAVIOR online sampling until either:
  - a cached valid scene instance is created, or
  - the sampler is patched deeply enough to place container objects correctly instead of leaving them at far-away import positions.
- Pivot the next train/eval smoke to a lighter public ManiSkill proxy before returning to BEHAVIOR.
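
Before returning to BEHAVIOR, a cached or patched scene could be gated on the same two symptoms the state probe exposed: objects parked far from the scene, and failed containment checks. The sketch below is illustrative only and does not call any BEHAVIOR or OmniGibson API. The function names, the radius threshold, and the assumption that the usable scene sits near the world origin are all hypothetical; the positions are the approximate values reported for `grocery_store_asian`.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical threshold in the simulator's position units: anything farther
# than this from the world origin is treated as still parked at its import
# position. Assumes the usable scene sits near the origin.
MAX_SCENE_RADIUS = 50.0


def find_parked_objects(positions: Dict[str, Vec3],
                        max_radius: float = MAX_SCENE_RADIUS) -> List[str]:
    """Return the sorted names of objects farther than max_radius from the origin."""
    origin = (0.0, 0.0, 0.0)
    return sorted(
        name for name, xyz in positions.items()
        if math.dist(xyz, origin) > max_radius
    )


def scene_is_usable(positions: Dict[str, Vec3],
                    inside_checks: Dict[str, bool],
                    max_radius: float = MAX_SCENE_RADIUS) -> bool:
    """A scene passes only if no object is parked and every containment check holds."""
    no_parked = not find_parked_objects(positions, max_radius)
    return no_parked and all(inside_checks.values())


# Approximate values from the state probe on grocery_store_asian.
observed_positions = {
    "basket": (120.0, 120.0, -80.0),
    "money": (115.0, 115.0, -85.0),
    "apple1": (110.0, 110.0, -90.0),
    "apple2": (105.0, 105.0, -95.0),
}
observed_inside = {"money": False, "apple1": False, "apple2": False}

print(find_parked_objects(observed_positions))
print(scene_is_usable(observed_positions, observed_inside))
```

With the probed values, every object is flagged as parked and the scene is rejected, which matches the conclusion that the sampler did not materialize a valid initial scene. In practice the threshold would need to be set from the actual scene bounds rather than a fixed radius.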