## Public Benchmark Progress

Date: 2026-04-01 UTC

### Confirmed Real Public Benchmark Result

- Public occlusion proxy: `ManiSkill PickClutterYCB-v1`
- Strongest adapter-specific result so far:
  - summary: `/workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json`
  - `trunk_only_ft = 0.04`
  - `adapter_noop = 0.04`
  - `adapter_active_ft = 0.62`
  - `delta_active_vs_trunk = +0.58`
  - `95% CI = [0.44, 0.72]` (centered on the `+0.58` delta)
  - `intervention_rate = 1.0`
  - `non_base_selection_rate = 1.0`
- Interpretation:
  - this is a real, adapter-specific sign of life on a public occlusion benchmark
  - the gain does not come from a stronger shared trunk: `adapter_noop` matches `trunk_only_ft` at `0.04`, so only the active adapter path accounts for the `+0.58`
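
The attribution argument above can be checked mechanically. The following is a minimal sketch, assuming the three reported success rates are copied into a plain dict; the dict layout and the `attribute_gain` helper are illustrative and are not the schema of `public_benchmark_package_summary.json`.

```python
# Reported success rates from the result above. The dict layout is an
# assumption for illustration, not the real summary JSON schema.
reported = {
    "trunk_only_ft": 0.04,
    "adapter_noop": 0.04,
    "adapter_active_ft": 0.62,
}


def attribute_gain(rates, tol=1e-9):
    """Split the gain into an active-adapter delta and a no-op delta.

    The gain counts as adapter-specific when the no-op variant does not
    move relative to the trunk-only baseline but the active variant does.
    """
    delta_active = rates["adapter_active_ft"] - rates["trunk_only_ft"]
    delta_noop = rates["adapter_noop"] - rates["trunk_only_ft"]
    return {
        "delta_active_vs_trunk": round(delta_active, 6),
        "delta_noop_vs_trunk": round(delta_noop, 6),
        "adapter_specific": abs(delta_noop) <= tol and delta_active > tol,
    }


result = attribute_gain(reported)
print(result)
```

With the numbers reported here, the no-op delta is zero and the active delta is 0.58, so the check flags the gain as adapter-specific.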

### BEHAVIOR Bag Proxy Investigation

Target public task family:
- official BEHAVIOR grocery-store bag/container retrieval proxy
- primary candidate: `paying_for_purchases`
- stricter but currently unusable candidate: `buy_basic_garden_tools`

Environment used:
- BEHAVIOR assets: `/workspace/workspace/BEHAVIOR-1K`
- venv used for probes: `/workspace/envs/behavior`

Findings:
- `buy_basic_garden_tools` is blocked by official scene-task geometry:
  - repeated failure on `ontop ['rake.n.03_1', 'grocery_shelf.n.01_1']`
  - even with whitelist attempts, the sampler never found a valid shelf placement
- `paying_for_purchases` is much healthier:
  - `grocery_store_convenience`, `grocery_store_cafe`, and `grocery_store_asian` all load
  - object scope binds the real task objects:
    - `shopping_basket.n.01_1`
    - `money.n.01_1`
    - `checkout.n.03_1`
    - `floor.n.01_1`
- Root sampler bug:
  - official online sampling fails on the floor / agent chain
  - without patching, the blocking warning is:
    - `Room type [grocery_store] ... floor.n.01_1: , checkout.n.03_1: grocery_store_0`
  - after removing the agent-on-floor condition from the sampler pipeline, the next blocker is:
    - `ontop ['shopping_basket.n.01_1', 'floor.n.01_1'] False`
- Critical state-probe result:
  - even when object bindings exist, the sampled movable objects remain parked at their far-away import positions
  - observed example on `grocery_store_asian`:
    - basket position near `[120, 120, -80]`
    - money position near `[115, 115, -85]`
    - apples position near `[110, 110, -90]` and `[105, 105, -95]`
  - `money inside basket = False`
  - `apple1 inside basket = False`
  - `apple2 inside basket = False`
- Conclusion:
  - as of 2026-04-01, the BEHAVIOR bag proxy is not yet a usable fair evaluation track in this workspace
  - the public task objects bind, but the online sampler does not materialize a valid initial scene for training or evaluation
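
The state-probe result above reduces to a simple sanity check: any task object still sitting far from the scene counts as unplaced. The sketch below is one way to express that check, using the approximate positions observed on `grocery_store_asian`. The `MAX_SCENE_RADIUS` threshold and the `parked_objects` helper are hypothetical, and the check assumes the scene itself sits near the world origin.

```python
import math

# Approximate positions from the grocery_store_asian probe above.
observed = {
    "basket": (120.0, 120.0, -80.0),
    "money": (115.0, 115.0, -85.0),
    "apple1": (110.0, 110.0, -90.0),
    "apple2": (105.0, 105.0, -95.0),
}

# Hypothetical threshold in scene units: anything farther than this from
# the world origin is treated as still parked at its import position.
MAX_SCENE_RADIUS = 50.0


def parked_objects(positions, max_radius=MAX_SCENE_RADIUS):
    """Return the sorted names of objects whose distance from the origin
    exceeds max_radius."""
    return sorted(
        name
        for name, pos in positions.items()
        if math.hypot(*pos) > max_radius
    )


parked = parked_objects(observed)
scene_is_usable = not parked
print(parked, scene_is_usable)
```

A check like this could gate any cached scene instance before it is accepted for training or evaluation.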

### Garment / Cloth Proxy Status

- GarmentLab repo cloned:
  - `/workspace/workspace/GarmentLab`
- Immediate constraint:
  - the repo expects Isaac Sim 4.0.0 plus external Google Drive assets
- Current status:
  - code inspected only
  - no runnable public cloth benchmark execution completed yet in this workspace

### Next Public Proxy Candidates

Given the BEHAVIOR blocker, the next-lightest public candidates already available locally are:

- `OpenCabinetDrawer-v1`
  - public ManiSkill task
  - good container reveal / access proxy
- `PutEggplantInBasketScene-v1`
  - public ManiSkill bridge-dataset task
  - public basket / container interaction proxy
- `PutSpoonOnTableClothInScene-v1`
  - public ManiSkill bridge-dataset task
  - public cloth interaction proxy
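
Before committing to one of these candidates, a quick load-and-reset smoke check can confirm that each one at least instantiates. The following is a rough sketch, assuming ManiSkill tasks are registered with Gymnasium by importing `mani_skill.envs`. The `smoke_check` and `make_maniskill_env` helpers are hypothetical, and some tasks may need extra `gym.make` arguments (for example a specific observation mode or simulation backend) that are not shown here.

```python
from typing import Any, Callable, Dict, Iterable

CANDIDATES = (
    "OpenCabinetDrawer-v1",
    "PutEggplantInBasketScene-v1",
    "PutSpoonOnTableClothInScene-v1",
)


def make_maniskill_env(env_id: str) -> Any:
    # Lazy imports so this module loads even without ManiSkill installed.
    import gymnasium as gym
    import mani_skill.envs  # noqa: F401  (registers ManiSkill tasks)

    return gym.make(env_id)


def smoke_check(
    env_ids: Iterable[str],
    make_env: Callable[[str], Any] = make_maniskill_env,
) -> Dict[str, str]:
    """Create each environment, reset it once, close it, and record the
    outcome. Failures are recorded as strings instead of being raised."""
    status: Dict[str, str] = {}
    for env_id in env_ids:
        try:
            env = make_env(env_id)
            try:
                env.reset(seed=0)
            finally:
                env.close()
            status[env_id] = "ok"
        except Exception as exc:  # record the failure and keep going
            status[env_id] = f"failed: {type(exc).__name__}: {exc}"
    return status


if __name__ == "__main__":
    for env_id, outcome in smoke_check(CANDIDATES).items():
        print(f"{env_id}: {outcome}")
```

The factory is injectable so the same loop can be exercised with a stub environment on machines without a simulator.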

### Immediate Recommendation

- Keep the confirmed `PickClutterYCB-v1` result as the anchor public success case.
- Do not spend more time on BEHAVIOR online sampling until either:
  - a cached valid scene instance is created, or
  - the sampler is patched deeply enough to place container objects correctly instead of leaving them at far-away import positions.
- Pivot the next train/eval smoke to a lighter public ManiSkill proxy before returning to BEHAVIOR.