Public Benchmark Progress
Date: 2026-04-01 UTC
Confirmed Real Public Benchmark Result
- Public occlusion proxy: ManiSkill PickClutterYCB-v1
- Strongest adapter-specific result so far:
  - summary: /workspace/workspace/reports/maniskill_pickclutter_smoke_v5_eval_tuned_softerpref/public_benchmark_package_summary.json
  - trunk_only_ft = 0.04
  - adapter_noop = 0.04
  - adapter_active_ft = 0.62
  - delta_active_vs_trunk = +0.58
  - 95% CI = [0.44, 0.72]
  - intervention_rate = 1.0
  - non_base_selection_rate = 1.0
- Interpretation:
- this is a real, adapter-specific sign of life on a public occlusion benchmark
- the gain is not coming from a stronger shared trunk, because adapter_noop stays flat at 0.04, the same as trunk_only_ft
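The interpretation above rests on two checks: the no-op adapter matches the trunk-only baseline, and the confidence interval on the gain excludes zero. A minimal sketch of that logic follows. The dict layout and the function name are hypothetical (the actual schema of the summary JSON is not shown in this report), and it assumes the reported 95% CI refers to delta_active_vs_trunk.

```python
# Values copied from the report above. The key names for the CI and the
# dict layout are assumptions, not the real summary JSON schema.
reported = {
    "trunk_only_ft": 0.04,
    "adapter_noop": 0.04,
    "adapter_active_ft": 0.62,
    "ci95_delta": (0.44, 0.72),  # assumed to be the CI on the delta
}


def adapter_specific_gain(r, tol=1e-9):
    """Check whether the gain is attributable to the active adapter."""
    # Gain of the active adapter over the fine-tuned trunk alone.
    delta = r["adapter_active_ft"] - r["trunk_only_ft"]
    # If the no-op adapter matches the trunk, the trunk itself did not improve.
    noop_flat = abs(r["adapter_noop"] - r["trunk_only_ft"]) <= tol
    # A lower CI bound above zero means the gain is unlikely to be noise.
    ci_low, _ci_high = r["ci95_delta"]
    ci_excludes_zero = ci_low > 0.0
    return {
        "delta": round(delta, 2),
        "noop_flat": noop_flat,
        "ci_excludes_zero": ci_excludes_zero,
        "adapter_specific": noop_flat and ci_excludes_zero,
    }


result = adapter_specific_gain(reported)
print(result)
```

With the reported numbers, all three checks pass and the computed delta agrees with the reported +0.58.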
BEHAVIOR Bag Proxy Investigation
Target public task family:
- official BEHAVIOR grocery-store bag/container retrieval proxy
- primary candidate: paying_for_purchases
- stricter but currently unusable candidate: buy_basic_garden_tools
Environment used:
- BEHAVIOR assets: /workspace/workspace/BEHAVIOR-1K
- venv used for probes: /workspace/envs/behavior
Findings:
- buy_basic_garden_tools is blocked by official scene-task geometry:
  - repeated failure on ontop ['rake.n.03_1', 'grocery_shelf.n.01_1']
  - even with whitelist attempts, the sampler never found a valid shelf placement
- paying_for_purchases is much healthier:
  - grocery_store_convenience, grocery_store_cafe, and grocery_store_asian all load
  - object scope binds the real task objects:
    - shopping_basket.n.01_1
    - money.n.01_1
    - checkout.n.03_1
    - floor.n.01_1
- Root sampler bug:
- official online sampling fails on the floor / agent chain
- without patching, the blocking warning is:
Room type [grocery_store] ... floor.n.01_1: , checkout.n.03_1: grocery_store_0
- after removing the agent-on-floor condition from the sampler pipeline, the next blocker is:
ontop ['shopping_basket.n.01_1', 'floor.n.01_1'] False
- Critical state-probe result:
- even when object bindings exist, the sampled movable objects remain parked at their far-away import positions
- observed example on grocery_store_asian:
  - basket position near [120, 120, -80]
  - money position near [115, 115, -85]
  - apples position near [110, 110, -90] and [105, 105, -95]
  - money inside basket = False
  - apple1 inside basket = False
  - apple2 inside basket = False
- Conclusion:
- as of 2026-04-01, the BEHAVIOR bag proxy is not yet a usable fair evaluation track in this workspace
- the public task objects bind, but the online sampler does not materialize a valid initial scene for training or evaluation
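The state-probe finding above can be turned into a cheap automated gate that runs before any training or evaluation. The sketch below flags objects that are still parked far from the scene. The positions are the approximate values reported for grocery_store_asian. The radius threshold, the function names, and the assignment of the two apple positions to apple1 and apple2 are assumptions for illustration, and the code uses no BEHAVIOR or OmniGibson APIs.

```python
import math

# Approximate positions from the grocery_store_asian probe above.
# Which apple sits at which position is an assumption.
positions = {
    "basket": (120.0, 120.0, -80.0),
    "money": (115.0, 115.0, -85.0),
    "apple1": (110.0, 110.0, -90.0),
    "apple2": (105.0, 105.0, -95.0),
}

# Hypothetical bound on how far from the scene origin a placed object may be,
# in scene units. Tune this per scene.
MAX_SCENE_RADIUS = 50.0


def parked_objects(object_positions, max_radius=MAX_SCENE_RADIUS):
    """Return the sorted names of objects farther than max_radius from the origin."""
    parked = []
    for name, (x, y, z) in object_positions.items():
        distance = math.sqrt(x * x + y * y + z * z)
        if distance > max_radius:
            parked.append(name)
    return sorted(parked)


def scene_is_materialized(object_positions, max_radius=MAX_SCENE_RADIUS):
    """A scene counts as materialized only if no object is still parked."""
    return not parked_objects(object_positions, max_radius)


print(parked_objects(positions))
print(scene_is_materialized(positions))
```

A distance check like this is deliberately coarse: it cannot confirm that money or apples are inside the basket, but it rejects scenes where the sampler never moved the objects at all.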
Garment / Cloth Proxy Status
- GarmentLab repo cloned: /workspace/workspace/GarmentLab
- Immediate constraint:
- the repo expects Isaac Sim 4.0.0 plus external Google Drive assets
- Current status:
- code inspected only
- no runnable public cloth benchmark execution completed yet in this workspace
Next Public Proxy Candidates
Given the BEHAVIOR blocker, the next-lightest public candidates already available locally are:
- OpenCabinetDrawer-v1
  - public ManiSkill task
  - good container reveal / access proxy
- PutEggplantInBasketScene-v1
  - public ManiSkill bridge-dataset task
  - public basket / container interaction proxy
- PutSpoonOnTableClothInScene-v1
  - public ManiSkill bridge-dataset cloth interaction proxy
Immediate Recommendation
- Keep the confirmed PickClutterYCB-v1 result as the anchor public success case.
- Do not spend more time on BEHAVIOR online sampling until either:
- a cached valid scene instance is created, or
- the sampler is patched deeply enough to place container objects correctly instead of leaving them at far-away import positions.
- Pivot the next train/eval smoke to a lighter public ManiSkill proxy before returning to BEHAVIOR.