VLAarchtests2

Bundle staged from /workspace on 2026-03-31 UTC.

This repo is the follow-on organization repo to lsnu/VLAarchtests. It includes:

  • current code under VLAarchtests/
  • current third-party baseline code under third_party/
  • current baseline runs, replay artifacts, demo roots, and released checkpoint material under baselines/
  • current training outputs and checkpoints under outputs/
  • current logs under reports/
  • environment recreation files under environment/
  • raw results and change/test logs at the repo root
  • the previous repo README under history/VLAarchtests_previous_README.md
  • the active handoff file under handoff/instructions4.md

Top-Level Contents

  • VLAarchtests/
    • code, tests, configs, generated configs, reports, checkpoints, and proxy datasets from the current runpod workspace
  • third_party/AnyBimanual/
    • local AnyBimanual checkout used for the official overlap baseline branch, including local compatibility patches
  • baselines/
    • released AnyBimanual checkpoint material
    • overlap replay artifacts
      • HF export packaging note: baselines/AnyBimanual_overlap_replay/multi/ is sharded into subdirectories to satisfy the Hub 10000 files per directory limit
    • overlap run directories
    • local subset3 demo roots used by the overlap branch
  • outputs/
    • RLBench training outputs and checkpoints used by the current anchor, RVT, dual-push, and elastic-controller branches
  • reports/
    • training and evaluation logs copied from /workspace/reports
  • environment/
    • machine snapshot, package lists, and setup helpers
  • history/
    • copied previous-repo README
  • handoff/
    • active sprint instruction file
  • RESULTS_RAW.md
    • raw result tables and final official overlap eval outputs
  • CHANGE_AND_TEST_LOG.md
    • file-level change log and executed test commands
  • MODEL_AND_ARTIFACT_INDEX.md
    • staged directory map with main artifact roots
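
The packaging note on baselines/AnyBimanual_overlap_replay/multi/ says the directory is sharded to stay under the Hub limit of 10000 files per directory. A minimal sketch of one way to plan such a split is below. The function names and the shard_0000-style directory naming are hypothetical and are not taken from this repo; the layout actually used here may differ.

```python
from collections import defaultdict

HUB_MAX_FILES_PER_DIR = 10000  # limit quoted in the packaging note


def plan_shards(file_names, max_per_dir=HUB_MAX_FILES_PER_DIR):
    """Map each file name to a shard subdirectory holding at most max_per_dir files.

    The shard naming scheme is an illustrative assumption.
    """
    plan = {}
    for index, name in enumerate(sorted(file_names)):
        shard = f"shard_{index // max_per_dir:04d}"
        plan[name] = f"{shard}/{name}"
    return plan


def shard_sizes(plan):
    """Count how many files land in each shard directory."""
    sizes = defaultdict(int)
    for target in plan.values():
        sizes[target.split("/", 1)[0]] += 1
    return dict(sizes)


if __name__ == "__main__":
    names = [f"replay_{i}.pkl" for i in range(25000)]  # synthetic file names
    print(shard_sizes(plan_shards(names)))
```

Sorting the names first keeps the plan deterministic, so re-running the split on the same file set yields the same shard assignment.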

Previous Repo Coverage

The earlier lsnu/VLAarchtests repo covered the work from 2026-03-25 and 2026-03-26. Its README is copied verbatim at:

  • history/VLAarchtests_previous_README.md

Previous-repo items explicitly referenced there include:

  • compact, spatial, compact-phase, and spatial-phase proxy branches
  • earlier RLBench direct-policy and kNN runs
  • environment recreation files
  • prior raw result tables

Current Session Additions

Current-session folders added or expanded in this repo include:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/
  • VLAarchtests/artifacts/reports/sprint_v7_followup/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/
  • VLAarchtests/artifacts/reports/task_routed_proxy_v1/
  • VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/
  • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/
  • VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/
  • VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/
  • VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/
  • VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/

Raw Results Snapshot

Proxy sprint v7

Source:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json

Raw values:

  • base model mean success: 0.28
  • base per-task: foliage 0.39, bag 0.31, cloth 0.14
  • random mean success: 0.43333333333333335
  • candidate0 mean success: 0.2
  • oracle mean success: 0.4066666666666667
  • scripted mean success: 1.0

Eval-time ablations

Source:

  • VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json

Raw values:

  • no_planner: 0.2
  • no_memory: 0.3233333333333333
  • no_task_conditioning: 0.28
  • no_geometry: 0.27
  • no_camera_pose: 0.29333333333333333

Selector checkpoints

Sources:

  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json
  • VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md

Raw values:

  • iter6 mean success: 0.4566666666666667
    • foliage 0.46, bag 0.4, cloth 0.51
  • iter7 mean success: 0.4666666666666666
    • foliage 0.4, bag 0.41, cloth 0.59
  • iter8 bag-only fixed slice: 0.41
  • routed controller mean success: 0.48666666666666664
    • routing rule: foliage -> iter6, bag -> iter8, cloth -> iter7
    • per-task: foliage 0.46, bag 0.41, cloth 0.59
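
The mean success values above appear to be unweighted means over the three tasks. The sketch below recomputes them from the listed per-task values. It assumes each task carries equal weight, which the source files do not state explicitly; the dictionary layout is illustrative and not the schema of reveal_benchmark.json.

```python
from statistics import mean

# Per-task success values copied from the lists above.
PER_TASK = {
    "iter6": {"foliage": 0.46, "bag": 0.4, "cloth": 0.51},
    "iter7": {"foliage": 0.4, "bag": 0.41, "cloth": 0.59},
    "routed": {"foliage": 0.46, "bag": 0.41, "cloth": 0.59},
}

# Mean success values as reported above.
REPORTED = {
    "iter6": 0.4566666666666667,
    "iter7": 0.4666666666666666,
    "routed": 0.48666666666666664,
}


def unweighted_mean(task_scores):
    """Average per-task success, giving every task equal weight (assumption)."""
    return mean(list(task_scores.values()))


def matches_reported(name, tolerance=1e-9):
    """True when the recomputed mean agrees with the reported value."""
    return abs(unweighted_mean(PER_TASK[name]) - REPORTED[name]) < tolerance


if __name__ == "__main__":
    for name in PER_TASK:
        print(name, unweighted_mean(PER_TASK[name]), matches_reported(name))
```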

Real baseline compare on proxy suite

Source:

  • VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json

Raw values:

  • baseline_rgbd_stage3 mean success: 0.31
    • foliage 0.21, bag 0.15, cloth 0.57
  • iter5_selector mean success: 0.45
    • foliage 0.44, bag 0.4, cloth 0.51

RLBench recovered push-box comparator

Sources:

  • reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json
  • reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json

Raw values:

  • current fair-step1 final mean success: 0.7
  • current fair-step1 final successes:
    • [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
  • historical push-box control mean success: 0.4
  • historical push-box control successes:
    • [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
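
The two mean success values follow directly from the per-episode lists. A short sketch that derives the episode count, success count, and mean from each list is below; the helper name summarize is hypothetical.

```python
# Per-episode success flags copied from the two lists above.
FAIR_STEP1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HISTORICAL = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def summarize(successes):
    """Return (episodes, success count, mean success) for a list of 0.0/1.0 flags."""
    episodes = len(successes)
    wins = int(sum(successes))
    return episodes, wins, wins / episodes


if __name__ == "__main__":
    print(summarize(FAIR_STEP1))  # (10, 7, 0.7)
    print(summarize(HISTORICAL))  # (10, 4, 0.4)
```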

Official AnyBimanual overlap branch

Sources:

  • baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log
  • reports/anybimanual_subset3_overlap_resume1000_eval.log

Raw train milestones:

  • global step 300: loss 40.91718
  • global step 400: loss 33.26684
  • global step 500: loss 36.07054
  • global step 600: loss 35.32345
  • global step 700: loss 28.50959
  • global step 800: loss 23.60169
  • global step 900: loss 15.28901
  • the run reached weights/1000 and training exited cleanly

Raw eval outputs:

  • source log: reports/anybimanual_subset3_overlap_resume1000_eval.log
  • summary files:
    • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.md
    • VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json
  • local last complete step: 1000
  • local mean success: 0.16
  • local per-task success:
    • coordinated_push_box: 0.0
    • coordinated_lift_ball: 0.0
    • dual_push_buttons: 0.48
  • local per-task return:
    • coordinated_push_box: 0.0
    • coordinated_lift_ball: 0.0
    • dual_push_buttons: 12.0
  • public best overlap step in the local summary: 60000
  • public best mean success in the local summary: 0.6933333333333334

Validated general-task anchor: dual_push_buttons

Sources:

  • VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json
  • baselines/AnyBimanual_release_eval_anchor/perlf_release_dual_push_buttons_ep25/PERACT_BC/seed0/eval_data.csv

Raw values:

  • public AnyBimanual release, step 60000: success 0.96, return 24.0, length 21.56
  • local official single-task eval, step 60000, 25 episodes: success 0.96, return 24.0, length 21.84
  • local clip backbone-only result on same task: success 0.0, return 0.0
  • local elastic reveal proxy iter6 result on same task: success 0.0, return 0.0
  • local RVT frozen fixed-bounds result on same task: success 0.0, return 0.0

RVT overlap branch

Sources:

  • VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md
  • VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md

Raw values:

  • frozen RVT stage1 train summary:
    • outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/summary.json
    • final train total 0.043179353826920445
    • final val total 0.039591669984665984
  • frozen RVT overlap eval: mean success 0.0
  • frozen fixed-bounds RVT overlap eval: mean success 0.0
  • both branch gates:
    • local AnyBimanual overlap floor 0.16
    • stage2 run false
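
The gate entries above read as a threshold check: stage 2 is run only if the stage 1 overlap eval clears the local AnyBimanual floor. A minimal sketch of such a gate follows. The strict comparison and the names GateResult and stage2_gate are assumptions for illustration; the summary files only record the floor value and the fact that stage 2 was not run.

```python
from dataclasses import dataclass

LOCAL_OVERLAP_FLOOR = 0.16  # local AnyBimanual overlap mean success


@dataclass(frozen=True)
class GateResult:
    mean_success: float
    floor: float
    run_stage2: bool


def stage2_gate(mean_success, floor=LOCAL_OVERLAP_FLOOR):
    """Decide whether stage 2 should run.

    Assumption: the branch must strictly beat the floor to proceed.
    """
    return GateResult(mean_success, floor, mean_success > floor)


if __name__ == "__main__":
    # Both RVT overlap evals reported mean success 0.0.
    print(stage2_gate(0.0))
```

With a mean success of 0.0 the gate stays closed under either a strict or a non-strict comparison, which is consistent with the recorded stage2 status.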

Dual-push non-privileged retarget branch

Sources:

  • VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md

Raw values:

  • demo replay through absolute_action_from_delta:
    • reports/dual_push_nonzero_branch_20260330/demo_replay/replay_summary.json
    • mean success 0.8
    • mean return 0.8
  • retargeted demo with checkpoint backbone retrieval and vision-only button localization (ep1 run):
    • reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep1/summary.json
    • mean success 1.0
    • mean return 1.0
  • retargeted demo with checkpoint backbone retrieval and vision-only button localization (ep5 run):
    • reports/dual_push_nonzero_branch_20260330/retargeted_demo_backbone_vision_ep5/summary.json
    • mean success 1.0
    • mean return 1.0

Dual-push full-architecture hybrid branch

Sources:

  • VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md
  • reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json
  • reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json

Raw values:

  • elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization:
    • 1 episode
    • mean success 1.0
    • mean return 1.0
    • steps 94
    • retrieved episode index 11
    • retrieval similarity 0.9998629689216614
  • full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint:
    • 1 episode
    • mean success 1.0
    • mean return 1.0
    • steps 116
    • path recoveries 0
    • noop fallbacks 0
    • first selected mode residual::maintain_opening
    • last selected mode residual::base_action
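
The probe above reports a retrieved episode index and a retrieval similarity. A minimal sketch of nearest-neighbor retrieval over stored episode embeddings is below. It assumes cosine similarity, which the summaries do not confirm, and the names cosine_similarity and retrieve_episode are hypothetical rather than functions from this codebase.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_episode(query, bank):
    """Return (index, similarity) of the bank entry most similar to the query."""
    scores = [cosine_similarity(query, entry) for entry in bank]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Toy embeddings standing in for per-episode scene features.
    bank = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(retrieve_episode([0.9, 1.0], bank))
```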

Environment Recreation

Environment files are under environment/, including:

  • environment/setup_same_hardware.sh
  • environment/runtime_env_vars.sh
  • environment/reconstruct_anybimanual_overlap_replay.sh
  • environment/hardware_snapshot.txt
  • environment/env_list.txt
  • environment/base_python.txt
  • environment/base_pip_freeze.txt
  • environment/rlbench_python.txt
  • environment/rlbench_pip_freeze.txt

Notes On Result Presentation

This repo-level README and the new root docs intentionally keep result text raw:

  • file paths
  • exact commands
  • exact numeric outputs
  • exact partial status for in-flight runs

Interpretive material already present inside older staged artifacts remains preserved as part of the historical workspace contents.
