Initial release: ACT 10k step + horizon=32 1/1 success on LeIsaac SO-101 PickOrange


.gitattributes +1 -0
README.md +180 -0
act-pick-orange.png +3 -0
config.json +71 -0
model.safetensors +3 -0
policy_postprocessor.json +32 -0
policy_postprocessor_step_0_unnormalizer_processor.safetensors +3 -0
policy_preprocessor.json +64 -0
policy_preprocessor_step_3_normalizer_processor.safetensors +3 -0
train_config.json +206 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+act-pick-orange.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,180 @@

+---
+license: apache-2.0
+library_name: lerobot
+pipeline_tag: robotics
+tags:
+  - act
+  - lerobot
+  - so101
+  - leisaac
+  - pick-orange
+  - isaac-sim
+datasets:
+  - LightwheelAI/leisaac-pick-orange
+language:
+  - en
+base_model: lerobot/act
+---
+# ACT-PickOrange
+An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) policy trained from scratch (apart from its ImageNet-pretrained ResNet-18 backbone) on the [LeIsaac SO-101 PickOrange](https://github.com/LightwheelAI/leisaac) task.
+![ACT-PickOrange — SO-101 in Isaac Sim](act-pick-orange.png)
+## TL;DR
+- **Task**: `Pick up the orange and place it on the plate`. A single-arm SO-101 picks up 3 oranges one after another and places each on a plate.
+- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes.
+- **Architecture**: ACT with chunk_size=100, ~52M parameters (the `model.safetensors` file is about 207 MB, assuming float32 weights); pure vision + joint state → action-chunk regression (no LLM, no diffusion).
+- **Training**: batch=8 / lr=1e-5 / 10k steps / **image augmentation disabled**, ~5 h on an RTX 4090.
+- **Eval**: Isaac Sim 5.1 + LeIsaac, **1/1 success @ 120 s sim time** (all 3 oranges placed on the plate).
+- **⚠️ Critical inference setting**: `policy_action_horizon=32`.
+  With the default value of 16 the policy stalls on the second orange (gripper jitter); with 8 it stalls on the first. See [Inference caveat](#-推理关键配置--critical-inference-caveat) below.
+## Highlights
+- **Reproduces and validates the [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) recipe**, with a comparable or better success rate.
+- **Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`**: in our tests (horizon = 8, 16, 32), an ACT model with chunk_size=100 needed horizon=32 before the macro-motion segment of each chunk got executed. See the diagnosis in the critical inference caveat section of this README.
+- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
+## Training recipe
+| Item | Value |
+|---|---|
+| Dataset | `LightwheelAI/leisaac-pick-orange` (60 episodes, dual-cam 480×640 RGB + 6-DoF state, 30 Hz) |
+| Policy | `act` (LeRobot implementation) |
+| Backbone | ResNet18 vision encoder (ImageNet-pretrained) + Transformer encoder/decoder |
+| `chunk_size` | 100 |
+| `n_action_steps` | 100 |
+| Batch size | 8 |
+| Optimizer | AdamW (weight decay 1e-4, grad-clip norm 10) |
+| Learning rate | 1e-5 (constant) |
+| Steps | 10,000 |
+| Image augmentation | **disabled** |
+| Hardware | RTX 4090 (24 GB) |
+| Wall-clock | ~5 hours |
+| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |
+The training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac/blob/main/scripts/training/act/train.sh).
+## Eval results
+| Config | Orange 1 | Orange 2 | Orange 3 | Episode success |
+|---|---|---|---|---|
+| horizon=8  | 🔴 Stuck (holds the grasp, no further motion) | not reached | not reached | 0/1 |
+| horizon=16 | ✅ Success | 🟡 Gripper jitter, no progress | not reached | 0/1 |
+| **horizon=32** | ✅ Success | ✅ Success after a struggle | ✅ Success after a struggle | **1/1** ✅ |
+Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations.
+**Single-sample caveat**: each result above comes from a single episode, not from a statistically averaged set of runs. Even so, the monotonic progression across horizon=8/16/32 (stuck → jitter → success) with the same checkpoint points to a configuration issue rather than a model-capability issue.
+## ⚠️ 推理关键配置 / Critical inference caveat
+**ACT chunk_size=100 + the default horizon=16 never gets past the 2nd orange.** This is not an ACT weakness; it is a hidden trap in LeIsaac's default config.
+### Root cause
+Each ACT chunk is a 100-step planned trajectory: the first ~10 steps are "startup / acceleration", and the middle segment (steps 20-80) carries the actual **macro-motion** (approach → grasp → lift → transport → release). The LeRobot async client executes chunks in a receding-horizon fashion, re-querying the policy every `policy_action_horizon` steps.
+- horizon=8 → only the first 8 steps of each chunk run before the chunk is discarded and re-queried → the policy keeps replaying the startup segment and **never reaches the macro-motion** → stuck.
+- horizon=16 → enough for the simple "approach → grasp" of orange #1, but the more complex "place → retreat → approach orange #2" transition needs a longer execution window → the out-of-distribution (OOD) state and the short horizon compound → jitter.
+- horizon=32 → each chunk runs far enough into the macro-motion segment to make progress; 1/1 passes.
+### Recommended settings
+```bash
+--policy_type=lerobot-act
+--policy_action_horizon=32
+--policy_checkpoint_path=<path-to-this-model>
+--step_hz=30                  # matches the dataset's 30 Hz
+--episode_length_s=120
+```
+## Usage
+### 1. Start the LeRobot async policy_server
+```bash
+pip install lerobot
+python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
+```
+### 2. Launch the LeIsaac eval from the client
+Using our [vitorcen/LeIsaac](https://github.com/vitorcen/LeIsaac) fork:
+```bash
+cd LeIsaac
+bash scripts/evaluation/run_eval.sh -- \
+    --task=LeIsaac-SO101-PickOrange-v0 \
+    --eval_rounds=3 \
+    --episode_length_s=120 \
+    --step_hz=30 \
+    --policy_type=lerobot-act \
+    --policy_host=127.0.0.1 --policy_port=8080 \
+    --policy_checkpoint_path=wsagi/ACT-PickOrange \
+    --policy_action_horizon=32 \
+    --policy_language_instruction="Pick up the orange and place it on the plate" \
+    --device=cuda --enable_cameras
+```
+`run_eval.sh` automatically derives a wall-clock timeout from a user-patience cap, so runs with slow inference fail fast instead of waiting pointlessly.
+## Limitations
+- **Dataset OOD on the 2nd and 3rd oranges**: the dataset has 60 episodes, each with a single demonstration of placing the N-th orange. State coverage for oranges 2 and 3 is roughly an order of magnitude sparser than for orange 1, so the task gets monotonically harder there and the policy's motions become visibly more hesitant and jittery. Even though horizon=32 rescues the nominal success rate, **precision still degrades with each additional orange**. This is a data issue, not a model issue.
+- Four checkpoints across three architectures (our ACT, the public shadowHokage ACT, Diffusion Policy, and SmolVLA) trained on the same dataset **all go OOD on the 3rd orange**, so the problem is shared across the whole family.
+- No image augmentation and no domain randomization → real-world transfer is likely weak. This checkpoint has only been validated in Isaac Sim; real-robot deployment is not guaranteed.
+## Related
+- Same-task comparisons:
+  - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange): our own Diffusion Policy (267M, DDIM 32-step swap)
+  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): public checkpoint trained with the same recipe (our reproduction reference)
+  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 SOTA (finishes all 3 oranges in 30 s)
+- Full training + eval recipe: [vitorcen/LeIsaac](https://github.com/vitorcen/LeIsaac) fork
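+## Acknowledgments
+- The LeIsaac team and LightwheelAI, for the task environment and dataset
+- The LeRobot team, for the ACT implementation and the async inference framework
+- shadowHokage, for publishing the training recipe used as our reproduction baseline
+## Citation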
+```bibtex
+@inproceedings{zhao2023learning,
+  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
+  author={Zhao, Tony Z. and Kumar, Vikash and Levine, Sergey and Finn, Chelsea},
+  booktitle={Robotics: Science and Systems},
+  year={2023}
+}
+```
+## License
+Apache-2.0
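
The root-cause explanation in the README can be made concrete with a little arithmetic. The sketch below is a simplified model of receding-horizon chunk execution, not the LeRobot async client itself: it assumes the client executes exactly the first `horizon` actions of every 100-step chunk and then re-queries. The helper names `executed_chunk_indices` and `policy_queries` are hypothetical.

```python
import math

CHUNK_SIZE = 100        # chunk_size from config.json
STEP_HZ = 30            # step_hz used in the eval
EPISODE_LENGTH_S = 120  # episode_length_s used in the eval


def executed_chunk_indices(horizon: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk indices executed before the next re-query (simplified model)."""
    if not 0 < horizon <= chunk_size:
        raise ValueError("horizon must be in 1..chunk_size")
    return range(horizon)


def policy_queries(horizon: int, total_steps: int) -> int:
    """Number of policy queries needed to cover total_steps environment steps."""
    return math.ceil(total_steps / horizon)


total_steps = STEP_HZ * EPISODE_LENGTH_S  # 3600 environment steps per episode
for horizon in (8, 16, 32):
    used = executed_chunk_indices(horizon)
    print(
        f"horizon={horizon:>2}: executes chunk steps {used.start}-{used.stop - 1}, "
        f"discards {CHUNK_SIZE - horizon}, "
        f"{horizon / STEP_HZ:.2f} s of motion per query, "
        f"{policy_queries(horizon, total_steps)} queries per episode"
    )
```

Under this model, horizon=32 is the only one of the three tested settings whose executed window (steps 0-31) overlaps the step 20-80 range that the README identifies as macro-motion.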

act-pick-orange.png ADDED Viewed

Git LFS Details

SHA256: 6b421b9a6e461ef309cf545e956e553680e3b80cc6d39a9da63d0db7b6276183
Pointer size: 131 Bytes
Size of remote file: 769 kB

config.json ADDED Viewed

	@@ -0,0 +1,71 @@

+{
+    "type": "act",
+    "n_obs_steps": 1,
+    "input_features": {
+        "observation.state": {
+            "type": "STATE",
+            "shape": [
+                6
+            ]
+        },
+        "observation.images.front": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                480,
+                640
+            ]
+        },
+        "observation.images.wrist": {
+            "type": "VISUAL",
+            "shape": [
+                3,
+                480,
+                640
+            ]
+        }
+    },
+    "output_features": {
+        "action": {
+            "type": "ACTION",
+            "shape": [
+                6
+            ]
+        }
+    },
+    "device": "cuda",
+    "use_amp": false,
+    "use_peft": false,
+    "push_to_hub": false,
+    "repo_id": null,
+    "private": null,
+    "tags": null,
+    "license": null,
+    "pretrained_path": null,
+    "chunk_size": 100,
+    "n_action_steps": 100,
+    "normalization_mapping": {
+        "VISUAL": "MEAN_STD",
+        "STATE": "MEAN_STD",
+        "ACTION": "MEAN_STD"
+    },
+    "vision_backbone": "resnet18",
+    "pretrained_backbone_weights": "ResNet18_Weights.IMAGENET1K_V1",
+    "replace_final_stride_with_dilation": false,
+    "pre_norm": false,
+    "dim_model": 512,
+    "n_heads": 8,
+    "dim_feedforward": 3200,
+    "feedforward_activation": "relu",
+    "n_encoder_layers": 4,
+    "n_decoder_layers": 1,
+    "use_vae": true,
+    "latent_dim": 32,
+    "n_vae_encoder_layers": 4,
+    "temporal_ensemble_coeff": null,
+    "dropout": 0.1,
+    "kl_weight": 10.0,
+    "optimizer_lr": 1e-05,
+    "optimizer_weight_decay": 0.0001,
+    "optimizer_lr_backbone": 1e-05
+}
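
The `input_features` block above fixes the observation keys and shapes an inference client has to supply. As a minimal sketch (the helper `validate_observation` is hypothetical and not part of LeRobot or LeIsaac), a client could check its observation shapes against the config before sending them to the policy server:

```python
# Excerpt of "input_features" from config.json above.
FEATURES = {
    "observation.state": {"type": "STATE", "shape": [6]},
    "observation.images.front": {"type": "VISUAL", "shape": [3, 480, 640]},
    "observation.images.wrist": {"type": "VISUAL", "shape": [3, 480, 640]},
}


def validate_observation(shapes: dict[str, tuple[int, ...]], features: dict) -> list[str]:
    """Return human-readable mismatches between observation shapes and the config."""
    problems = []
    for key, spec in features.items():
        expected = tuple(spec["shape"])
        actual = shapes.get(key)
        if actual is None:
            problems.append(f"missing {key}")
        elif tuple(actual) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(actual)}")
    return problems


# Example: the wrist camera arrives as HWC instead of the CHW layout in the config.
observed = {
    "observation.state": (6,),
    "observation.images.front": (3, 480, 640),
    "observation.images.wrist": (480, 640, 3),
}
for problem in validate_observation(observed, FEATURES):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to see every mismatched camera or state key in a single run.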

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b6f216a6f117b2b4a09af7706c10f48c4a7320d6d80172ce72b6f13d149032b2
+size 206699736
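
As a rough cross-check of model size, the `size` field of this pointer can be turned into a parameter estimate. This is a sketch that assumes the weights are stored as float32 (4 bytes each) and that the safetensors header and any non-parameter buffers are negligible; the pointer file itself states neither.

```python
SAFETENSORS_BYTES = 206_699_736  # "size" from the Git LFS pointer above
BYTES_PER_PARAM = 4              # assumption: float32 weights

approx_params = SAFETENSORS_BYTES / BYTES_PER_PARAM
print(f"~{approx_params / 1e6:.1f}M parameters")  # ~51.7M under these assumptions
```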

policy_postprocessor.json ADDED Viewed

	@@ -0,0 +1,32 @@

+{
+  "name": "policy_postprocessor",
+  "steps": [
+    {
+      "registry_name": "unnormalizer_processor",
+      "config": {
+        "eps": 1e-08,
+        "features": {
+          "action": {
+            "type": "ACTION",
+            "shape": [
+              6
+            ]
+          }
+        },
+        "norm_map": {
+          "VISUAL": "MEAN_STD",
+          "STATE": "MEAN_STD",
+          "ACTION": "MEAN_STD"
+        }
+      },
+      "state_file": "policy_postprocessor_step_0_unnormalizer_processor.safetensors"
+    },
+    {
+      "registry_name": "device_processor",
+      "config": {
+        "device": "cpu",
+        "float_dtype": null
+      }
+    }
+  ]
+}

policy_postprocessor_step_0_unnormalizer_processor.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ac4af145fa293fb9282322bee7c87eb369ba8aca3e09dbf1db7600f46142fd5
+size 7552

policy_preprocessor.json ADDED Viewed

	@@ -0,0 +1,64 @@

+{
+  "name": "policy_preprocessor",
+  "steps": [
+    {
+      "registry_name": "rename_observations_processor",
+      "config": {
+        "rename_map": {}
+      }
+    },
+    {
+      "registry_name": "to_batch_processor",
+      "config": {}
+    },
+    {
+      "registry_name": "device_processor",
+      "config": {
+        "device": "cuda",
+        "float_dtype": null
+      }
+    },
+    {
+      "registry_name": "normalizer_processor",
+      "config": {
+        "eps": 1e-08,
+        "features": {
+          "observation.state": {
+            "type": "STATE",
+            "shape": [
+              6
+            ]
+          },
+          "observation.images.front": {
+            "type": "VISUAL",
+            "shape": [
+              3,
+              480,
+              640
+            ]
+          },
+          "observation.images.wrist": {
+            "type": "VISUAL",
+            "shape": [
+              3,
+              480,
+              640
+            ]
+          },
+          "action": {
+            "type": "ACTION",
+            "shape": [
+              6
+            ]
+          }
+        },
+        "norm_map": {
+          "VISUAL": "MEAN_STD",
+          "STATE": "MEAN_STD",
+          "ACTION": "MEAN_STD"
+        }
+      },
+      "state_file": "policy_preprocessor_step_3_normalizer_processor.safetensors"
+    }
+  ]
+}
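
Both processor configs use `MEAN_STD` normalization with `eps = 1e-08`: the preprocessor normalizes observations, and the postprocessor maps predicted actions back to joint units. The sketch below shows the round trip in plain Python. Where `eps` enters the formula is an assumption made here for illustration, and the statistics are made-up placeholders; the real values live in the `.safetensors` state files.

```python
EPS = 1e-8  # "eps" from the processor configs


def normalize(values, mean, std, eps=EPS):
    """MEAN_STD normalization: (x - mean) / (std + eps). The eps placement is an assumption."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def unnormalize(values, mean, std, eps=EPS):
    """Inverse of normalize() above, mapping model outputs back to raw units."""
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]


# Illustrative 6-DoF statistics (placeholders, not the dataset's real stats).
mean = [0.0, -10.0, 25.0, 40.0, 5.0, 12.0]
std = [15.0, 20.0, 18.0, 10.0, 30.0, 8.0]
raw_action = [3.0, -35.0, 61.0, 45.0, -25.0, 20.0]

normalized = normalize(raw_action, mean, std)
restored = unnormalize(normalized, mean, std)
print([round(v, 3) for v in normalized])
print([round(v, 3) for v in restored])
```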

policy_preprocessor_step_3_normalizer_processor.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5ac4af145fa293fb9282322bee7c87eb369ba8aca3e09dbf1db7600f46142fd5
+size 7552

train_config.json ADDED Viewed

	@@ -0,0 +1,206 @@

+{
+    "dataset": {
+        "repo_id": "LightwheelAI/leisaac-pick-orange",
+        "root": "/home/david/work/LeIsaac/datasets/raw/leisaac-pick-orange",
+        "episodes": null,
+        "image_transforms": {
+            "enable": false,
+            "max_num_transforms": 3,
+            "random_order": false,
+            "tfs": {
+                "brightness": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "brightness": [
+                            0.8,
+                            1.2
+                        ]
+                    }
+                },
+                "contrast": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "contrast": [
+                            0.8,
+                            1.2
+                        ]
+                    }
+                },
+                "saturation": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "saturation": [
+                            0.5,
+                            1.5
+                        ]
+                    }
+                },
+                "hue": {
+                    "weight": 1.0,
+                    "type": "ColorJitter",
+                    "kwargs": {
+                        "hue": [
+                            -0.05,
+                            0.05
+                        ]
+                    }
+                },
+                "sharpness": {
+                    "weight": 1.0,
+                    "type": "SharpnessJitter",
+                    "kwargs": {
+                        "sharpness": [
+                            0.5,
+                            1.5
+                        ]
+                    }
+                },
+                "affine": {
+                    "weight": 1.0,
+                    "type": "RandomAffine",
+                    "kwargs": {
+                        "degrees": [
+                            -5.0,
+                            5.0
+                        ],
+                        "translate": [
+                            0.05,
+                            0.05
+                        ]
+                    }
+                }
+            }
+        },
+        "revision": null,
+        "use_imagenet_stats": true,
+        "video_backend": "pyav",
+        "return_uint8": false,
+        "streaming": false
+    },
+    "env": null,
+    "policy": {
+        "type": "act",
+        "n_obs_steps": 1,
+        "input_features": {
+            "observation.state": {
+                "type": "STATE",
+                "shape": [
+                    6
+                ]
+            },
+            "observation.images.front": {
+                "type": "VISUAL",
+                "shape": [
+                    3,
+                    480,
+                    640
+                ]
+            },
+            "observation.images.wrist": {
+                "type": "VISUAL",
+                "shape": [
+                    3,
+                    480,
+                    640
+                ]
+            }
+        },
+        "output_features": {
+            "action": {
+                "type": "ACTION",
+                "shape": [
+                    6
+                ]
+            }
+        },
+        "device": "cuda",
+        "use_amp": false,
+        "use_peft": false,
+        "push_to_hub": false,
+        "repo_id": null,
+        "private": null,
+        "tags": null,
+        "license": null,
+        "pretrained_path": null,
+        "chunk_size": 100,
+        "n_action_steps": 100,
+        "normalization_mapping": {
+            "VISUAL": "MEAN_STD",
+            "STATE": "MEAN_STD",
+            "ACTION": "MEAN_STD"
+        },
+        "vision_backbone": "resnet18",
+        "pretrained_backbone_weights": "ResNet18_Weights.IMAGENET1K_V1",
+        "replace_final_stride_with_dilation": false,
+        "pre_norm": false,
+        "dim_model": 512,
+        "n_heads": 8,
+        "dim_feedforward": 3200,
+        "feedforward_activation": "relu",
+        "n_encoder_layers": 4,
+        "n_decoder_layers": 1,
+        "use_vae": true,
+        "latent_dim": 32,
+        "n_vae_encoder_layers": 4,
+        "temporal_ensemble_coeff": null,
+        "dropout": 0.1,
+        "kl_weight": 10.0,
+        "optimizer_lr": 1e-05,
+        "optimizer_weight_decay": 0.0001,
+        "optimizer_lr_backbone": 1e-05
+    },
+    "output_dir": "/home/david/work/LeIsaac/outputs/act-leisaac-pick-orange",
+    "job_name": "act-leisaac-pick-orange",
+    "resume": false,
+    "seed": 1000,
+    "cudnn_deterministic": false,
+    "num_workers": 4,
+    "batch_size": 8,
+    "prefetch_factor": 4,
+    "persistent_workers": true,
+    "steps": 10000,
+    "eval_freq": 20000,
+    "log_freq": 200,
+    "tolerance_s": 0.0001,
+    "save_checkpoint": true,
+    "save_freq": 2000,
+    "use_policy_training_preset": true,
+    "optimizer": {
+        "type": "adamw",
+        "lr": 1e-05,
+        "weight_decay": 0.0001,
+        "grad_clip_norm": 10.0,
+        "betas": [
+            0.9,
+            0.999
+        ],
+        "eps": 1e-08
+    },
+    "scheduler": null,
+    "eval": {
+        "n_episodes": 50,
+        "batch_size": 22,
+        "use_async_envs": true
+    },
+    "wandb": {
+        "enable": false,
+        "disable_artifact": false,
+        "project": "lerobot",
+        "entity": null,
+        "notes": null,
+        "run_id": null,
+        "mode": null,
+        "add_tags": true
+    },
+    "peft": null,
+    "use_rabc": false,
+    "rabc_progress_path": null,
+    "rabc_kappa": 0.01,
+    "rabc_epsilon": 1e-06,
+    "rabc_head_mode": "sparse",
+    "rename_map": {},
+    "checkpoint_path": null
+}
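
A few schedule facts follow directly from the values in `train_config.json`. The sketch below assumes the trainer saves a checkpoint whenever the step count is a multiple of `save_freq` or equals the final step, and evaluates on multiples of `eval_freq`; that rule is an assumption here, not taken from the config.

```python
STEPS = 10_000      # "steps"
BATCH_SIZE = 8      # "batch_size"
SAVE_FREQ = 2_000   # "save_freq"
EVAL_FREQ = 20_000  # "eval_freq"

# Total training samples drawn over the run.
samples_seen = STEPS * BATCH_SIZE

# Steps at which a checkpoint would be written under the assumed rule.
checkpoint_steps = [s for s in range(1, STEPS + 1) if s % SAVE_FREQ == 0 or s == STEPS]

# Steps at which an in-training evaluation would run under the assumed rule.
eval_steps = [s for s in range(1, STEPS + 1) if s % EVAL_FREQ == 0]

print("samples seen:", samples_seen)
print("checkpoints at:", checkpoint_steps)
print("in-training evals at:", eval_steps)
```

Because `eval_freq` exceeds `steps` and `env` is `null`, no in-training evaluation takes place under this reading.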