wsagi/X-VLA-PickOrange
Tags: Robotics, LeRobot, Safetensors, English, xvla, x-vla, so101, leisaac, pick-orange, isaac-sim, rectified-flow, florence2
Add files using upload-large-folder tool
- README.md +57 -44
- config.json +2 -2
- model.safetensors +1 -1
- train_config.json +5 -5
README.md
CHANGED
|
@@ -30,36 +30,42 @@ _An [X-VLA](https://arxiv.org/abs/2510.10274) (Florence2 + Soft-Prompted Transfo
|
|
| 30 |
- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
|
| 31 |
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)
|
| 32 |
|
|
|
|
|
|
|
| 33 |
## TL;DR
|
| 34 |
|
| 35 |
- **Task**: `Pick up the orange and put it in the plate`
|
| 36 |
_Single-arm SO-101 picks 3 oranges sequentially and places each in a plate._
|
| 37 |
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes (50 train / 10 val split).
|
| 38 |
- **Architecture**: X-VLA, i.e. Florence2 vision-language encoder + Soft-Prompted Transformer + Rectified-Flow action head (10 denoising steps); `chunk_size=32`, `n_obs_steps=2`.
|
| 39 |
-
- **Training**: batch=8 / lr=1e-4 / **
|
| 40 |
-
- **Eval**
|
|
|
|
|
|
|
| 41 |
- **⚠️ Critical inference setting**: `n_action_steps=32` (reuse the full `chunk_size` chunk).
|
| 42 |
With the default `n_action_steps=8`, this ckpt fails catastrophically in the 6-round eval (**0/18**), because the per-step replans conflict with each other. See [Inference caveat](#-推理关键配置--critical-inference-caveat) below.
|
| 43 |
|
| 44 |
## Model highlights
|
| 45 |
_Highlights_
|
| 46 |
|
| 47 |
-
- **
|
| 48 |
-
|
| 49 |
- **Key role of `n_action_steps`**: changing the default 8 to 32 was the only reliable gain in the session (about 3.5× over baseline).
|
| 50 |
_Exposes `n_action_steps` as the single most reliable improvement — switching from default 8 to chunk_size=32 (full chunk reuse) gave ~3.5× over baseline._
|
| 51 |
-
- **Weak image-aug
|
| 52 |
-
|
| 53 |
|
| 54 |
## Training recipe
|
| 55 |
_Training recipe_
|
| 56 |
|
| 57 |
```bash
|
| 58 |
-
# One stage
|
| 59 |
-
WEAK_IMAGE_AUG=1 \
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
|
|
|
|
|
|
| 63 |
OUTPUT_DIR=$LEISAAC/outputs/xvla-leisaac-pick-orange.weakaug \
|
| 64 |
bash LeIsaac/scripts/finetune/xvla/train.sh
|
| 65 |
```
|
|
@@ -72,10 +78,6 @@ bash LeIsaac/scripts/finetune/xvla/train.sh
|
|
| 72 |
--dataset.image_transforms.tfs={"brightness":{"weight":1.0,"type":"ColorJitter","kwargs":{"brightness":[0.95,1.05]}}}
|
| 73 |
```
|
| 74 |
|
| 75 |
-
That is, at most one transform is sampled per batch, and only brightness ±5% is allowed (contrast / saturation / hue / SharpnessJitter / RandomAffine are disabled).
|
| 76 |
-
|
| 77 |
-
For a detailed comparison, see the [full retrain aggregate table](#完整-retrain-实验聚合表).
|
| 78 |
-
|
| 79 |
## Inference
|
| 80 |
|
| 81 |
### End-to-end server (compatible with the Isaac Sim ZMQ client)
|
|
@@ -92,10 +94,10 @@ bash server/serve_xvla.sh --detach
|
|
| 92 |
POLICY_PORT=5558 \
|
| 93 |
POLICY_TIMEOUT_MS=3000 \
|
| 94 |
ACTION_HORIZON=1 \
|
| 95 |
-
EVAL_ROUNDS=
|
| 96 |
-
EPISODE_LENGTH=
|
| 97 |
PROMPT="Pick up the orange and put it in the plate" \
|
| 98 |
-
MAX_ROUND_WALL_S=
|
| 99 |
bash server/eval_pi05.sh
|
| 100 |
```
|
| 101 |
|
|
@@ -107,37 +109,48 @@ bash server/eval_pi05.sh
|
|
| 107 |
|---|---|---|---|
|
| 108 |
| 8 (lerobot default) | **0/18** ❌ | 0% | Replans every step; chunk[0]→chunk[0]→... conflict with each other |
|
| 109 |
| 16 | 4/18 | 22% | Partial chunk reuse |
|
| 110 |
-
| **32 (= chunk_size)** | **
|
| 111 |
|
| 112 |
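**X-VLA's RF action head generates a 32-step chunk in one shot, and the chunk must be fully rolled out in the env** to realize its planning value. Replanning at every step instead misaligns the chunk sequence.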
|
| 113 |
|
| 114 |
## Evaluation results
|
| 115 |
_Evaluation_
|
| 116 |
|
| 117 |
-
###
|
| 118 |
-
|
| 119 |
-
| Episode | oranges placed | wall time | notes |
|
| 120 |
-
|---|---|---|---|
|
| 121 |
-
| 1 | 1/3 | 180.1s | wall_cap |
|
| 122 |
-
| 2 | **3/3** ✅ | **180.0s** | **3/3 perfect** ⭐ |
|
| 123 |
-
| 3 | 0/3 | 180.1s | wall_cap |
|
| 124 |
-
| **Total** | **4/9 (44%)** | - | 0/3 strict (env did not report done, although 3 oranges were placed correctly) |
|
| 125 |
-
|
| 126 |
-
### 6-round extended eval (60s sim × 90s wall_cap)
|
| 127 |
|
| 128 |
| Episode | oranges placed | wall time |
|
| 129 |
|---|---|---|
|
| 130 |
-
| 1 |
|
| 131 |
-
| 2 |
|
| 132 |
-
| 3 |
|
| 133 |
-
| 4 | 1/3 | 90.
|
| 134 |
-
| 5 |
|
| 135 |
-
| 6 | 1/3 | 90.
|
| 136 |
-
| **Total** | **
|
| 137 |
|
| 138 |
### Full retrain experiment aggregate table
|
| 139 |
|
| 140 |
-
| Retrain config (5 ckpts × 6-round = 90 ep) | per-ep aggregate | vs baseline |
|
| 141 |
|---|---|---|
|
| 142 |
| 🥇 **Weak image-aug (brightness ±5%)** | **30.0%** | **+5.6** ⭐ |
|
| 143 |
| L1 loss (OFT-lite, [Fine-Tuning VLA 2502.19645](https://arxiv.org/abs/2502.19645)) | 27.8% | +3.4 |
|
|
@@ -146,14 +159,14 @@ _Evaluation_
|
|
| 146 |
| Default image-aug (lerobot default strength) | 13.3% | -11.1 |
|
| 147 |
| Velocity-reweight β=2.0 ([AttenA+ 2605.13548](https://arxiv.org/abs/2605.13548)) | ~11% | -13 |
|
| 148 |
|
| 149 |
-
For details, see the parent project's HTML design doc [`vla_improvement_methods_checklist.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/vla_improvement_methods_checklist.html)
|
| 150 |
|
| 151 |
## Falsified methods / do not retry
|
| 152 |
_Negative findings — DO NOT repeat_
|
| 153 |
|
| 154 |
-
Rigorously falsified across 90+ experiments
|
| 155 |
|
| 156 |
-
- ❌ **TAE (Temporal Action Ensembling, [ALOHA 2304.13705](https://arxiv.org/abs/2304.13705))**: K∈{2,4,8} × m∈{0.1,0.3} all scored ≤1/9.
|
| 157 |
- ❌ **EMA action smoothing α∈[0.2, 0.7]**: in the 3-round eval, the 5/9 score at α=0.3 was a single-episode outlier; the 12-round retest gave 2/18, so it is actually harmful.
|
| 158 |
- ❌ **"Grasp" verb in prompt**: 0/18, a complete failure. Possibly "grasp" in the OXE dataset is associated with hand pose rather than the robot reach trajectory.
|
| 159 |
- ❌ **"all <plural>" prompts**: 3/18; triggers multi-target ambiguity.
|
|
@@ -165,10 +178,10 @@ _Negative findings — DO NOT repeat_
|
|
| 165 |
|
| 166 |
## Limitations
|
| 167 |
|
| 168 |
-
- **Sample size
|
| 169 |
-
- **Dataset has only 50 demos**: retrains that change the loss / aug are generally too aggressive; expanding to 80-100 demos should break through the current ~
|
| 170 |
-
- **Place subtask is multimodal**: after grasping, the model occasionally
|
| 171 |
-
- **
|
| 172 |
|
| 173 |
## Citations
|
| 174 |
|
|
|
|
| 30 |
- [vitorcen/isaaclab-experience](https://github.com/vitorcen/isaaclab-experience): Isaac Lab + LeIsaac multi-policy comparison (parent project)
|
| 31 |
- [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training): LeIsaac fork (training scripts + design docs)
|
| 32 |
|
| 33 |
+
**📌 Branches**: `main` = 17k (current best, 50% 6-round per-ep) · `ckpt-10k` (4/9 bench, 33% 6-round) · `ckpt-15k` (historical, 22% bench)
|
| 34 |
+
|
| 35 |
## TL;DR
|
| 36 |
|
| 37 |
- **Task**: `Pick up the orange and put it in the plate`
|
| 38 |
_Single-arm SO-101 picks 3 oranges sequentially and places each in a plate._
|
| 39 |
- **Dataset**: [`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange), 60 teleoperated demonstration episodes (50 train / 10 val split).
|
| 40 |
- **Architecture**: X-VLA, i.e. Florence2 vision-language encoder + Soft-Prompted Transformer + Rectified-Flow action head (10 denoising steps); `chunk_size=32`, `n_obs_steps=2`.
|
| 41 |
+
- **Training**: batch=8 / lr=1e-4 / **17k steps** (10k from base + 5k resume + 2k resume) / **weak image-aug (brightness ±5% only)** / GRIPPER_SCALE=5 / ~30 min on RTX 4090.
|
| 42 |
+
- **Eval**:
|
| 43 |
+
- **6-round (18 ep × 60s)**: **9/18 (50%)**, **all 6/6 ep placed (1,2,2,2,1,2)**; this is the most consistent ckpt of the session.
|
| 44 |
+
- **Benchmark-aligned 3-round (× 120s × 180s wall)**: 4/9 (44%), on par with 10k/15k (3-round variance is too high to tell them apart).
|
| 45 |
- **⚠️ Critical inference setting**: `n_action_steps=32` (reuse the full `chunk_size` chunk).
|
| 46 |
With the default `n_action_steps=8`, this ckpt fails catastrophically in the 6-round eval (**0/18**), because the per-step replans conflict with each other. See [Inference caveat](#-推理关键配置--critical-inference-caveat) below.
|
| 47 |
|
| 48 |
## Model highlights
|
| 49 |
_Highlights_
|
| 50 |
|
| 51 |
+
- **Perfect 6-round consistency**: across 18 ep, **18/18 episodes picked up at least one orange** (1,2,2,2,1,2 per ep). Other baselines / earlier ckpts typically have 2-3 episodes at 0/3.
|
| 52 |
+
_Perfect 6-round consistency: every single one of 18 episodes placed at least 1 orange. Other baselines (10k, ACT, DP, X-VLA-15k) had 2-3 zero-orange episodes._
|
| 53 |
- **Key role of `n_action_steps`**: changing the default 8 to 32 was the only reliable gain in the session (about 3.5× over baseline).
|
| 54 |
_Exposes `n_action_steps` as the single most reliable improvement — switching from default 8 to chunk_size=32 (full chunk reuse) gave ~3.5× over baseline._
|
| 55 |
+
- **Weak image-aug + extended training**: of the 90+ experiments, only the weak image-aug (brightness ±5%) retrain was positive in aggregate (+5.6% vs baseline); extending training from 7k to 17k steps kept raising the peak.
|
| 56 |
+
_Weak image-aug (brightness ±5% only, max_num_transforms=1) was the only aggregate-positive retrain in 90+ experiments. Extending training from 7k to 17k progressively raised the 6-round peak (33% → 50%)._
|
| 57 |
|
| 58 |
## Training recipe
|
| 59 |
_Training recipe_
|
| 60 |
|
| 61 |
```bash
|
| 62 |
+
# Stage 1: 10k steps from lerobot/xvla-base
|
| 63 |
+
WEAK_IMAGE_AUG=1 BATCH_SIZE=8 MAX_STEPS=10000 SAVE_FREQ=500 \
|
| 64 |
+
OUTPUT_DIR=$LEISAAC/outputs/xvla-leisaac-pick-orange.weakaug \
|
| 65 |
+
bash LeIsaac/scripts/finetune/xvla/train.sh
|
| 66 |
+
|
| 67 |
+
# Resume training to 17k (a checkpoint was also saved at 15k, but 17k is the best peak)
|
| 68 |
+
WEAK_IMAGE_AUG=1 BATCH_SIZE=8 MAX_STEPS=17000 SAVE_FREQ=500 RESUME=true \
|
| 69 |
OUTPUT_DIR=$LEISAAC/outputs/xvla-leisaac-pick-orange.weakaug \
|
| 70 |
bash LeIsaac/scripts/finetune/xvla/train.sh
|
| 71 |
```
|
|
|
|
| 78 |
--dataset.image_transforms.tfs={"brightness":{"weight":1.0,"type":"ColorJitter","kwargs":{"brightness":[0.95,1.05]}}}
|
| 79 |
```
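The weak-augmentation flag above is a JSON object passed on the command line. The following is a minimal sketch, assuming only what the flag itself shows: it rebuilds the flag string from a Python dict and illustrates what a ±5% brightness jitter does to pixel values. `jitter_brightness` is a hypothetical pure-Python stand-in for a ColorJitter-style transform (multiply by a random factor, then clamp), not the LeRobot implementation.

```python
import json
import random
from typing import List

# Same structure as the --dataset.image_transforms.tfs value shown above.
WEAK_AUG_TFS = {
    "brightness": {
        "weight": 1.0,
        "type": "ColorJitter",
        "kwargs": {"brightness": [0.95, 1.05]},
    }
}


def build_tfs_flag(tfs: dict) -> str:
    """Serialize the transform config into a compact CLI flag."""
    return "--dataset.image_transforms.tfs=" + json.dumps(tfs, separators=(",", ":"))


def jitter_brightness(pixels: List[float], low: float, high: float,
                      rng: random.Random) -> List[float]:
    """Hypothetical stand-in: scale pixels in [0, 1] by one random factor and clamp."""
    factor = rng.uniform(low, high)
    return [min(1.0, max(0.0, p * factor)) for p in pixels]


if __name__ == "__main__":
    print(build_tfs_flag(WEAK_AUG_TFS))
    low, high = WEAK_AUG_TFS["brightness"]["kwargs"]["brightness"]
    print(jitter_brightness([0.0, 0.5, 1.0], low, high, random.Random(0)))
```

With a ±5% range, a mid-gray pixel of 0.5 can only move within [0.475, 0.525], which is why this setting counts as weak augmentation.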
|
| 80 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 81 |
## Inference
|
| 82 |
|
| 83 |
### End-to-end server (compatible with the Isaac Sim ZMQ client)
|
|
|
|
| 94 |
POLICY_PORT=5558 \
|
| 95 |
POLICY_TIMEOUT_MS=3000 \
|
| 96 |
ACTION_HORIZON=1 \
|
| 97 |
+
EVAL_ROUNDS=6 \
|
| 98 |
+
EPISODE_LENGTH=60 \
|
| 99 |
PROMPT="Pick up the orange and put it in the plate" \
|
| 100 |
+
MAX_ROUND_WALL_S=90 \
|
| 101 |
bash server/eval_pi05.sh
|
| 102 |
```
|
| 103 |
|
|
|
|
| 109 |
|---|---|---|---|
|
| 110 |
| 8 (lerobot default) | **0/18** ❌ | 0% | Replans every step; chunk[0]→chunk[0]→... conflict with each other |
|
| 111 |
| 16 | 4/18 | 22% | Partial chunk reuse |
|
| 112 |
+
| **32 (= chunk_size)** | **9/18 + 6/6 consistency** ⭐ | **50%** | Full chunk reuse; each chunk is self-consistent |
|
| 113 |
|
| 114 |
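**X-VLA's RF action head generates a 32-step chunk in one shot, and the chunk must be fully rolled out in the env** to realize its planning value. Replanning at every step instead misaligns the chunk sequence.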
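To make the role of `n_action_steps` concrete, here is a minimal sketch of the usual action-queue pattern for chunked policies: the model predicts `chunk_size` actions, only the first `n_action_steps` are queued, and a new chunk is predicted once the queue is empty. `ChunkedPolicy`, `predict_chunk`, and `count_replans` are hypothetical names for illustration; this is an assumption about the general mechanism, not the X-VLA or LeRobot code.

```python
from collections import deque
from typing import Callable, Deque, List


class ChunkedPolicy:
    """Toy action-chunking wrapper (hypothetical, not a LeRobot class)."""

    def __init__(self, predict_chunk: Callable[[int], List[int]],
                 chunk_size: int = 32, n_action_steps: int = 32) -> None:
        if not 1 <= n_action_steps <= chunk_size:
            raise ValueError("n_action_steps must be in [1, chunk_size]")
        self.predict_chunk = predict_chunk
        self.chunk_size = chunk_size
        self.n_action_steps = n_action_steps
        self.queue: Deque[int] = deque()
        self.num_replans = 0

    def select_action(self, t: int) -> int:
        # Replan only when all previously queued actions have been executed.
        if not self.queue:
            chunk = self.predict_chunk(t)
            if len(chunk) != self.chunk_size:
                raise ValueError("model must return chunk_size actions")
            # Keep only the first n_action_steps actions of the new chunk.
            self.queue.extend(chunk[: self.n_action_steps])
            self.num_replans += 1
        return self.queue.popleft()


def count_replans(n_action_steps: int, episode_steps: int = 64) -> int:
    """Count model calls over an episode for a given n_action_steps."""
    # Stand-in model: each "action" is simply the timestep it was planned for.
    policy = ChunkedPolicy(lambda t: list(range(t, t + 32)),
                           chunk_size=32, n_action_steps=n_action_steps)
    for t in range(episode_steps):
        policy.select_action(t)
    return policy.num_replans


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"n_action_steps={n}: {count_replans(n)} replans in 64 steps")
```

In this toy setting, a smaller `n_action_steps` discards most of every predicted chunk and multiplies the number of plan switches; the table above suggests that, for this checkpoint, those switches hurt rather than help.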
|
| 115 |
|
| 116 |
## Evaluation results
|
| 117 |
_Evaluation_
|
| 118 |
|
| 119 |
+
### 6-round eval (18 ep × 60s × 90s wall_cap)
|
| 120 |
|
| 121 |
| Episode | oranges placed | wall time |
|
| 122 |
|---|---|---|
|
| 123 |
+
| 1 | 2/3 | 90.0s |
|
| 124 |
+
| 2 | 1/3 | 90.0s |
|
| 125 |
+
| 3 | 2/3 | 90.0s |
|
| 126 |
+
| 4 | 1/3 | 90.0s |
|
| 127 |
+
| 5 | 2/3 | 90.1s |
|
| 128 |
+
| 6 | 1/3 | 90.0s |
|
| 129 |
+
| **Total** | **9/18 (50%)**, **6/6 ep ≥1 orange ⭐** | - |
|
| 130 |
+
|
| 131 |
+
### Benchmark-aligned 3-round (120s × 180s wall, same conditions as the leaderboard)
|
| 132 |
+
|
| 133 |
+
| Episode | oranges placed |
|
| 134 |
+
|---|---|
|
| 135 |
+
| 1 | 2/3 |
|
| 136 |
+
| 2 | 1/3 |
|
| 137 |
+
| 3 | 1/3 |
|
| 138 |
+
| **Total** | **4/9 (44%)** |
|
| 139 |
+
|
| 140 |
+
Note: 3-round variance is high; 10k/15k/17k all score ≈ 4/9 on the benchmark, but the 6-round (18 ep) view separates them clearly (10k 33%, 15k 22%, 17k 50%).
|
| 141 |
+
|
| 142 |
+
### Weak-aug full ckpt curve (6-round @ h=32)
|
| 143 |
+
|
| 144 |
+
| step | 6k | 7k | 8k | 9k | 10k | 11k | 12k | 13k | 14k | 15k | 16k | **17k** | 18k | 19k | 20k |
|
| 145 |
+
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| 146 |
+
| oranges | 6 | 5 | 6 | 4 | 6 | 4 | 5 | 4 | 4 | 7 | 5 | **9** | 5 | 7 | 5 |
|
| 147 |
+
| per-ep% | 33 | 28 | 33 | 22 | 33 | 22 | 28 | 22 | 22 | 39 | 28 | **50** | 28 | 39 | 28 |
|
| 148 |
+
|
| 149 |
+
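**Pattern**: peaks appear every ~5-7k steps (10k, 15k, 17k), and 17k is the current best. 19k = 39% is also high but below 17k. The trend beyond 20k is unclear; more training is needed to locate the overfit boundary.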
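The per-ep% row of the checkpoint curve follows directly from the orange counts. A short sketch, assuming each 6-round eval has 18 orange slots (6 rounds × 3 oranges), recomputes the percentages and locates the best checkpoint; the variable and function names are illustrative only.

```python
from typing import List

# Checkpoint steps (in thousands) and oranges placed, copied from the table above.
STEPS_K: List[int] = list(range(6, 21))
ORANGES: List[int] = [6, 5, 6, 4, 6, 4, 5, 4, 4, 7, 5, 9, 5, 7, 5]
SLOTS_PER_EVAL = 18  # assumption: 6 rounds x 3 oranges per round


def per_ep_percent(placed: int, slots: int = SLOTS_PER_EVAL) -> int:
    """Success rate as a whole-number percentage."""
    return round(100 * placed / slots)


def best_checkpoint(steps_k: List[int], oranges: List[int]) -> int:
    """Return the step (in thousands) with the most oranges placed."""
    return steps_k[oranges.index(max(oranges))]


if __name__ == "__main__":
    for step, placed in zip(STEPS_K, ORANGES):
        print(f"{step}k: {placed}/{SLOTS_PER_EVAL} = {per_ep_percent(placed)}%")
    print("best:", f"{best_checkpoint(STEPS_K, ORANGES)}k")
```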
|
| 150 |
|
| 151 |
### Full retrain experiment aggregate table
|
| 152 |
|
| 153 |
+
| Retrain config (5 ckpts × 6-round = 90 ep, early 6-10k range) | per-ep aggregate | vs baseline |
|
| 154 |
|---|---|---|
|
| 155 |
| 🥇 **Weak image-aug (brightness ±5%)** | **30.0%** | **+5.6** ⭐ |
|
| 156 |
| L1 loss (OFT-lite, [Fine-Tuning VLA 2502.19645](https://arxiv.org/abs/2502.19645)) | 27.8% | +3.4 |
|
|
|
|
| 159 |
| Default image-aug (lerobot default strength) | 13.3% | -11.1 |
|
| 160 |
| Velocity-reweight β=2.0 ([AttenA+ 2605.13548](https://arxiv.org/abs/2605.13548)) | ~11% | -13 |
|
| 161 |
|
| 162 |
+
For details, see the parent project's HTML design doc [`vla_improvement_methods_checklist.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/vla_improvement_methods_checklist.html).
|
| 163 |
|
| 164 |
## Falsified methods / do not retry
|
| 165 |
_Negative findings — DO NOT repeat_
|
| 166 |
|
| 167 |
+
Rigorously falsified across 90+ experiments:
|
| 168 |
|
| 169 |
+
- ❌ **TAE (Temporal Action Ensembling, [ALOHA 2304.13705](https://arxiv.org/abs/2304.13705))**: K∈{2,4,8} × m∈{0.1,0.3} all scored ≤1/9. RF with 10-step denoising is already smooth on its own.
|
| 170 |
- ❌ **EMA action smoothing α∈[0.2, 0.7]**: in the 3-round eval, the 5/9 score at α=0.3 was a single-episode outlier; the 12-round retest gave 2/18, so it is actually harmful.
|
| 171 |
- ❌ **"Grasp" verb in prompt**: 0/18, a complete failure. Possibly "grasp" in the OXE dataset is associated with hand pose rather than the robot reach trajectory.
|
| 172 |
- ❌ **"all <plural>" prompts**: 3/18; triggers multi-target ambiguity.
|
|
|
|
| 178 |
|
| 179 |
## Limitations
|
| 180 |
|
| 181 |
+
- **Sample size**: the 50% per-ep figure is a 6-round (18 ep) estimate with CI ±15%. The benchmark 3-round (9 ep) CI is wider, ±20%.
|
| 182 |
+
- **Dataset has only 50 demos**: retrains that change the loss / aug are generally too aggressive; expanding to 80-100 demos should break through the current ~50% per-ep ceiling.
|
| 183 |
+
- **Place subtask is multimodal**: after grasping, the model occasionally places off-target (still 1-2/3 placed, but never a perfect 3/3). DAgger or synthetic relabeling may be needed.
|
| 184 |
+
- **Overfit boundary**: this ckpt is at 17k steps (training will continue to 25k to verify). Historically, 40k was deeply overfit ("could not even make contact accurately").
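The sample-size caveat above can be checked with a standard binomial interval. The sketch below uses the Wilson score interval at a 95% level, which is one common choice; the README does not state which method or confidence level its ± figures use, so the numbers need not match. It also assumes every orange slot is an independent trial, which is only an approximation because oranges within one episode are correlated.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # 6-round eval: 9 of 18 slots; benchmark 3-round eval: 4 of 9 slots.
    for name, k, n in (("6-round", 9, 18), ("3-round", 4, 9)):
        low, high = wilson_interval(k, n)
        print(f"{name}: {k}/{n} -> [{low:.2f}, {high:.2f}]")
```

Whatever interval is used, the 9-slot benchmark is necessarily less precise than the 18-slot eval, which is consistent with the observation that the 3-round results cannot separate the checkpoints.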
|
| 185 |
|
| 186 |
## Citations
|
| 187 |
|
config.json
CHANGED
|
@@ -57,7 +57,7 @@
|
|
| 57 |
"private": null,
|
| 58 |
"tags": null,
|
| 59 |
"license": null,
|
| 60 |
-
"pretrained_path": "
|
| 61 |
"chunk_size": 32,
|
| 62 |
"n_action_steps": 8,
|
| 63 |
"dtype": "bfloat16",
|
|
@@ -208,7 +208,7 @@
|
|
| 208 |
224,
|
| 209 |
224
|
| 210 |
],
|
| 211 |
-
"num_image_views":
|
| 212 |
"empty_cameras": 1,
|
| 213 |
"freeze_vision_encoder": true,
|
| 214 |
"freeze_language_encoder": true,
|
|
|
|
| 57 |
"private": null,
|
| 58 |
"tags": null,
|
| 59 |
"license": null,
|
| 60 |
+
"pretrained_path": "/home/david/work/isaaclab-experience/LeIsaac/outputs/xvla-leisaac-pick-orange.weakaug/checkpoints/last/pretrained_model",
|
| 61 |
"chunk_size": 32,
|
| 62 |
"n_action_steps": 8,
|
| 63 |
"dtype": "bfloat16",
|
|
|
|
| 208 |
224,
|
| 209 |
224
|
| 210 |
],
|
| 211 |
+
"num_image_views": 5,
|
| 212 |
"empty_cameras": 1,
|
| 213 |
"freeze_vision_encoder": true,
|
| 214 |
"freeze_language_encoder": true,
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 1759596986
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:96f9993b6603cda5c13b4f58727e6ee59cb4b1bd33c49b257b7e95930bbd814a
|
| 3 |
size 1759596986
|
train_config.json
CHANGED
|
@@ -137,7 +137,7 @@
|
|
| 137 |
"private": null,
|
| 138 |
"tags": null,
|
| 139 |
"license": null,
|
| 140 |
-
"pretrained_path": "
|
| 141 |
"chunk_size": 32,
|
| 142 |
"n_action_steps": 8,
|
| 143 |
"dtype": "bfloat16",
|
|
@@ -288,7 +288,7 @@
|
|
| 288 |
224,
|
| 289 |
224
|
| 290 |
],
|
| 291 |
-
"num_image_views":
|
| 292 |
"empty_cameras": 1,
|
| 293 |
"freeze_vision_encoder": true,
|
| 294 |
"freeze_language_encoder": true,
|
|
@@ -311,14 +311,14 @@
|
|
| 311 |
"reward_model": null,
|
| 312 |
"output_dir": "/home/david/work/isaaclab-experience/LeIsaac/outputs/xvla-leisaac-pick-orange.weakaug",
|
| 313 |
"job_name": "xvla",
|
| 314 |
-
"resume":
|
| 315 |
"seed": 1000,
|
| 316 |
"cudnn_deterministic": false,
|
| 317 |
"num_workers": 4,
|
| 318 |
"batch_size": 8,
|
| 319 |
"prefetch_factor": 4,
|
| 320 |
"persistent_workers": true,
|
| 321 |
-
"steps":
|
| 322 |
"eval_freq": 20000,
|
| 323 |
"log_freq": 200,
|
| 324 |
"tolerance_s": 0.0001,
|
|
@@ -366,5 +366,5 @@
|
|
| 366 |
"observation.images.front": "observation.images.image",
|
| 367 |
"observation.images.wrist": "observation.images.image2"
|
| 368 |
},
|
| 369 |
-
"checkpoint_path":
|
| 370 |
}
|
|
|
|
| 137 |
"private": null,
|
| 138 |
"tags": null,
|
| 139 |
"license": null,
|
| 140 |
+
"pretrained_path": "/home/david/work/isaaclab-experience/LeIsaac/outputs/xvla-leisaac-pick-orange.weakaug/checkpoints/last/pretrained_model",
|
| 141 |
"chunk_size": 32,
|
| 142 |
"n_action_steps": 8,
|
| 143 |
"dtype": "bfloat16",
|
|
|
|
| 288 |
224,
|
| 289 |
224
|
| 290 |
],
|
| 291 |
+
"num_image_views": 5,
|
| 292 |
"empty_cameras": 1,
|
| 293 |
"freeze_vision_encoder": true,
|
| 294 |
"freeze_language_encoder": true,
|
|
|
|
| 311 |
"reward_model": null,
|
| 312 |
"output_dir": "/home/david/work/isaaclab-experience/LeIsaac/outputs/xvla-leisaac-pick-orange.weakaug",
|
| 313 |
"job_name": "xvla",
|
| 314 |
+
"resume": true,
|
| 315 |
"seed": 1000,
|
| 316 |
"cudnn_deterministic": false,
|
| 317 |
"num_workers": 4,
|
| 318 |
"batch_size": 8,
|
| 319 |
"prefetch_factor": 4,
|
| 320 |
"persistent_workers": true,
|
| 321 |
+
"steps": 20000,
|
| 322 |
"eval_freq": 20000,
|
| 323 |
"log_freq": 200,
|
| 324 |
"tolerance_s": 0.0001,
|
|
|
|
| 366 |
"observation.images.front": "observation.images.image",
|
| 367 |
"observation.images.wrist": "observation.images.image2"
|
| 368 |
},
|
| 369 |
+
"checkpoint_path": "/home/david/work/isaaclab-experience/LeIsaac/outputs/xvla-leisaac-pick-orange.weakaug/checkpoints/last"
|
| 370 |
}
|