# DemoGrasp-Force: Checkpoints & Training Runs
Pre-trained checkpoints and training logs for the DemoGrasp-Force project, which investigates whether force-aware training (reward shaping, observation augmentation, and VLM-predicted physical property conditioning) improves dexterous grasping under realistic physics.
**Code:** https://github.com/ruixucs/DemoGrasp-Force (commit `7f43ad6`)
## Repository Layout
```
.
├── ckpt/                       # 7 baseline checkpoints from the original DemoGrasp paper
│   ├── fr3_dclaw.pt
│   ├── fr3_panda_gripper.pt
│   ├── fr3_shadow.pt
│   ├── inspire.pt
│   ├── shadow.pt
│   ├── ur5_allegro.pt
│   └── ur5_svh.pt
└── runs_ppo/
    ├── E5_baseline/            # E5: binary reward, no force obs, no VLM (reference baseline)
    │   ├── config.json         # Full training config
    │   ├── events.out.tfevents...   # TensorBoard scalars
    │   └── model_{0..2100}.pt  # 22 checkpoints, one every 100 PPO steps
    └── E7_vlm/                 # E7: binary reward, no force obs, + VLM physical-property conditioning
        ├── config.json
        ├── events.out.tfevents...
        └── model_{0..1400}.pt  # 15 checkpoints, one every 100 PPO steps
```
All checkpoints are raw PyTorch `state_dict` files saved via `torch.save()`.
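For scripting against the checkpoints, the `model_{0..2100}.pt` pattern expands to one file per 100 PPO steps; a small illustration (the paths simply mirror the repository tree above):

```python
# Enumerate the E5 checkpoint paths: saved every 100 PPO steps,
# from step 0 through step 2100 inclusive (22 files total).
e5_ckpts = [f"runs_ppo/E5_baseline/model_{step}.pt" for step in range(0, 2101, 100)]
print(len(e5_ckpts))   # 22
print(e5_ckpts[-1])    # runs_ppo/E5_baseline/model_2100.pt
```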
## Experimental Setup
| Item | Value |
|---|---|
| Hand | FR3 arm + Inspire dexterous hand (hand=fr3_inspire_tac) |
| Simulator | IsaacGym, 7000 parallel envs, RTX 3090 |
| Objects | union_ycb_unidex/union_ycb_debugset.yaml |
| Physics | `objectFriction=0.5` (paper uses 1.0; the lower friction is more realistic and harder) |
| Algorithm | PPO (train=PPOOneStep), domain randomization on |
| Convergence criterion | Success-rate mean stable over 20+ readings |
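The convergence criterion in the last row can be made concrete with a simple windowed check (a minimal sketch, not code from the repo; the 20-reading window matches the criterion above, while the `tol` threshold is an assumption):

```python
def converged(success_rates, window=20, tol=0.01):
    """Heuristic convergence check: the mean success rate over the last
    `window` readings differs from the previous window's mean by <= tol."""
    if len(success_rates) < 2 * window:
        return False
    recent = sum(success_rates[-window:]) / window
    prev = sum(success_rates[-2 * window:-window]) / window
    return abs(recent - prev) <= tol

# A curve that ramps up and then plateaus is flagged as converged;
# a truncated, still-climbing prefix of it is not.
curve = [0.5 + 0.28 * min(i / 30, 1.0) for i in range(100)]
print(converged(curve))        # True  (plateaued around 0.78)
print(converged(curve[:40]))   # False (still rising)
```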
### E5_baseline

| Param | Value |
|---|---|
| `task.env.rewardType` | binary |
| `task.env.observationType` | eefpose+objinitpose+objpcl |
| Final success rate | ~78% (mean over last 20 readings at step ~2100) |
| Convergence step | ~1500–2000 |
### E7_vlm

| Param | Value |
|---|---|
| `task.env.rewardType` | binary |
| `task.env.observationType` | eefpose+objinitpose+objpcl+vlmprop |
| Final success rate | ~73–75% (mean oscillating between 72% and 75% from step 700 to 1431) |
| Peak mean | 74.8% at step 1353 |
| Convergence step | ~700–800 (plateau) |
| Finding | VLM physical-property conditioning alone yields a 3–5 percentage-point drop vs E5 |
See `docs/experiment_results.md` in the GitHub repo for the full 2×2×2 factorial design and the ongoing E6 and E8–E12 runs.
## Usage
### 1. Download

With `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Everything
local = snapshot_download(repo_id="rayxu2005/DemoGrasp-Force")

# Just one file
from huggingface_hub import hf_hub_download
ckpt = hf_hub_download(
    repo_id="rayxu2005/DemoGrasp-Force",
    filename="runs_ppo/E5_baseline/model_2100.pt",
)
```
Or via git:

```shell
git lfs install
git clone https://huggingface.co/rayxu2005/DemoGrasp-Force
```
### 2. Load a checkpoint

```python
import torch

state = torch.load("runs_ppo/E5_baseline/model_2100.pt", map_location="cpu")
# state contains 'model' (policy weights) and PPO optimizer state
```
### 3. Run inference

Use the code from https://github.com/ruixucs/DemoGrasp-Force and point `play_policy.sh` at the downloaded checkpoint.
## Notes & Caveats
- **Failed runs excluded:** E6 and E8–E12 were killed early (Ninja build error or OOM) and are not included. See the GitHub repo for the rerun script (`scripts/run_experiments_e6.sh`).
- **Training speed:** ~40 min per 100 PPO steps on an RTX 3090, roughly 5.7× slower than the paper's RTX 4090 timings. At that rate a full 20,000-step training run would take ~133 h.
- **Lower absolute success rate:** Our ~78% (E5) vs the paper's ~96% is explained by harder physics (`objectFriction=0.5` vs `1.0`) and a slower GPU. Relative comparisons across E5–E12 are what this repo is designed to support.
## Citation
If you use these checkpoints, please also cite the original DemoGrasp paper (see the GitHub repo for details).