# DemoGrasp-Force: Checkpoints & Training Runs
Pre-trained checkpoints and training logs for the DemoGrasp-Force project, which investigates whether force-aware training (reward shaping, observation augmentation, and VLM-predicted physical property conditioning) improves dexterous grasping under realistic physics.
**Code:** https://github.com/ruixucs/DemoGrasp-Force (commit `7f43ad6`)
## Repository Layout
```
.
├── ckpt/                       # 7 baseline checkpoints from the original DemoGrasp paper
│   ├── fr3_dclaw.pt
│   ├── fr3_panda_gripper.pt
│   ├── fr3_shadow.pt
│   ├── inspire.pt
│   ├── shadow.pt
│   ├── ur5_allegro.pt
│   └── ur5_svh.pt
└── runs_ppo/
    ├── E5_baseline/            # E5: binary reward, no force obs, no VLM (reference baseline)
    │   ├── config.json         # Full training config
    │   ├── events.out.tfevents...   # TensorBoard scalars
    │   └── model_{0..2100}.pt  # 22 checkpoints, one every 100 PPO steps
    └── E7_vlm/                 # E7: binary reward, no force obs, + VLM physical-property conditioning
        ├── config.json
        ├── events.out.tfevents...
        └── model_{0..1400}.pt  # 15 checkpoints, one every 100 PPO steps
```
All checkpoints are raw PyTorch `state_dict` files saved via `torch.save()`.
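For scripting against the checkpoints, the `model_{0..2100}.pt` pattern expands to one file per 100 PPO steps; a small illustration (the paths simply mirror the repository tree above):

```python
# Enumerate the E5 checkpoint paths: saved every 100 PPO steps,
# from step 0 through step 2100 inclusive (22 files total).
e5_ckpts = [f"runs_ppo/E5_baseline/model_{step}.pt" for step in range(0, 2101, 100)]
print(len(e5_ckpts))   # 22
print(e5_ckpts[-1])    # runs_ppo/E5_baseline/model_2100.pt
```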
## Experimental Setup
| Item | Value |
|---|---|
| Hand | FR3 arm + Inspire dexterous hand (hand=fr3_inspire_tac) |
| Simulator | IsaacGym, 7000 parallel envs, RTX 3090 |
| Objects | union_ycb_unidex/union_ycb_debugset.yaml |
| Physics | `objectFriction=0.5` (paper uses 1.0; the lower friction is more realistic and harder) |
| Algorithm | PPO (train=PPOOneStep), domain randomization on |
| Convergence criterion | Success-rate mean stable over 20+ readings |
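The convergence criterion in the last row can be made concrete with a simple windowed check (a minimal sketch, not code from the repo; the 20-reading window matches the criterion above, while the `tol` threshold is an assumption):

```python
def converged(success_rates, window=20, tol=0.01):
    """Heuristic convergence check: the mean success rate over the last
    `window` readings differs from the previous window's mean by <= tol."""
    if len(success_rates) < 2 * window:
        return False
    recent = sum(success_rates[-window:]) / window
    prev = sum(success_rates[-2 * window:-window]) / window
    return abs(recent - prev) <= tol

# A curve that ramps up and then plateaus is flagged as converged;
# a truncated, still-climbing prefix of it is not.
curve = [0.5 + 0.28 * min(i / 30, 1.0) for i in range(100)]
print(converged(curve))        # True  (plateaued around 0.78)
print(converged(curve[:40]))   # False (still rising)
```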
### E5_baseline

| Param | Value |
|---|---|
| `task.env.rewardType` | binary |
| `task.env.observationType` | eefpose+objinitpose+objpcl |
| Final success rate | ~78% (mean over last 20 readings at step ~2100) |
| Convergence step | ~1500–2000 |
### E7_vlm

| Param | Value |
|---|---|
| `task.env.rewardType` | binary |
| `task.env.observationType` | eefpose+objinitpose+objpcl+vlmprop |
| Final success rate | ~73–75% (mean oscillating between 72% and 75% from step 700 to 1431) |
| Peak mean | 74.8% at step 1353 |
| Convergence step | ~700–800 (plateau) |
| Finding | VLM physical-property conditioning alone yields a 3–5 percentage-point drop vs E5 |
See `docs/experiment_results.md` in the GitHub repo for the full 2×2×2 factorial design and the ongoing E6 and E8–E12 runs.
## Usage
### 1. Download

With `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Everything
local = snapshot_download(repo_id="rayxu2005/DemoGrasp-Force")

# Just one file
from huggingface_hub import hf_hub_download
ckpt = hf_hub_download(
    repo_id="rayxu2005/DemoGrasp-Force",
    filename="runs_ppo/E5_baseline/model_2100.pt",
)
```
Or via git:

```shell
git lfs install
git clone https://huggingface.co/rayxu2005/DemoGrasp-Force
```
### 2. Load a checkpoint

```python
import torch

state = torch.load("runs_ppo/E5_baseline/model_2100.pt", map_location="cpu")
# state contains 'model' (policy weights) and PPO optimizer state
```
### 3. Run inference

Use the code from https://github.com/ruixucs/DemoGrasp-Force and point `play_policy.sh` at the downloaded checkpoint.
## Notes & Caveats
- **Failed runs excluded:** E6 and E8–E12 were killed early (Ninja build error or OOM) and are not included. See the GitHub repo for the rerun script (`scripts/run_experiments_e6.sh`).
- **Training speed:** ~40 min per 100 PPO steps on an RTX 3090, roughly 5.7× slower than the paper's RTX 4090 timings. At that rate a full 20,000-step training run would take ~133 h.
- **Lower absolute success rate:** Our ~78% (E5) vs the paper's ~96% is explained by harder physics (`objectFriction=0.5` vs `1.0`) and a slower GPU. Relative comparisons across E5–E12 are what this repo is designed to support.
## Citation
If you use these checkpoints, please also cite the original DemoGrasp paper (see the GitHub repo for details).