G1 Humanoid 6DoF Hands Locomotion (RL)
Overview
A reinforcement learning locomotion policy for the Unitree G1 humanoid equipped with 6DoF Inspire robotic hands.
The policy was trained with RSL-RL in the MJLab framework, which is built on top of the MuJoCo physics engine.
Environment
- Physics engine: MuJoCo
- Framework: MJLab
- RL library: RSL-RL
- Task: Commanded velocity tracking
Framework Architecture
MJLab uses a modular manager-based RL environment:
- Observation manager
- Reward manager
- Curriculum manager
- Command manager
- Termination manager
This design enables scalable task composition, rapid reward iteration, and clean sim-to-real transfer pipelines.
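As a rough illustration of the manager-based design, the sketch below shows an environment that delegates observation and reward computation to independently registered terms. The class and attribute names here are hypothetical, not MJLab's actual API.

```python
# Sketch of a manager-based environment, assuming hypothetical class and
# attribute names; MJLab's real API may differ.
from dataclasses import dataclass, field

@dataclass
class ManagerBasedEnv:
    """Each manager owns one concern; the env only orchestrates them."""
    observation_terms: dict = field(default_factory=dict)  # name -> fn(state)
    reward_terms: dict = field(default_factory=dict)       # name -> (weight, fn)

    def compute_observation(self, state):
        # Concatenate every registered observation term in a fixed order.
        return [v for name in sorted(self.observation_terms)
                  for v in self.observation_terms[name](state)]

    def compute_reward(self, state):
        # Weighted sum of independent reward terms; easy to iterate on.
        return sum(w * fn(state) for w, fn in self.reward_terms.values())

env = ManagerBasedEnv()
env.observation_terms["joint_pos"] = lambda s: s["qpos"]
env.reward_terms["alive"] = (1.0, lambda s: 1.0)
state = {"qpos": [0.1, -0.2]}
obs = env.compute_observation(state)
```

Because each term is registered independently, adding or re-weighting a reward during iteration does not touch the rest of the task definition.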
Algorithm
- PPO
- On-policy runner
- Adaptive KL schedule
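The adaptive KL schedule adjusts the learning rate from the measured KL divergence between consecutive policies. The sketch below uses the common double/half thresholds around a KL target; the exact target and bounds are assumptions, not values read from this training configuration.

```python
def adapt_learning_rate(lr, kl, desired_kl=0.01, lr_min=1e-5, lr_max=1e-2):
    """Adaptive KL schedule: shrink the learning rate when an update
    overshoots the KL target, grow it when updates are too timid.
    desired_kl and the 2x / 0.5x thresholds are illustrative assumptions."""
    if kl > 2.0 * desired_kl:
        lr = max(lr_min, lr / 1.5)      # policy moved too far: slow down
    elif kl < 0.5 * desired_kl:
        lr = min(lr_max, lr * 1.5)      # policy barely moved: speed up
    return lr
```

With this rule the 1e-3 learning rate listed below is only the starting point; the effective rate drifts during training as KL fluctuates.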
Observations
- Base orientation
- Base angular velocity
- Joint positions
- Joint velocities
- Commanded velocity
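The terms above are flattened into a single policy input vector. The ordering and dimensions below (quaternion orientation, 29 actuated joints, 3-dim velocity command) are illustrative assumptions; the real layout is fixed by the observation manager's configuration.

```python
import numpy as np

def build_observation(base_quat, base_ang_vel, joint_pos, joint_vel, command):
    """Concatenate the observation terms listed above into one flat vector.
    Ordering and dimensions are assumptions for illustration only."""
    return np.concatenate([base_quat, base_ang_vel, joint_pos, joint_vel, command])

# Made-up dimensions: 4 (quat) + 3 (ang vel) + 29 + 29 + 3 = 68
obs = build_observation(np.array([1.0, 0.0, 0.0, 0.0]),
                        np.zeros(3), np.zeros(29), np.zeros(29),
                        np.array([0.5, 0.0, 0.0]))
```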
Actions
- Joint position targets
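Joint position targets are typically tracked by a low-level PD controller that converts the policy's target into a torque. A minimal sketch, with illustrative gains rather than the deployed values:

```python
def pd_torque(q_target, q, qd, kp=100.0, kd=2.0):
    """Convert a joint position target into a torque via PD control.
    Gains kp/kd are illustrative assumptions, not the deployed values."""
    return kp * (q_target - q) - kd * qd

# Position error of 0.5 rad at rest -> proportional torque only
tau = pd_torque(0.5, 0.0, 0.0)
```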
Network Architecture
Actor
- [512, 256, 128]
- ELU activation
Critic
- [512, 256, 128]
- ELU activation
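Both networks are plain MLPs with the [512, 256, 128] hidden trunk and ELU activations. The numpy sketch below shows the forward pass; the random placeholder weights and the 68-in / 29-out dimensions are assumptions for illustration.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, smooth exponential for negative.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

def mlp_forward(x, layer_sizes=(512, 256, 128), out_dim=29, rng=None):
    """Forward pass through the [512, 256, 128] trunk used by actor and
    critic. Weights are random placeholders; input/output dimensions are
    illustrative assumptions."""
    rng = rng or np.random.default_rng(0)
    h = x
    for size in layer_sizes:
        W = rng.standard_normal((h.shape[-1], size)) * 0.01
        h = elu(h @ W)
    W_out = rng.standard_normal((h.shape[-1], out_dim)) * 0.01
    return h @ W_out

actions = mlp_forward(np.zeros(68))
```

The critic shares the same trunk shape but ends in a single value output instead of one target per joint.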
Training Details
- Iterations: 4000
- Steps per environment: 24
- Gamma: 0.99
- Lambda: 0.95
- Learning rate: 1e-3
- Mini-batches: 4
- Epochs per update: 5
- Seed: 42
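The Gamma and Lambda values above parameterize Generalized Advantage Estimation over each 24-step rollout. A minimal sketch of the backward recursion, assuming no episode terminations inside the rollout:

```python
def gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation with the gamma/lambda listed
    above. Simplifying assumption: no terminations inside the rollout."""
    advantages = [0.0] * len(rewards)
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next - values[t]   # TD residual
        last_adv = delta + gamma * lam * last_adv         # exponential smoothing
        advantages[t] = last_adv
    return advantages

# Two steps of reward 1.0 with a zero value baseline:
adv = gae([1.0, 1.0], [0.0, 0.0], 0.0)
```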
Files
- checkpoints/model_4000.pt → trained policy
- onnx/policy.onnx → deployment-ready model
- configs/ → training configuration
Author
Josué Abad
Lab
NONHUMAN Robotics – Perú