🐼 A2C Agent on PandaReachDense-v3

This repository contains an Advantage Actor-Critic (A2C) agent trained on the PandaReachDense-v3 environment with the Stable-Baselines3 library.


📊 Model Card

Model Name: a2c-PandaReachDense-v3
Environment: PandaReachDense-v3
Algorithm: A2C (Advantage Actor-Critic)
Performance Metric:

  • Mean Reward: 2.5
  • Demonstrates convergence toward stable reaching behavior

🚀 Usage (with Stable-Baselines3)

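import gymnasium as gym
import panda_gym  # importing registers the Panda environments with Gymnasium
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

# Download the checkpoint; load_from_hub returns a local file path, not a model
checkpoint = load_from_hub(
    repo_id="KraTUZen/a2c-PandaReachDense-v3",
    filename="a2c.pkl"
)

# Load the trained A2C model (assumes a2c.pkl is a Stable-Baselines3 archive)
model = A2C.load(checkpoint)

# Initialize environment
env = gym.make("PandaReachDense-v3")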
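
Once a model and environment are available, a short rollout loop can estimate the mean episode reward. The sketch below is a minimal version, assuming the Gymnasium step API (five return values) and the Stable-Baselines3 `predict` method; `mean_episode_reward` is a hypothetical helper name, not part of either library.

```python
def mean_episode_reward(model, env, n_episodes=10):
    """Average undiscounted episode return (hypothetical helper)."""
    totals = []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        done = False
        total = 0.0
        while not done:
            # deterministic=True uses the policy's mode instead of sampling
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)
            # An episode ends on success/failure or on the time limit
            done = terminated or truncated
        totals.append(total)
    return sum(totals) / len(totals)
```

Stable-Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which returns both the mean and the standard deviation and is likely the better choice for reported numbers.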

🧠 Notes

  • The agent is trained using A2C, a synchronous actor-critic method whose learned value baseline reduces gradient variance compared to vanilla policy gradient.
  • The environment is PandaReachDense-v3, where the agent must control a robotic arm to reach a target position.
  • The serialized policy is stored in a2c.pkl.
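
The variance reduction noted above comes from subtracting the critic's value estimate from the observed return. Here is a minimal sketch with plain Python lists, assuming Monte Carlo returns over one finished episode and a discount factor `gamma`. Stable-Baselines3's A2C instead works on short n-step rollouts with generalized advantage estimation, which this sketch does not reproduce. `advantages` is a hypothetical helper name.

```python
def advantages(rewards, values, gamma=0.99):
    """Return A_t = G_t - V(s_t) for one finished episode (illustrative only)."""
    returns = []
    g = 0.0
    # Accumulate discounted returns backwards from the final step
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # The critic's estimate acts as a baseline for each step's return
    return [g_t - v_t for g_t, v_t in zip(returns, values)]
```

A positive advantage means the action did better than the critic expected, so the actor raises its probability; a negative one lowers it.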

📂 Repository Structure

  • a2c.pkl → Trained policy weights
  • README.md → Documentation and usage guide

✅ Results

  • The agent learns to move the Panda robotic arm toward target positions.
  • Demonstrates stable convergence using A2C, though performance metrics show room for further optimization.

🔎 Environment Overview

  • Observation Space: Continuous dictionary (end-effector position and velocity, achieved goal, desired goal)
  • Action Space: Continuous (3D end-effector displacement; the gripper is blocked in the reach task)
  • Objective: Reach the target position efficiently
  • Reward: Dense, based on the distance between the end effector and the target at every step rather than a sparse success signal
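
To make the reward bullet concrete, the sketch below assumes the dense reward is the negative Euclidean distance between the end-effector position and the target, and shows a sparse variant for contrast. The 0.05 threshold and both function names are illustrative assumptions, not values read from this repository.

```python
import math

def dense_reward(achieved, desired):
    """Negative Euclidean distance: closer to the target means a higher reward."""
    return -math.dist(achieved, desired)

def sparse_reward(achieved, desired, threshold=0.05):
    """0 inside the success radius, -1 everywhere else (assumed threshold)."""
    return 0.0 if math.dist(achieved, desired) <= threshold else -1.0
```

Under this assumption the dense signal gives the agent a gradient to follow from the first step, whereas the sparse one only changes once the arm is already within the success radius.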

📚 Learning Highlights

  • Algorithm: A2C (Advantage Actor-Critic)
  • Update Rule: Actor updates policy, Critic estimates value function to reduce variance
  • Strengths: More sample-efficient than vanilla policy gradient, stable learning
  • Limitations: Sensitive to hyperparameter tuning (learning rate, entropy coefficient)
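
The sensitivity to the entropy coefficient is easier to see from how the terms combine into one objective. This is a minimal sketch of the usual A2C loss for a single batch, assuming per-sample values are passed as plain lists. The defaults mirror Stable-Baselines3's `vf_coef=0.5` and `ent_coef=0.0`, and `a2c_loss` is a hypothetical helper, not the library's implementation.

```python
def a2c_loss(log_probs, advantages, values, returns, entropies,
             vf_coef=0.5, ent_coef=0.0):
    """Combine actor, critic, and entropy terms into one scalar (illustrative)."""
    n = len(log_probs)
    # Actor: raise log-probability of actions with positive advantage
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between predicted values and returns
    value_loss = sum((r - v) ** 2 for r, v in zip(returns, values)) / n
    # Entropy bonus: subtracting it rewards more exploratory policies
    entropy = sum(entropies) / n
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

Raising `ent_coef` pushes the policy toward exploration at the cost of slower convergence, which is one reason small changes to it can noticeably alter training.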