🐼 A2C Agent on PandaPickAndPlace-v3

This repository contains a trained Advantage Actor-Critic (A2C) agent that successfully plays the PandaPickAndPlace-v3 environment.


📊 Model Card

Model Name: a2c-PandaPickAndPlace-v3
Environment: PandaPickAndPlace-v3
Algorithm: A2C (Advantage Actor-Critic)
Performance:

  • Learns stable pick-and-place behavior across evaluation runs
  • Converges to an effective policy

🚀 Usage

from huggingface_hub import hf_hub_download
import gymnasium as gym
import panda_gym  # registers the Panda environments with Gymnasium
import pickle

# Download the trained A2C checkpoint (returns a local file path)
checkpoint_path = hf_hub_download(
    repo_id="KraTUZen/a2c-PandaPickAndPlace-v3",
    filename="a2c.pkl"
)
with open(checkpoint_path, "rb") as f:
    model = pickle.load(f)

# Initialize environment
env = gym.make("PandaPickAndPlace-v3")
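With a model and environment in hand, a quick evaluation loop can sanity-check the policy. The sketch below is illustrative, not part of the repository: it assumes a Gymnasium-style `step` returning `(obs, reward, terminated, truncated, info)` and a model exposing a `predict(obs)` method (an assumption about how `a2c.pkl` was serialized; adapt to your checkpoint).

```python
def evaluate(env, model, episodes=5):
    """Roll out the policy and return the mean episode return.

    Assumes the Gymnasium API: env.reset() -> (obs, info) and
    env.step(a) -> (obs, reward, terminated, truncated, info),
    and a model with a predict(obs) -> action method (assumption).
    """
    totals = []
    for _ in range(episodes):
        obs, _info = env.reset()
        done, total = False, 0.0
        while not done:
            action = model.predict(obs)  # greedy action from the policy
            obs, reward, terminated, truncated, _info = env.step(action)
            total += reward
            done = terminated or truncated
        totals.append(total)
    return sum(totals) / len(totals)
```

For Stable-Baselines3 checkpoints, `model.predict(obs)` returns an `(action, state)` tuple instead of a bare action; unpack accordingly.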

🧠 Notes

  • The agent is trained using A2C, a synchronous variant of Actor-Critic methods.
  • The environment is PandaPickAndPlace-v3, where the agent must control a robotic arm to pick up and place objects.
  • The serialized policy is stored in a2c.pkl.

📂 Repository Structure

  • a2c.pkl → Trained policy weights
  • README.md → Documentation and usage guide

✅ Results

  • The agent learns to control the Panda robotic arm effectively.
  • Demonstrates stable convergence using A2C with reduced variance compared to vanilla policy gradient methods.

🔎 Environment Overview

  • Observation Space: Continuous and goal-conditioned (robot joint positions, object position, gripper state, desired goal)
  • Action Space: Continuous (end-effector displacement plus gripper control in the default configuration)
  • Objective: Pick up the target object and place it at the designated location
  • Reward: Sparse by default (−1 per step until the object is placed, 0 on success); a dense distance-based variant also exists
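The reward structure above can be illustrated with the goal-conditioned sparse scheme that panda-gym-style environments commonly use: a flat penalty each step until the achieved goal lies within a distance threshold of the desired goal. The 0.05 m threshold below is an assumption for illustration, not a value read from this environment.

```python
import math

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Illustrative sparse goal-conditioned reward: -1.0 each step until
    the object sits within `threshold` of the target, then 0.0.
    The threshold value here is an assumption, not read from the env."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance < threshold else -1.0
```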

📚 Learning Highlights

  • Algorithm: A2C (Advantage Actor-Critic)
  • Update Rule: Actor updates policy, Critic estimates value function to reduce variance
  • Strengths: More sample-efficient than vanilla policy gradient, stable learning
  • Limitations: Requires careful tuning of learning rate and entropy coefficient
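The update rule above hinges on the advantage: the critic's value estimate is subtracted from the bootstrapped return, so the actor's gradient keeps the same direction in expectation but with lower variance. A minimal sketch of this rollout post-processing (plain Python; the function name is illustrative, not from this repository):

```python
def advantages(rewards, values, last_value, gamma=0.99):
    """Compute n-step returns and advantages for an A2C rollout.

    rewards:    rewards r_t collected over the rollout
    values:     critic estimates V(s_t) for each step
    last_value: critic estimate V(s_T) used to bootstrap the tail
    """
    returns = []
    ret = last_value
    for r in reversed(rewards):
        ret = r + gamma * ret  # bootstrapped discounted return R_t
        returns.append(ret)
    returns.reverse()
    # Advantage A_t = R_t - V(s_t): how much better the outcome was
    # than the critic expected; subtracting V(s_t) reduces variance.
    advs = [ret - v for ret, v in zip(returns, values)]
    return returns, advs
```

The actor is then updated to increase the log-probability of actions weighted by these advantages, while the critic regresses its estimates toward the returns.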