# A2C Agent on PandaPickAndPlace-v3
This repository contains a trained Advantage Actor-Critic (A2C) agent that solves the PandaPickAndPlace-v3 task.
## Model Card
- **Model name:** a2c-PandaPickAndPlace-v3
- **Environment:** PandaPickAndPlace-v3
- **Algorithm:** A2C (Advantage Actor-Critic)
- **Performance:**
  - Learns stable pick-and-place behavior across evaluation runs
  - Demonstrates convergence to an effective policy
## Usage
```python
import pickle

import gymnasium as gym
import panda_gym  # noqa: F401 -- registers the Panda environments
from huggingface_hub import hf_hub_download

# Download the trained A2C checkpoint from the Hub
checkpoint_path = hf_hub_download(
    repo_id="KraTUZen/a2c-PandaPickAndPlace-v3",
    filename="a2c.pkl",
)
with open(checkpoint_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment
env = gym.make(model["env_id"])
```
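With the model and environment loaded, a rollout can be sketched as follows. This is a minimal sketch that assumes a gymnasium-style `step` API and a `predict` method on the unpickled policy; `run_episode` and `max_steps` are illustrative names, not part of this repository.

```python
def run_episode(env, model, max_steps=200):
    """Roll out one episode with a trained policy (gymnasium-style API).

    `model` is assumed to expose a `predict(obs)` method returning an action;
    adapt this call to however the unpickled policy is actually invoked.
    """
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```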
## Notes
- The agent is trained with A2C, the synchronous variant of the A3C actor-critic algorithm.
- The environment is PandaPickAndPlace-v3, where the agent must control a robotic arm to pick up and place objects.
- The serialized policy is stored in `a2c.pkl`.
## Repository Structure
| File | Description |
| --- | --- |
| `a2c.pkl` | Trained policy weights |
| `README.md` | Documentation and usage guide |
## Results
- The agent learns to control the Panda robotic arm effectively.
- Demonstrates stable convergence using A2C with reduced variance compared to vanilla policy gradient methods.
## Environment Overview
- **Observation space:** Continuous (robot joint positions, object positions, gripper state)
- **Action space:** Continuous (joint torques, gripper control)
- **Objective:** Pick up the target object and place it at the designated location
- **Reward:** Positive reward for successful placement, penalties for failed attempts
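Goal-conditioned environments like this one typically expose the observation as a dict of robot state, current object position, and target position, with success judged by distance to the goal. The sketch below illustrates that convention; the dict keys, the field sizes, and the 5 cm threshold are assumptions based on common panda-gym defaults, not values taken from this repository.

```python
import numpy as np

# Hypothetical goal-conditioned observation, in the dict layout panda-gym
# environments typically use (field sizes here are illustrative only)
obs = {
    "observation": np.zeros(19),                    # robot and object state
    "achieved_goal": np.array([0.10, 0.00, 0.20]),  # current object position
    "desired_goal": np.array([0.10, 0.00, 0.25]),   # target placement position
}

def is_success(achieved, desired, threshold=0.05):
    """Placement succeeds when the object lies within `threshold` of the goal."""
    return float(np.linalg.norm(achieved - desired)) < threshold
```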
## Learning Highlights
- **Algorithm:** A2C (Advantage Actor-Critic)
- **Update rule:** Actor updates the policy; Critic estimates the value function to reduce variance
- **Strengths:** More sample-efficient than vanilla policy gradient, stable learning
- **Limitations:** Requires careful tuning of the learning rate and entropy coefficient
## Evaluation Results

- **mean_reward** on PandaPickAndPlace-v3: 2.5 +/- 0.00 (self-reported)