# Reinforce Agent on CartPole-v1
This repository contains a trained Reinforce (Policy Gradient) agent that successfully solves the CartPole-v1 environment.
## Model Card
Model Name: Reinforce-CartPole-v1
Environment: CartPole-v1
Algorithm: Reinforce (Monte Carlo Policy Gradient)
Performance Metric:
- Achieves stable balancing of the pole across evaluation runs
- Mean reward reaches the environment's maximum return of 500
## Usage
```python
from huggingface_hub import hf_hub_download
import pickle

import gym

# Download the serialized policy from the Hub and unpickle it
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-CartPole-v1",
    filename="reinforce.pkl",
)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment
env = gym.make(model["env_id"])
```
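With the policy loaded, evaluation reduces to a standard episode loop. A minimal sketch, assuming the classic `gym` step API (4-tuple return) and that `act` is any callable mapping a state to an action, such as the loaded policy's action method:

```python
def rollout(env, act, max_steps=500):
    """Run one episode, choosing actions with `act`; return the episode reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = act(state)
        # Classic gym API: (next_state, reward, done, info)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

On CartPole-v1 an episode is capped at 500 steps, so a well-trained agent returns 500 from this loop.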
## Notes
- The agent is trained using the Reinforce algorithm, which updates policy parameters via Monte Carlo returns.
- The environment is CartPole-v1, where the objective is to keep the pole balanced by moving the cart left or right.
- The serialized policy is stored in `reinforce.pkl`.
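The Monte Carlo returns mentioned above can be computed with a single backward pass over an episode's rewards. A minimal sketch (the `gamma` default is illustrative; the discount factor actually used in training is not stated in this card):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode's rewards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

Each return G_t then weights the log-probability gradient of the action taken at step t in the Reinforce update.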
## Repository Structure
- `reinforce.pkl`: trained policy weights
- `README.md`: documentation and usage guide
## Results
- The agent consistently balances the pole for extended episodes.
- Demonstrates convergence to an optimal policy using policy gradient methods.
## Environment Overview
- Observation Space: Continuous (cart position, velocity, pole angle, angular velocity)
- Action Space: Discrete (move cart left or right)
- Objective: Prevent the pole from falling by applying forces to the cart
- Reward: +1 for each timestep the pole remains upright
## Learning Highlights
- Algorithm: Reinforce (Policy Gradient)
- Update Rule: Policy parameters updated using returns from sampled episodes
- Strengths: Simple yet effective baseline for policy gradient methods
- Limitations: High variance in updates, mitigated with sufficient training episodes
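The high variance noted above is commonly reduced by standardizing the episode returns before they weight the policy gradient. A sketch of that common variance-reduction trick (not necessarily what this particular training run used):

```python
def normalize_returns(returns, eps=1e-8):
    """Standardize returns to zero mean and unit variance to reduce gradient variance."""
    n = len(returns)
    mean = sum(returns) / n
    std = (sum((g - mean) ** 2 for g in returns) / n) ** 0.5
    # eps guards against division by zero when all returns are equal
    return [(g - mean) / (std + eps) for g in returns]
```

Subtracting the mean acts like a simple baseline: actions are reinforced only relative to the average return, which leaves the gradient's expectation unchanged while shrinking its variance.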
## Evaluation Results
- mean_reward on CartPole-v1 (self-reported): 500.00 +/- 0.00