# Reinforce Agent on CartPole-v1
This repository contains a trained Reinforce (Policy Gradient) agent that successfully solves the CartPole-v1 environment.
## Model Card
Model Name: Reinforce-CartPole-v1
Environment: CartPole-v1
Algorithm: Reinforce (Monte Carlo Policy Gradient)
Performance Metric:
- Achieves stable balancing of the pole across evaluation runs
- Mean reward reaches the environment's maximum return of 500
## Usage
```python
from huggingface_hub import hf_hub_download
import pickle

import gym

# Download the serialized policy from the Hub and unpickle it
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-CartPole-v1",
    filename="reinforce.pkl",
)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment
env = gym.make(model["env_id"])
```
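With the policy loaded, evaluation reduces to a standard episode loop. A minimal sketch, assuming the classic `gym` step API (4-tuple return) and that `act` is any callable mapping a state to an action, such as the loaded policy's action method:

```python
def rollout(env, act, max_steps=500):
    """Run one episode, choosing actions with `act`; return the episode reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = act(state)
        # Classic gym API: (next_state, reward, done, info)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

On CartPole-v1 an episode is capped at 500 steps, so a well-trained agent returns 500 from this loop.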
## Notes
- The agent is trained using the Reinforce algorithm, which updates policy parameters via Monte Carlo returns.
- The environment is CartPole-v1, where the objective is to keep the pole balanced by moving the cart left or right.
- The serialized policy is stored in `reinforce.pkl`.
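The Monte Carlo returns mentioned above can be computed with a single backward pass over an episode's rewards. A minimal sketch (the `gamma` default is illustrative; the discount factor actually used in training is not stated in this card):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode's rewards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

Each return G_t then weights the log-probability gradient of the action taken at step t in the Reinforce update.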
## Repository Structure
- `reinforce.pkl`: trained policy weights
- `README.md`: documentation and usage guide
## Results
- The agent consistently balances the pole for extended episodes.
- Demonstrates convergence to an optimal policy using policy gradient methods.
## Environment Overview
- Observation Space: Continuous (cart position, velocity, pole angle, angular velocity)
- Action Space: Discrete (move cart left or right)
- Objective: Prevent the pole from falling by applying forces to the cart
- Reward: +1 for each timestep the pole remains upright
## Learning Highlights
- Algorithm: Reinforce (Policy Gradient)
- Update Rule: Policy parameters updated using returns from sampled episodes
- Strengths: Simple yet effective baseline for policy gradient methods
- Limitations: High variance in updates, mitigated with sufficient training episodes
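The high variance noted above is commonly reduced by standardizing the episode returns before they weight the policy gradient. A sketch of that common variance-reduction trick (not necessarily what this particular training run used):

```python
def normalize_returns(returns, eps=1e-8):
    """Standardize returns to zero mean and unit variance to reduce gradient variance."""
    n = len(returns)
    mean = sum(returns) / n
    std = (sum((g - mean) ** 2 for g in returns) / n) ** 0.5
    # eps guards against division by zero when all returns are equal
    return [(g - mean) / (std + eps) for g in returns]
```

Subtracting the mean acts like a simple baseline: actions are reinforced only relative to the average return, which leaves the gradient's expectation unchanged while shrinking its variance.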
## Evaluation Results
- mean_reward on CartPole-v1 (self-reported): 500.00 +/- 0.00