🤖 CartPole-kh

An A2C agent that balances the pole for the full 500 steps in every evaluation episode.

🎬 Demo

The agent keeps the pole upright for the full 500 steps, the maximum possible score in CartPole-v1.

🎯 The Task

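The CartPole-v1 environment challenges an agent to balance a pole on a moving cart. At each step, the agent pushes the cart left or right to prevent the pole from falling. It receives a reward of +1 for every step the episode lasts, so the episode return equals the number of steps survived.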

Outcome	Condition	Threshold
❌ Fail	Pole angle from vertical exceeds	±12°
❌ Fail	Cart position from centre exceeds	±2.4 m
✅ Max score	Episode length reaches	500 steps (episode truncated)
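The rules in the table above can be written as a small check. This is a minimal pure-Python sketch, not Gymnasium's own code: `episode_status` and the constant names are hypothetical, and the thresholds are the CartPole-v1 defaults listed in the table. When both conditions hold on the same step, the sketch reports termination first, whereas Gymnasium would flag both `terminated` and `truncated`.

```python
import math

# CartPole-v1 default limits (hypothetical constant names)
THETA_THRESHOLD_RAD = 12 * 2 * math.pi / 360  # ±12° expressed in radians
X_THRESHOLD = 2.4                              # ±2.4 from the track centre
MAX_EPISODE_STEPS = 500                        # time limit that truncates the episode


def episode_status(x: float, theta: float, step: int) -> str:
    """Classify a state as 'terminated', 'truncated' or 'running'.

    Hypothetical helper mirroring the fail / max-score rules; it is not
    part of the Gymnasium API.
    """
    if abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RAD:
        return "terminated"  # pole fell over or cart left the track
    if step >= MAX_EPISODE_STEPS:
        return "truncated"   # survived the full 500 steps
    return "running"
```

For example, `episode_status(0.1, 0.05, 500)` returns `"truncated"`: the pole is still within ±12° (0.05 rad is about 2.9°), but the step limit has been reached.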

Observation space (4 variables):

#	Variable	Description
0	Cart position	Position along the track
1	Cart velocity	Velocity of the cart along the track
2	Pole angle	Tilt from vertical, in radians
3	Pole angular velocity	Rate of change of the pole angle

Action space: 0 = push left · 1 = push right
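The observation and action layouts above can be made explicit in code. This is a minimal sketch: `CartPoleObs`, `decode` and `ACTION_NAMES` are hypothetical names introduced here, and the field order simply follows the observation table.

```python
from typing import NamedTuple, Sequence


class CartPoleObs(NamedTuple):
    """Named view of the 4-element CartPole-v1 observation (hypothetical helper)."""
    cart_position: float
    cart_velocity: float
    pole_angle: float             # radians, 0 = upright
    pole_angular_velocity: float


# Discrete action indices used by CartPole-v1
ACTION_NAMES = {0: "push left", 1: "push right"}


def decode(obs: Sequence[float]) -> CartPoleObs:
    """Convert a raw observation (list or NumPy array) into named fields."""
    if len(obs) != 4:
        raise ValueError(f"expected 4 values, got {len(obs)}")
    return CartPoleObs(*(float(v) for v in obs))
```

Because `decode` only relies on `len` and `float`, it accepts both plain lists and the NumPy arrays returned by `env.reset()` and `env.step()`.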

🏗️ Training Details

Parameter	Value
Algorithm	A2C (Advantage Actor-Critic)
Policy	MlpPolicy (actor-critic MLP)
Total timesteps	100 000
Parallel environments	4 (vectorized)
Framework	Stable-Baselines3 + Gymnasium
Experiment tracking	Weights & Biases
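A training run matching the table above could look roughly like this. It is a sketch under stated assumptions, not the author's script: hyperparameters other than those in the table are left at Stable-Baselines3 defaults, the W&B project name `"cartpole-a2c"` is hypothetical, and W&B logging goes through `wandb.integration.sb3.WandbCallback` with TensorBoard syncing (requires `wandb` and `tensorboard`).

```python
from typing import Optional

# Values taken from the training table; everything else uses SB3 defaults (assumption).
TRAIN_CONFIG = {
    "env_id": "CartPole-v1",
    "policy": "MlpPolicy",
    "total_timesteps": 100_000,
    "n_envs": 4,
}


def train(config: Optional[dict] = None, use_wandb: bool = False,
          save_path: str = "a2c_cartpole"):
    """Train A2C on vectorized CartPole-v1 and save it to `<save_path>.zip`."""
    config = config or TRAIN_CONFIG
    # Heavy dependencies are imported lazily so the module imports on its own.
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env

    # Four CartPole-v1 copies stepped together in one process (DummyVecEnv by default).
    vec_env = make_vec_env(config["env_id"], n_envs=config["n_envs"])

    run, callback, tb_log = None, None, None
    if use_wandb:
        import wandb
        from wandb.integration.sb3 import WandbCallback

        # Hypothetical project name. sync_tensorboard forwards SB3's TensorBoard
        # scalars (episode reward, policy/value loss, entropy) to W&B.
        run = wandb.init(project="cartpole-a2c", config=config, sync_tensorboard=True)
        callback = WandbCallback()
        tb_log = f"runs/{run.id}"

    model = A2C(config["policy"], vec_env, verbose=1, tensorboard_log=tb_log)
    model.learn(total_timesteps=config["total_timesteps"], callback=callback)
    model.save(save_path)  # SB3 appends ".zip" to the file name

    vec_env.close()
    if run is not None:
        run.finish()
    return model
```

Calling `train(use_wandb=True)` would launch the full 100 000-step run; the lazy imports keep the configuration inspectable without the RL stack installed.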

Training metrics (reward, policy loss, value loss, entropy) were logged to W&B throughout training.

🚀 Quick Start

Install

pip install stable-baselines3 huggingface-sb3 "gymnasium[classic-control]"

Run

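from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub
import gymnasium as gym

# Download the checkpoint from the Hugging Face Hub and load it
checkpoint = load_from_hub("ykhalfa/CartPole-kh", "a2c_cartpole.zip")
model = A2C.load(checkpoint)

# Watch it play a few episodes (render_mode="human" opens a window)
env = gym.make("CartPole-v1", render_mode="human")

for episode in range(5):
    obs, _ = env.reset()
    episode_return = 0.0
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_return += reward
        done = terminated or truncated
    print(f"Episode {episode + 1}: return = {episode_return}")

# Release the render window
env.close()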

📊 Evaluation Results

Metric	Value
Mean reward (100 episodes)	500.0 ± 0.0
Success rate	100%

The agent reaches the 500-step limit in all 100 evaluation episodes (self-reported result).
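The figures in the evaluation table could be reproduced along these lines. This is a minimal sketch, assuming Stable-Baselines3's `evaluate_policy` with deterministic actions over 100 episodes and treating a "success" as an episode that reaches the 500-step cap (the card does not define success). `summarize` and `evaluate` are hypothetical helper names; the ± value uses the population standard deviation, matching the `numpy.std` default that `evaluate_policy` uses for its own summary.

```python
import statistics
from typing import Sequence


def summarize(returns: Sequence[float], max_score: float = 500.0) -> dict:
    """Mean, population std and success rate of a list of episode returns."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population std, like np.std's default
    success_rate = sum(r >= max_score for r in returns) / len(returns)
    return {"mean": mean, "std": std, "success_rate": success_rate}


def evaluate(n_episodes: int = 100) -> dict:
    """Download the Hub checkpoint and evaluate it (needs SB3 + Gymnasium)."""
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    checkpoint = load_from_hub("ykhalfa/CartPole-kh", "a2c_cartpole.zip")
    model = A2C.load(checkpoint)
    env = Monitor(gym.make("CartPole-v1"))  # Monitor records true episode returns
    episode_returns, _lengths = evaluate_policy(
        model, env, n_eval_episodes=n_episodes,
        deterministic=True, return_episode_rewards=True,
    )
    env.close()
    return summarize(episode_returns)
```

A run where every episode returns 500 yields `{"mean": 500.0, "std": 0.0, "success_rate": 1.0}`, which corresponds to the "500.0 ± 0.0" and "100%" rows.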

👤 Author

Youssef Khalfa, MSO 3.4 Apprentissage Automatique (Machine Learning), École Centrale de Lyon
