🤖 CartPole-kh

An A2C agent that balances the pole for the full 500 steps in every evaluation episode.

Gymnasium · SB3 · W&B · License: MIT


🎬 Demo

The agent keeps the pole upright for the full 500 steps, the maximum possible score.


🎯 The Task

The CartPole-v1 environment challenges an agent to balance a pole on a moving cart. At each step, the agent pushes the cart left or right to prevent the pole from falling.

| Outcome | Condition | Value |
|---|---|---|
| ❌ Fail | Pole angle exceeds | ±12° |
| ❌ Fail | Cart out of bounds | ±2.4 m |
| ✅ Max score | Steps survived | 500 |
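
The two failure conditions in the table can be written as a small check. This is only a minimal sketch: `episode_failed` and the constant names are illustrative and not part of Gymnasium's API, and it assumes the pole angle is given in radians, as in the environment's observations.

```python
import math

# Thresholds taken from the table above; the names are illustrative.
MAX_POLE_ANGLE_RAD = math.radians(12.0)  # 12 degrees, roughly 0.209 rad
MAX_CART_POSITION = 2.4                  # limit on either side of the track center
MAX_EPISODE_STEPS = 500                  # the episode is cut off at this length


def episode_failed(cart_position: float, pole_angle_rad: float) -> bool:
    """Return True if either failure condition from the table is met."""
    return (
        abs(cart_position) > MAX_CART_POSITION
        or abs(pole_angle_rad) > MAX_POLE_ANGLE_RAD
    )


print(episode_failed(0.0, math.radians(5.0)))   # small tilt, cart centered
print(episode_failed(0.0, math.radians(13.0)))  # pole tilted past the limit
print(episode_failed(2.5, 0.0))                 # cart past the track limit
```

An episode that never triggers this check runs until the 500-step cutoff, which is what the maximum score corresponds to.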

Observation space (4 variables):

| # | Variable | Description |
|---|---|---|
| 0 | Cart position | Position along the track |
| 1 | Cart velocity | Speed of the cart |
| 2 | Pole angle | Tilt from vertical |
| 3 | Pole angular velocity | Rotation speed |

Action space: 0 = push left · 1 = push right
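
To make the index layout concrete, the sketch below unpacks an observation into named fields and maps the action codes to labels. `CartPoleObs`, `ACTION_NAMES`, `describe`, and the sample values are illustrative helpers made up for this example; none of them come from Gymnasium.

```python
from typing import NamedTuple, Sequence


class CartPoleObs(NamedTuple):
    """Named view of the 4-element observation, in index order 0 to 3."""
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_angular_velocity: float


# Action codes as listed above.
ACTION_NAMES = {0: "push left", 1: "push right"}


def describe(obs: Sequence[float], action: int) -> str:
    """Format one observation/action pair as a readable log line."""
    o = CartPoleObs(*obs)
    return (
        f"x={o.cart_position:+.2f}, v={o.cart_velocity:+.2f}, "
        f"angle={o.pole_angle:+.3f}, angular_v={o.pole_angular_velocity:+.2f} "
        f"-> {ACTION_NAMES[action]}"
    )


# Made-up sample values, for illustration only.
print(describe([0.01, -0.20, 0.035, 0.30], action=1))
```

The same helper should also accept the array returned by `env.reset()` or `env.step()`, since it only relies on the four values arriving in the order shown in the table.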


🏗️ Training Details

| Parameter | Value |
|---|---|
| Algorithm | A2C (Advantage Actor-Critic) |
| Policy | MlpPolicy (actor-critic MLP) |
| Total timesteps | 100,000 |
| Parallel environments | 4 (vectorized) |
| Framework | Stable-Baselines3 + Gymnasium |
| Experiment tracking | Weights & Biases |

Training metrics (reward, policy loss, value loss, and entropy) were logged to W&B throughout training.
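
A training run with these settings might look roughly like the sketch below. It is not the author's script: only the algorithm, policy, timestep count, and number of environments come from the table. All other hyperparameters are left at the Stable-Baselines3 defaults, which is an assumption, and W&B logging is omitted. The `train` function and its `save_path` argument are names chosen for this example.

```python
# Settings copied from the table above.
ENV_ID = "CartPole-v1"
N_ENVS = 4
TOTAL_TIMESTEPS = 100_000


def train(total_timesteps: int = TOTAL_TIMESTEPS,
          n_envs: int = N_ENVS,
          save_path: str = "a2c_cartpole"):
    """Train A2C on vectorized CartPole environments and save the model."""
    # Imported here so the settings can be inspected without SB3 installed.
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env

    # Build n_envs copies of the environment that step in lockstep.
    vec_env = make_vec_env(ENV_ID, n_envs=n_envs)

    # Default hyperparameters: an assumption, the card does not list them.
    model = A2C("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    model.save(save_path)  # SB3 appends ".zip" to the path
    vec_env.close()
    return model
```

Calling `train()` runs the full 100,000 timesteps; passing a smaller `total_timesteps` gives a quick smoke test. Results from a rerun are not guaranteed to match the reported scores, since the original seed and hyperparameters are not stated.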


🚀 Quick Start

Install

pip install stable-baselines3 huggingface-sb3 "gymnasium[classic-control]"

Run

from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub
import gymnasium as gym

# Load from Hub
checkpoint = load_from_hub("ykhalfa/CartPole-kh", "a2c_cartpole.zip")
model = A2C.load(checkpoint)

# Watch it play
env = gym.make("CartPole-v1", render_mode="human")
obs, _ = env.reset()

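try:
    # Play episodes until interrupted with Ctrl+C
    while True:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(int(action))
        if terminated or truncated:
            obs, _ = env.reset()
except KeyboardInterrupt:
    pass
finally:
    env.close()  # release the render window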

📊 Evaluation Results

| Metric | Value |
|---|---|
| Mean reward (100 episodes) | 500.0 ± 0.0 |
| Success rate | 100% |

The agent achieves the maximum possible score consistently across all 100 evaluation episodes.
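
One way such numbers could be computed is sketched below. The card does not say how "success rate" was defined, so this assumes an episode counts as a success when it reaches the 500-step maximum. `summarize_returns` and `evaluate` are illustrative names; `evaluate_policy` is the Stable-Baselines3 helper, used here with `return_episode_rewards=True` to get per-episode returns.

```python
from statistics import mean, pstdev
from typing import Sequence

MAX_SCORE = 500  # maximum return per episode in CartPole-v1


def summarize_returns(returns: Sequence[float],
                      max_score: float = MAX_SCORE) -> dict:
    """Compute mean, standard deviation, and success rate of episode returns."""
    # Assumption: "success" means the episode reached the maximum score.
    successes = sum(1 for r in returns if r >= max_score)
    return {
        "mean": mean(returns),
        "std": pstdev(returns),  # population std over the episodes
        "success_rate": successes / len(returns),
    }


def evaluate(model, n_episodes: int = 100) -> dict:
    """Run the model on CartPole-v1 and summarize the episode returns."""
    # Imported here so summarize_returns works without SB3 installed.
    import gymnasium as gym
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("CartPole-v1")
    returns, _lengths = evaluate_policy(
        model, env, n_eval_episodes=n_episodes,
        deterministic=True, return_episode_rewards=True,
    )
    env.close()
    return summarize_returns(returns)


# 100 perfect episodes, matching the reported table.
print(summarize_returns([500.0] * 100))
```

With a model loaded as in the Quick Start, `evaluate(model)` would return the same three statistics for a fresh set of 100 episodes.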


👤 Author

Youssef Khalfa, MSO 3.4: Apprentissage Automatique (Machine Learning), École Centrale de Lyon
