🎬 Demo
The agent keeps the pole upright for the full 500 steps, the maximum possible score.
🎯 The Task
The CartPole-v1 environment challenges an agent to balance a pole on a moving cart. At each step, the agent pushes the cart left or right to prevent the pole from falling.
| Outcome | Condition | Threshold |
|---|---|---|
| ❌ Fail | Pole angle exceeds | ±12° |
| ❌ Fail | Cart out of bounds | ±2.4 m |
| ✅ Max score | Steps survived | 500 |
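
The conditions in the table can be written out as a small check. This is a minimal sketch, not the environment's own code: the function name `episode_status` and the returned labels are hypothetical, and it assumes the pole angle is given in radians, which is how Gymnasium reports it.

```python
import math

# Thresholds taken from the table above.
ANGLE_LIMIT_RAD = math.radians(12)  # ±12° expressed in radians
POSITION_LIMIT = 2.4                # ±2.4 m from the centre of the track
MAX_STEPS = 500                     # episode cap, i.e. the maximum score


def episode_status(cart_position: float, pole_angle_rad: float, steps: int) -> str:
    """Classify a state using the task's failure and success conditions.

    Hypothetical helper for illustration only.
    """
    if abs(pole_angle_rad) > ANGLE_LIMIT_RAD:
        return "fail: pole angle"
    if abs(cart_position) > POSITION_LIMIT:
        return "fail: cart out of bounds"
    if steps >= MAX_STEPS:
        return "max score"
    return "running"
```

For example, `episode_status(0.0, math.radians(13), 10)` reports a pole-angle failure, while `episode_status(0.0, 0.0, 500)` reports the maximum score.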
Observation space (4 variables):
| # | Variable | Description |
|---|---|---|
| 0 | Cart position | Position along the track |
| 1 | Cart velocity | Speed of the cart |
| 2 | Pole angle | Tilt from vertical |
| 3 | Pole angular velocity | Rotation speed |
Action space: 0 = push left · 1 = push right
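
To make the index layout concrete, the sketch below gives each observation entry a name and defines the two actions as constants. `CartPoleObs`, `decode`, `PUSH_LEFT`, and `PUSH_RIGHT` are hypothetical names introduced for illustration; the environment itself returns a plain 4-element array and accepts the integers 0 and 1.

```python
from typing import NamedTuple, Sequence

# Action space: two discrete actions.
PUSH_LEFT = 0
PUSH_RIGHT = 1


class CartPoleObs(NamedTuple):
    """Named view of the 4-element observation, in index order 0..3."""
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_angular_velocity: float


def decode(obs: Sequence[float]) -> CartPoleObs:
    """Convert a raw observation (list or array of 4 numbers) to a named tuple."""
    return CartPoleObs(*(float(v) for v in obs))
```

With this helper, `decode(obs).pole_angle` reads the same value as `obs[2]`.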
🏋️ Training Details
| Parameter | Value |
|---|---|
| Algorithm | A2C (Advantage Actor-Critic) |
| Policy | MlpPolicy (actor-critic MLP) |
| Total timesteps | 100,000 |
| Parallel environments | 4 (vectorized) |
| Framework | Stable-Baselines3 + Gymnasium |
| Experiment tracking | Weights & Biases |
Training metrics (reward, policy loss, value loss, entropy) were tracked with W&B.
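
A training run with these settings might look roughly like the sketch below. It is not the author's training script. It assumes Stable-Baselines3's default A2C hyperparameters, since none are listed here, and it leaves out the W&B logging. `TRAIN_CONFIG`, `train`, and the `save_path` default are hypothetical names.

```python
# Settings taken from the table above; everything else is left at SB3 defaults
# (an assumption, since the hyperparameters are not listed in this card).
TRAIN_CONFIG = {
    "env_id": "CartPole-v1",
    "policy": "MlpPolicy",
    "n_envs": 4,
    "total_timesteps": 100_000,
}


def train(config=TRAIN_CONFIG, save_path="a2c_cartpole"):
    """Train A2C on vectorized CartPole-v1 and save the model as <save_path>.zip."""
    # Imported here so the config can be inspected without SB3 installed.
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env

    # Four copies of the environment stepped together.
    env = make_vec_env(config["env_id"], n_envs=config["n_envs"])
    model = A2C(config["policy"], env, verbose=1)
    model.learn(total_timesteps=config["total_timesteps"])
    model.save(save_path)
    env.close()
    return model
```

Calling `train()` runs the full 100,000 timesteps. Results can vary between runs because no random seed is fixed in this sketch.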
🚀 Quick Start
Install

    pip install stable-baselines3 huggingface-sb3 "gymnasium[classic-control]"

Run

    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C

    # Download the checkpoint from the Hub and load it
    checkpoint = load_from_hub("ykhalfa/CartPole-kh", "a2c_cartpole.zip")
    model = A2C.load(checkpoint)

    # Watch it play (press Ctrl+C to stop)
    env = gym.make("CartPole-v1", render_mode="human")
    obs, _ = env.reset()
    try:
        while True:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            # Start a new episode when the pole falls or the step cap is hit
            if terminated or truncated:
                obs, _ = env.reset()
    except KeyboardInterrupt:
        pass
    finally:
        env.close()  # close the render window cleanly
📊 Evaluation Results
| Metric | Value |
|---|---|
| Mean reward (100 episodes) | 500.0 ± 0.0 |
| Success rate | 100% |
The agent achieves the maximum possible score consistently across all 100 evaluation episodes.
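
Figures like these could be reproduced along the lines of the sketch below. It rests on two assumptions that are not stated in this card: that "success" means an episode reaching the 500-step cap, and that evaluation used deterministic actions. `summarize` and `evaluate_from_hub` are hypothetical helper names; `evaluate_policy` and `Monitor` are Stable-Baselines3 utilities.

```python
from statistics import fmean, pstdev

MAX_SCORE = 500  # episode cap in CartPole-v1


def summarize(episode_rewards, max_score=MAX_SCORE):
    """Return (mean reward, population std, success rate) for a list of episode rewards.

    Success is assumed to mean reaching max_score.
    """
    mean = fmean(episode_rewards)
    std = pstdev(episode_rewards)
    success_rate = sum(r >= max_score for r in episode_rewards) / len(episode_rewards)
    return mean, std, success_rate


def evaluate_from_hub(n_episodes=100):
    """Download the checkpoint and evaluate it for n_episodes episodes."""
    # Imported here so summarize() works without the RL libraries installed.
    import gymnasium as gym
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    model = A2C.load(load_from_hub("ykhalfa/CartPole-kh", "a2c_cartpole.zip"))
    env = Monitor(gym.make("CartPole-v1"))  # Monitor records true episode rewards
    rewards, _lengths = evaluate_policy(
        model,
        env,
        n_eval_episodes=n_episodes,
        deterministic=True,
        return_episode_rewards=True,
    )
    env.close()
    return summarize(rewards)
```

`summarize` can also be applied to any list of per-episode rewards collected another way, for example from a manual rollout loop.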
👤 Author
Youssef Khalfa, MSO 3.4: Apprentissage Automatique (Machine Learning), École Centrale de Lyon