🚀 PPO Agent — LunarLander-v3

Trained with Stable-Baselines3 PPO on LunarLander-v3. Hyperparameters optimized via Optuna TPE — 20 trials. Part of a 12-project Deep RL portfolio.

📊 Performance

| Version     | Score  | Notes                     |
|-------------|--------|---------------------------|
| Baseline    | 210.15 | Default SB3 params        |
| Optuna v1   | 259.15 | 20-trial HPO sweep        |
| Improvement | +49.00 | clip_range ↓, more stable |

Target: ≥ 200 — cleared by 59.15 points.
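As a sanity check on scores like these, a small helper (illustrative, not part of the repository) can summarize per-episode evaluation returns against the ≥ 200 solve threshold:

```python
from statistics import mean, stdev

def summarize_returns(returns: list[float], target: float = 200.0) -> dict:
    """Summarize evaluation episode returns against a solve threshold."""
    avg = mean(returns)
    return {
        "mean": round(avg, 2),
        "std": round(stdev(returns), 2) if len(returns) > 1 else 0.0,
        "margin": round(avg - target, 2),  # points above the target
        "solved": avg >= target,
    }

# Example with illustrative (not actual) evaluation returns
print(summarize_returns([251.0, 263.5, 262.9]))
```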

⚡ Reproduce

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
import gymnasium as gym

# Load the model checkpoint from the Hugging Face Hub
checkpoint = load_from_hub(
    repo_id="muhrivandysetiawan/ppo-LunarLander-v3",
    filename="ppo-LunarLander-v3.zip",
)
model = PPO.load(checkpoint)

# Run one episode with the greedy (deterministic) policy
env = gym.make("LunarLander-v3", render_mode="human")
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```

πŸ—οΈ Architecture

OOP class-based pipeline — 6 classes, 3 entry points. Zero hardcoded hyperparameters; config-driven throughout.

- LunarLanderEnv → env wrapping + seed control
- PPOAgent → model lifecycle
- Trainer → training loop
- HyperparamSweeper → Optuna HPO
- Evaluator → separate eval pipeline
- WandBLogger → centralized tracking
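A minimal sketch of what "config-driven, zero hardcoded hyperparameters" might look like. The `PPOConfig` class and `from_dict` helper here are hypothetical, not the repository's actual API; the defaults mirror the baseline values in the training table below.

```python
from dataclasses import dataclass, asdict

# Hypothetical config object; field names/values mirror the card's baseline setup.
@dataclass(frozen=True)
class PPOConfig:
    learning_rate: float = 3e-4
    n_steps: int = 1024
    clip_range: float = 0.20
    batch_size: int = 64
    n_epochs: int = 4
    total_timesteps: int = 1_000_000
    seed: int = 42
    n_envs: int = 16

    @classmethod
    def from_dict(cls, overrides: dict) -> "PPOConfig":
        # Start from the baseline defaults, apply sweep-suggested overrides
        return cls(**{**asdict(cls()), **overrides})

baseline = PPOConfig()
tuned = PPOConfig.from_dict({"clip_range": 0.161})
```

Every class in the pipeline would take such a config object rather than reading literals, so a sweep only has to construct a new config per trial.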

🔬 Training Details

| Parameter       | Baseline | Optuna Best |
|-----------------|----------|-------------|
| learning_rate   | 3e-4     | optimized   |
| n_steps         | 1024     | optimized   |
| clip_range      | 0.20     | 0.161       |
| batch_size      | 64       | optimized   |
| n_epochs        | 4        | optimized   |
| total_timesteps | 1M       | 1M          |
| seed            | 42       | 42          |
| n_envs          | 16       | 16          |

📊 Experiment Tracking

W&B Dashboard → https://wandb.ai/muhrivandysetiawan-muh-rivandy-setiawan/deep-rl-portfolio

22 tracked runs — baseline, 20 Optuna trials, final.

πŸ—‚οΈ Portfolio

Project 1/12 — HuggingFace Deep RL Curriculum. Author: muhrivandysetiawan
