---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---
# Reinforce Agent on Pixelcopter-PLE-v0

This repository contains a trained Reinforce (Policy Gradient) agent that successfully plays the Pixelcopter-PLE-v0 environment.
## Model Card

- **Model Name:** Reinforce-Pixelcopter-PLE-v0
- **Environment:** Pixelcopter-PLE-v0
- **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
- **Performance:**
  - Mean reward of 12.03 across evaluation episodes
  - Maintains stable flight and obstacle avoidance across evaluation runs
## Usage

```python
import pickle
import gym
from huggingface_hub import hf_hub_download

# Download the trained Reinforce model from the Hub
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment the model was trained on
# (Pixelcopter-PLE-v0 requires the PLE-based gym environments to be installed)
env = gym.make(model["env_id"])
```
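Once the model and environment are loaded, the policy can be evaluated with a plain rollout loop. The sketch below is illustrative, not part of this repository: it assumes the older Gym step API (a 4-tuple return) and a policy object exposing an `act(state)` method, as in the Deep RL course implementations.

```python
def evaluate(env, policy, n_episodes=10, max_steps=1000):
    """Roll out the policy and return the mean episodic reward."""
    totals = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = policy.act(state)  # assumed interface: returns an action
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        totals.append(total)
    return sum(totals) / len(totals)
```

The mean reward reported in the metadata was obtained from an evaluation loop of this general shape.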
## Notes

- The agent is trained using the Reinforce algorithm, which updates policy parameters based on episodic returns.
- The environment is Pixelcopter-PLE-v0, a pixel-based game where the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.
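The episodic returns that drive the Reinforce update are the discounted sums of future rewards, G_t = r_t + γ·G_{t+1}. A minimal sketch of that computation (the helper name is hypothetical, not part of this repository):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t for every timestep of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# Three steps with reward 1 each and gamma = 0.5:
# G_2 = 1, G_1 = 1 + 0.5*1 = 1.5, G_0 = 1 + 0.5*1.5 = 1.75
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The policy-gradient loss then weights each step's log-probability by its return, so actions in high-return episodes become more likely.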
## Repository Structure

- `reinforce.pkl`: Trained policy weights
- `README.md`: Documentation and usage guide
## Results

- The agent learns to maintain altitude and avoid collisions with obstacles.
- Training converges to a stable policy using policy-gradient updates.
## Environment Overview

- **Observation Space:** Pixel-based state representation (visual input)
- **Action Space:** Discrete (flap or no flap)
- **Objective:** Keep the helicopter flying while avoiding obstacles
- **Reward:** Positive reward for survival, penalties for collisions
## Learning Highlights

- **Algorithm:** Reinforce (Policy Gradient)
- **Update Rule:** Policy parameters are updated using Monte Carlo returns from sampled episodes
- **Strengths:** Well suited to environments with discrete actions and episodic rewards
- **Limitations:** High variance in gradient estimates, mitigated by training over many episodes
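A common way to tame the high variance noted above is to standardize the episode's returns before computing the policy-gradient loss, so updates are driven by whether a step did better or worse than average. A hedged sketch of this variance-reduction step (the helper name is hypothetical; this repository's training code may use a different scheme):

```python
from statistics import mean, pstdev

def normalize_returns(returns, eps=1e-8):
    """Standardize returns to zero mean and unit variance.

    Subtracting the mean acts as a simple baseline; dividing by the
    standard deviation keeps gradient magnitudes comparable across
    episodes. eps guards against division by zero.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(g - mu) / (sigma + eps) for g in returns]

normalized = normalize_returns([1.0, 2.0, 3.0])
# resulting values have mean ~0 and standard deviation ~1
```

Because the baseline does not depend on the actions taken, subtracting it leaves the policy-gradient estimate unbiased while reducing its variance.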