---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---
# Reinforce Agent on Pixelcopter-PLE-v0

This repository contains a trained Reinforce (Policy Gradient) agent that successfully plays the Pixelcopter-PLE-v0 environment.
## Model Card

- **Model Name:** Reinforce-Pixelcopter-PLE-v0
- **Environment:** Pixelcopter-PLE-v0
- **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
- **Performance:**
  - Mean reward of 12.03 across evaluation episodes
  - Maintains stable flight and obstacle avoidance across evaluation runs
## Usage

```python
import pickle
import gym
from huggingface_hub import hf_hub_download

# Download the trained Reinforce model from the Hub
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment the model was trained on
# (Pixelcopter-PLE-v0 requires the PLE-based gym environments to be installed)
env = gym.make(model["env_id"])
```
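Once the model and environment are loaded, the policy can be evaluated with a plain rollout loop. The sketch below is illustrative, not part of this repository: it assumes the older Gym step API (a 4-tuple return) and a policy object exposing an `act(state)` method, as in the Deep RL course implementations.

```python
def evaluate(env, policy, n_episodes=10, max_steps=1000):
    """Roll out the policy and return the mean episodic reward."""
    totals = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = policy.act(state)  # assumed interface: returns an action
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        totals.append(total)
    return sum(totals) / len(totals)
```

The mean reward reported in the metadata was obtained from an evaluation loop of this general shape.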
## Notes

- The agent is trained using the Reinforce algorithm, which updates policy parameters based on episodic returns.
- The environment is Pixelcopter-PLE-v0, a pixel-based game where the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.
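The episodic returns that drive the Reinforce update are the discounted sums of future rewards, G_t = r_t + γ·G_{t+1}. A minimal sketch of that computation (the helper name is hypothetical, not part of this repository):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t for every timestep of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# Three steps with reward 1 each and gamma = 0.5:
# G_2 = 1, G_1 = 1 + 0.5*1 = 1.5, G_0 = 1 + 0.5*1.5 = 1.75
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The policy-gradient loss then weights each step's log-probability by its return, so actions in high-return episodes become more likely.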
## Repository Structure

- `reinforce.pkl`: Trained policy weights
- `README.md`: Documentation and usage guide
## Results

- The agent learns to maintain altitude and avoid collisions with obstacles.
- Training converges to a stable policy using policy-gradient updates.
## Environment Overview

- **Observation Space:** Pixel-based state representation (visual input)
- **Action Space:** Discrete (flap or no flap)
- **Objective:** Keep the helicopter flying while avoiding obstacles
- **Reward:** Positive reward for survival, penalties for collisions
## Learning Highlights

- **Algorithm:** Reinforce (Policy Gradient)
- **Update Rule:** Policy parameters are updated using Monte Carlo returns from sampled episodes
- **Strengths:** Well suited to environments with discrete actions and episodic rewards
- **Limitations:** High variance in gradient estimates, mitigated by training over many episodes
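A common way to tame the high variance noted above is to standardize the episode's returns before computing the policy-gradient loss, so updates are driven by whether a step did better or worse than average. A hedged sketch of this variance-reduction step (the helper name is hypothetical; this repository's training code may use a different scheme):

```python
from statistics import mean, pstdev

def normalize_returns(returns, eps=1e-8):
    """Standardize returns to zero mean and unit variance.

    Subtracting the mean acts as a simple baseline; dividing by the
    standard deviation keeps gradient magnitudes comparable across
    episodes. eps guards against division by zero.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(g - mu) / (sigma + eps) for g in returns]

normalized = normalize_returns([1.0, 2.0, 3.0])
# resulting values have mean ~0 and standard deviation ~1
```

Because the baseline does not depend on the actions taken, subtracting it leaves the policy-gradient estimate unbiased while reducing its variance.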