# 🕹️ PPO Agent for ViZDoom Doom

This repository contains a trained Proximal Policy Optimization (PPO) agent for the ViZDoom Doom environment, built with the ViZDoom framework (vizdoom.cs.put.edu.pl) and integrated with reinforcement learning libraries.
## 📋 Model Card
- **Model Name:** ppo-ViZDoom-DoomAgent
- **Environment:** ViZDoom Doom
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Performance:**
  - Learns to shoot moving enemies in the Doom scenario
  - Demonstrates convergence to a stable combat policy
## 🚀 Usage
```python
from huggingface_hub import hf_hub_download
import gym

# Download the trained PPO policy from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="KraTUZen/Vizdoom-Doom-Agent",
    filename="model.pkl",
)

# Initialize the environment.
# Note: the "Vizdoom-v0" id requires the ViZDoom environments to be
# registered with Gym (e.g. via a ViZDoom Gym wrapper package).
env = gym.make("Vizdoom-v0")
```
## 🧠 Notes
- The agent is trained with PPO, an on-policy algorithm that is effective in visual, high-dimensional tasks.
- The environment is ViZDoom Doom, where the agent must shoot enemies while navigating an arena.
- The serialized policy is stored in `model.pkl`.
## 📁 Repository Structure

- `model.pkl` - Trained policy weights
- `README.md` - Documentation and usage guide
- `configs/` - Training configuration files
- `scripts/` - Training and evaluation scripts
## ✅ Results
- The agent learns to aim and shoot effectively.
- Shows stable convergence using PPO, balancing exploration and exploitation.
## 🌍 Environment Overview
- **Observation Space:** Visual pixel-based input (first-person view)
- **Action Space:** Discrete (move, turn, shoot)
- **Objective:** Eliminate enemies efficiently
- **Reward:** Positive reward for hitting enemies, penalties for missed shots or damage taken
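The reward structure above can be sketched as a simple shaping function. This is an illustrative example only; the function name and the specific reward magnitudes are assumptions, not values taken from this repository's scenario configuration.

```python
# Illustrative sketch of the reward structure described above.
# The magnitudes (+1.0, -0.1, -0.05) are assumed, not the scenario's actual values.

def step_reward(hit_enemy: bool, missed_shot: bool, damage_taken: float) -> float:
    """Compute a shaped per-step reward for the shooting scenario."""
    reward = 0.0
    if hit_enemy:
        reward += 1.0  # positive reward for hitting an enemy
    if missed_shot:
        reward -= 0.1  # small penalty for a wasted shot
    reward -= 0.05 * damage_taken  # penalty scaled by damage received
    return reward
```

Keeping the miss and damage penalties small relative to the hit reward encourages accurate shooting without discouraging the agent from engaging enemies at all.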
## 📈 Learning Highlights
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** Clipped surrogate objective ensures stable policy updates
- **Strengths:** Handles pixel-based visual input and sparse rewards
- **Limitations:** Training requires significant compute due to visual complexity
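The clipped surrogate objective mentioned above can be sketched in a few lines of NumPy. This is a minimal, framework-agnostic illustration; the function name and the epsilon value of 0.2 are assumptions, not hyperparameters read from this repository's configs.

```python
import numpy as np

# Minimal sketch of PPO's clipped surrogate objective (to be maximized).
# epsilon=0.2 is an assumed clip range, not this repo's actual setting.

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated A(s, a)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the elementwise minimum removes any incentive to push the
    # probability ratio outside the [1 - eps, 1 + eps] trust region,
    # which is what keeps PPO's policy updates stable.
    return np.minimum(unclipped, clipped).mean()
```

For example, a sample whose ratio has drifted to 1.5 with a positive advantage contributes only the clipped value 1.2 x A to the objective, so the gradient stops rewarding further drift.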
## Evaluation Results

- mean_reward on `doom_health_gathering_supreme` (self-reported): 5.80 +/- 0.46