πŸ•ΉοΈ PPO Agent on ViZDoom Doom Agent

This repository contains a trained Proximal Policy Optimization (PPO) agent that plays a ViZDoom Doom environment, built with the ViZDoom framework (vizdoom.cs.put.edu.pl) and integrated with reinforcement learning libraries.


📊 Model Card

Model Name: ppo-ViZDoom-DoomAgent
Environment: ViZDoom Doom
Algorithm: PPO (Proximal Policy Optimization)
Performance Metric:

  • Learns to shoot moving enemies in the Doom scenario
  • Demonstrates convergence to a stable combat policy

🚀 Usage

import pickle

from huggingface_hub import hf_hub_download
import gym

# Download the trained PPO policy from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="KraTUZen/Vizdoom-Doom-Agent",
    filename="model.pkl"
)

# Deserialize the pickled policy
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment; the exact id depends on the installed
# ViZDoom gym wrapper, which registers the "Vizdoom*-v0" family of ids
env = gym.make("Vizdoom-v0")
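
With the environment and policy in hand, a rollout is the usual reset/step loop. The sketch below uses a hypothetical stand-in environment and policy (the real ones come from `gym.make` and `model.pkl`), so the reward shaping and shapes here are illustrative assumptions only:

```python
import numpy as np

class StubEnv:
    """Hypothetical stand-in for a gym-style ViZDoom env (replace with gym.make(...))."""
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
    def reset(self):
        self.t = 0
        return np.zeros((120, 160, 3), dtype=np.uint8)  # blank first-person frame
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 2 else -0.1  # toy shaping: + for shooting, - otherwise
        done = self.t >= self.horizon
        return np.zeros((120, 160, 3), dtype=np.uint8), reward, done, {}

def run_episode(env, policy_fn):
    """Roll out one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy_fn(obs))
        total += reward
    return total

always_shoot = lambda obs: 2  # stand-in for the trained policy's action choice
print(run_episode(StubEnv(), always_shoot))  # 3.0
```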

🧠 Notes

  • The agent is trained using PPO, an on‑policy algorithm effective in visual, high‑dimensional tasks.
  • The environment is ViZDoom Doom, where the agent must shoot enemies while navigating an arena.
  • The serialized policy is stored in model.pkl.

📂 Repository Structure

  • model.pkl → Trained policy weights
  • README.md → Documentation and usage guide
  • configs/ → Training configuration files
  • scripts/ → Training and evaluation scripts

✅ Results

  • The agent learns to aim and shoot effectively.
  • Shows stable convergence using PPO, balancing exploration and exploitation.

🔎 Environment Overview

  • Observation Space: Visual pixel‑based input (first‑person view)
  • Action Space: Discrete (move, turn, shoot)
  • Objective: Eliminate enemies efficiently
  • Reward: Positive reward for hitting enemies, penalties for missed shots or damage
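
Under the hood, ViZDoom consumes an action as a per-button on/off list rather than a single integer, so a discrete action index is typically one-hot encoded. A minimal sketch, assuming a hypothetical button order of [MOVE_LEFT, MOVE_RIGHT, ATTACK] (the real order is defined by the scenario config):

```python
# Assumed button order for illustration; the scenario config defines the real one.
BUTTONS = ["MOVE_LEFT", "MOVE_RIGHT", "ATTACK"]

def index_to_button_vector(action_index, n_buttons=len(BUTTONS)):
    """One-hot encode a discrete action index as a ViZDoom-style button list."""
    vec = [0] * n_buttons
    vec[action_index] = 1
    return vec

print(index_to_button_vector(2))  # [0, 0, 1] -> ATTACK
```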

📚 Learning Highlights

  • Algorithm: PPO (Proximal Policy Optimization)
  • Update Rule: Clipped surrogate objective ensures stable updates
  • Strengths: Handles pixel‑based visual input and sparse rewards
  • Limitations: Training requires significant compute due to visual complexity
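
The clipped surrogate objective mentioned above can be written out numerically. A short NumPy sketch, using eps = 0.2 as a typical (assumed) clip range rather than the value actually used in training:

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped surrogate objective: E[min(r*A, clip(r, 1-eps, 1+eps)*A)]."""
    ratio = np.exp(new_logp - old_logp)             # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()    # maximized during training

# Example: a ratio far above 1+eps is clipped, bounding the policy update.
adv = np.array([1.0, -1.0])
new_lp = np.array([0.0, 0.0])
old_lp = np.array([-1.0, 0.0])  # ratios: [e, 1]
print(clipped_surrogate(new_lp, old_lp, adv))
```

Because the first ratio (e ≈ 2.72) exceeds 1 + eps, its term is capped at 1.2 × A, which is exactly what keeps PPO updates stable.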

Evaluation results

  • mean_reward on doom_health_gathering_supreme: 5.80 +/- 0.46 (self-reported)