# 🕹️ PPO Agent for ViZDoom Doom

This repository contains a trained Proximal Policy Optimization (PPO) agent for the ViZDoom Doom environment, built with the ViZDoom framework (vizdoom.cs.put.edu.pl) and integrated with reinforcement learning libraries.
## 📋 Model Card
- **Model Name:** ppo-ViZDoom-DoomAgent
- **Environment:** ViZDoom Doom
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Performance:**
  - Learns to shoot moving enemies in the Doom scenario
  - Demonstrates convergence to a stable combat policy
## 🚀 Usage
```python
from huggingface_hub import hf_hub_download
import gym

# Download the trained PPO policy from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="KraTUZen/Vizdoom-Doom-Agent",
    filename="model.pkl",
)

# Initialize the environment.
# Note: the "Vizdoom-v0" id requires the ViZDoom environments to be
# registered with Gym (e.g. via a ViZDoom Gym wrapper package).
env = gym.make("Vizdoom-v0")
```
## 🧠 Notes
- The agent is trained with PPO, an on-policy algorithm that is effective in visual, high-dimensional tasks.
- The environment is ViZDoom Doom, where the agent must shoot enemies while navigating an arena.
- The serialized policy is stored in `model.pkl`.
## 📁 Repository Structure

- `model.pkl` - Trained policy weights
- `README.md` - Documentation and usage guide
- `configs/` - Training configuration files
- `scripts/` - Training and evaluation scripts
## ✅ Results
- The agent learns to aim and shoot effectively.
- Shows stable convergence using PPO, balancing exploration and exploitation.
## 🌍 Environment Overview
- **Observation Space:** Visual pixel-based input (first-person view)
- **Action Space:** Discrete (move, turn, shoot)
- **Objective:** Eliminate enemies efficiently
- **Reward:** Positive reward for hitting enemies, penalties for missed shots or damage taken
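The reward structure above can be sketched as a simple shaping function. This is an illustrative example only; the function name and the specific reward magnitudes are assumptions, not values taken from this repository's scenario configuration.

```python
# Illustrative sketch of the reward structure described above.
# The magnitudes (+1.0, -0.1, -0.05) are assumed, not the scenario's actual values.

def step_reward(hit_enemy: bool, missed_shot: bool, damage_taken: float) -> float:
    """Compute a shaped per-step reward for the shooting scenario."""
    reward = 0.0
    if hit_enemy:
        reward += 1.0  # positive reward for hitting an enemy
    if missed_shot:
        reward -= 0.1  # small penalty for a wasted shot
    reward -= 0.05 * damage_taken  # penalty scaled by damage received
    return reward
```

Keeping the miss and damage penalties small relative to the hit reward encourages accurate shooting without discouraging the agent from engaging enemies at all.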
## 📈 Learning Highlights
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** Clipped surrogate objective ensures stable policy updates
- **Strengths:** Handles pixel-based visual input and sparse rewards
- **Limitations:** Training requires significant compute due to visual complexity
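The clipped surrogate objective mentioned above can be sketched in a few lines of NumPy. This is a minimal, framework-agnostic illustration; the function name and the epsilon value of 0.2 are assumptions, not hyperparameters read from this repository's configs.

```python
import numpy as np

# Minimal sketch of PPO's clipped surrogate objective (to be maximized).
# epsilon=0.2 is an assumed clip range, not this repo's actual setting.

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated A(s, a)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the elementwise minimum removes any incentive to push the
    # probability ratio outside the [1 - eps, 1 + eps] trust region,
    # which is what keeps PPO's policy updates stable.
    return np.minimum(unclipped, clipped).mean()
```

For example, a sample whose ratio has drifted to 1.5 with a positive advantage contributes only the clipped value 1.2 x A to the objective, so the gradient stops rewarding further drift.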
## Evaluation Results

- mean_reward on `doom_health_gathering_supreme` (self-reported): 5.80 +/- 0.46