Double Deep Q-Network (Double DQN) — LunarLander-v3

This repository contains a trained Double Deep Q-Network (Double DQN) agent for the Gymnasium environment LunarLander-v3.


Environment

  • Environment: LunarLander-v3
  • State Space: 8-dimensional continuous vector
  • Action Space: 4 discrete thruster actions
  • Goal: Land spacecraft safely between flags

LunarLander is significantly more complex than CartPole:

  • Higher dimensional state
  • Complex reward shaping
  • High variance returns

Why Standard DQN Is Not Enough

Standard DQN uses:

y = r + γ max_a' Q(s',a'; θ⁻)

The max operator introduces overestimation bias:

E[max(X)] ≥ max(E[X])

This bias becomes significant in complex environments.
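The inequality above can be seen numerically. A minimal sketch (illustrative only, not from this repository's code): suppose all four actions have a true value of 0, but our Q-estimates carry zero-mean noise. Taking the max over the noisy estimates yields a systematically positive value even though the true maximum is 0.

```python
import numpy as np

rng = np.random.default_rng(0)

true_q = np.zeros(4)                                # every action is truly worth 0
noise = rng.normal(0.0, 1.0, size=(10_000, 4))      # zero-mean estimation error
noisy_q = true_q + noise

# E[max_a Q_hat(s, a)] over 10k samples: clearly > 0, even though max_a E[Q_hat] = 0
naive_max = noisy_q.max(axis=1).mean()
print(f"E[max of noisy estimates] ≈ {naive_max:.3f}  (true max is 0)")
```

With four actions and unit-variance noise the naive max lands around 1.0, a pure artifact of the max operator acting on noise.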


Double DQN Solution

Double DQN separates:

  1. Action selection (policy network)
  2. Action evaluation (target network)

Target becomes:

y = r + γ Q(s', argmax_a Q(s',a; θ), θ⁻)

This reduces overestimation and improves stability.
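The decoupled target can be written as a small function. This is a sketch of the update rule, not the repository's actual training code; the argument names are illustrative:

```python
import numpy as np

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN target: the online (policy) network selects the action,
    the target network evaluates it."""
    a_star = int(np.argmax(q_next_online))   # action selection: argmax_a Q(s', a; θ)
    bootstrap = q_next_target[a_star]        # action evaluation: Q(s', a*; θ⁻)
    return r + (0.0 if done else gamma * bootstrap)

# Example: online net prefers action 1, target net scores that action 0.2
y = double_dqn_target(
    r=1.0,
    q_next_online=np.array([1.0, 3.0, 2.0, 0.0]),
    q_next_target=np.array([0.5, 0.2, 0.9, 0.1]),
)
print(y)  # 1.0 + 0.99 * 0.2 = 1.198
```

Note the contrast with standard DQN, which would bootstrap from max(q_next_target) = 0.5 here; decoupling selection from evaluation is exactly what removes the max-over-noise bias.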


Training Details

  • Learning rate: 1e-4
  • Discount factor: 0.99
  • Batch size: 64
  • Replay buffer: 50,000
  • Target update frequency: 500 steps
  • Episodes: 2000

A lower learning rate improves training stability in complex environments.
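The replay buffer from the table above can be sketched with a few lines of standard-library Python. This is a minimal uniform-sampling buffer consistent with the listed hyperparameters, not necessarily the exact implementation used to train this model:

```python
import random
from collections import deque

BUFFER_SIZE = 50_000   # replay buffer capacity from the training details
BATCH_SIZE = 64        # batch size from the training details

class ReplayBuffer:
    """Minimal uniform experience-replay buffer (sketch)."""

    def __init__(self, capacity=BUFFER_SIZE):
        # deque with maxlen silently evicts the oldest transition when full
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        # uniform random minibatch, sampled without replacement
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

Transitions are pushed every environment step; once the buffer holds at least BATCH_SIZE transitions, minibatches are drawn for each gradient update.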


Performance

The agent learns to perform controlled landings and achieve high average rewards.

Double DQN provides more stable convergence compared to standard DQN.


Visualization

Below is the trained Double DQN agent landing:

[lunarlander_ddqn.gif — Double DQN agent performing a controlled landing]


Files

  • lunarlander_ddqn.pt → Trained PyTorch model
  • lunarlander_ddqn.gif → Agent demonstration

Summary

This project demonstrates:

  • Overestimation bias in DQN
  • Double DQN improvement
  • Stability scaling in deep RL
  • Advanced reinforcement learning implementation

It represents a progression toward modern deep reinforcement learning techniques.
