Double Deep Q-Network (Double DQN) — LunarLander-v3
This repository contains a trained Double Deep Q-Network (Double DQN) agent for the Gymnasium environment LunarLander-v3.
Environment
- Environment: LunarLander-v3
- State Space: 8-dimensional continuous vector (position, velocity, angle, angular velocity, and two leg-contact flags)
- Action Space: 4 discrete actions (do nothing, fire left engine, fire main engine, fire right engine)
- Goal: Land the lander safely on the pad between the two flags
LunarLander is significantly more complex than CartPole:
- Higher-dimensional state space
- Shaped rewards (distance, velocity, tilt, leg contact) combined with a large terminal landing/crash bonus
- High-variance returns
Why Standard DQN Is Not Enough
Standard DQN uses:
y = r + γ max_a' Q(s',a'; θ⁻)
The max operator introduces overestimation bias: the same noisy value estimates are used both to select and to evaluate the action, so estimation noise inflates the target. Since max is convex, Jensen's inequality gives:
E[max_a X_a] ≥ max_a E[X_a]
This upward bias compounds through bootstrapping and becomes significant in complex environments such as LunarLander.
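The inequality above can be seen numerically. In this hypothetical illustration (not code from this repository), all actions share the same true value of zero, but each Q-estimate carries zero-mean noise; taking the max over the noisy estimates is biased upward:

```python
import numpy as np

# All 4 actions have true value 0; estimates are corrupted by N(0, 1) noise.
rng = np.random.default_rng(0)
estimates = rng.normal(0.0, 1.0, size=(100_000, 4))  # noisy Q(s, a) samples

# max of the (unbiased) per-action means: close to the true value, 0.
max_of_means = estimates.mean(axis=0).max()

# mean of the per-sample max: systematically above 0 — the overestimation bias.
mean_of_max = estimates.max(axis=1).mean()

print(f"max of means: {max_of_means:+.3f}")  # near 0
print(f"mean of max:  {mean_of_max:+.3f}")   # clearly positive
```

With four standard-normal estimates the bias is roughly +1, even though every individual estimate is unbiased.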
Double DQN Solution
Double DQN separates:
- Action selection (policy network)
- Action evaluation (target network)
The target becomes:
y = r + γ Q(s', argmax_a Q(s',a; θ), θ⁻)
This reduces overestimation and improves stability.
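The two targets can be compared on a toy batch. This is a minimal numpy sketch with made-up Q-values (not the repository's training code); `q_next_online` plays the role of Q(s', ·; θ) and `q_next_target` of Q(s', ·; θ⁻):

```python
import numpy as np

gamma = 0.99
rewards = np.array([1.0, -0.5])
dones = np.array([0.0, 1.0])  # 1.0 disables bootstrapping at terminal states

q_next_online = np.array([[0.2, 0.8, 0.1, 0.4],    # Q(s', ·; θ)
                          [0.5, 0.3, 0.9, 0.0]])
q_next_target = np.array([[0.3, 0.4, 0.2, 0.7],    # Q(s', ·; θ⁻)
                          [0.4, 0.2, 0.7, 0.1]])

# Standard DQN: select AND evaluate with the target network.
y_dqn = rewards + gamma * (1 - dones) * q_next_target.max(axis=1)

# Double DQN: select with the online network, evaluate with the target network.
best_actions = q_next_online.argmax(axis=1)        # argmax_a Q(s', a; θ)
y_ddqn = rewards + gamma * (1 - dones) * q_next_target[np.arange(2), best_actions]

print(y_dqn)   # [ 1.693 -0.5  ] — uses the target net's own (possibly inflated) max
print(y_ddqn)  # [ 1.396 -0.5  ] — evaluates the online net's choice, a lower target
```

In the first transition the two networks disagree about the best action, and the Double DQN target comes out lower: exactly the overestimation being removed.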
Training Details
- Learning rate: 1e-4
- Discount factor: 0.99
- Batch size: 64
- Replay buffer: 50,000
- Target update frequency: 500 steps
- Episodes: 2000
A lower learning rate trades training speed for stability, which pays off in complex environments like this one.
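The hyperparameters above can be collected into a config, with the usual hard-update cadence for the target network. This is a sketch of the standard pattern, not code taken from this repository; the helper name is hypothetical:

```python
# Hyperparameters from the list above.
CONFIG = {
    "learning_rate": 1e-4,
    "gamma": 0.99,
    "batch_size": 64,
    "buffer_size": 50_000,
    "target_update_every": 500,  # gradient steps between hard target updates
    "episodes": 2000,
}

def should_update_target(step: int, every: int = CONFIG["target_update_every"]) -> bool:
    """Hard update: copy online weights into the target network every `every` steps."""
    return step > 0 and step % every == 0

# With PyTorch networks, the update itself is typically:
#   target_net.load_state_dict(policy_net.state_dict())
```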
Performance
The agent learns to perform controlled landings and achieves a high average episodic return.
Double DQN converges more stably on this task than standard DQN.
Visualization
Below is the trained Double DQN agent landing:
Files
- lunarlander_ddqn.pt → trained PyTorch model weights
- lunarlander_ddqn.gif → agent demonstration
Summary
This project demonstrates:
- Overestimation bias in DQN and why it arises
- The Double DQN fix: decoupling action selection from action evaluation
- Stabilization techniques for deep RL training
- A step up from simple control tasks such as CartPole
It represents a progression toward modern deep reinforcement learning techniques.
