Double Deep Q-Network (Double DQN) — LunarLander-v3
This repository contains a trained Double Deep Q-Network (Double DQN) agent for the Gymnasium environment LunarLander-v3.
Environment
- Environment: LunarLander-v3
- State Space: 8-dimensional continuous vector (position, velocity, angle, angular velocity, and two leg-contact flags)
- Action Space: 4 discrete actions (do nothing, fire left engine, fire main engine, fire right engine)
- Goal: Land the lander safely on the pad between the two flags
LunarLander is significantly more complex than CartPole:
- Higher-dimensional state space
- Shaped rewards (distance, velocity, tilt, leg contact) combined with a large terminal landing/crash bonus
- High-variance returns
Why Standard DQN Is Not Enough
Standard DQN uses:
y = r + γ max_a' Q(s',a'; θ⁻)
The max operator introduces overestimation bias: the same noisy value estimates are used both to select and to evaluate the action, so estimation noise inflates the target. Since max is convex, Jensen's inequality gives:
E[max_a X_a] ≥ max_a E[X_a]
This upward bias compounds through bootstrapping and becomes significant in complex environments such as LunarLander.
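The inequality above can be seen numerically. In this hypothetical illustration (not code from this repository), all actions share the same true value of zero, but each Q-estimate carries zero-mean noise; taking the max over the noisy estimates is biased upward:

```python
import numpy as np

# All 4 actions have true value 0; estimates are corrupted by N(0, 1) noise.
rng = np.random.default_rng(0)
estimates = rng.normal(0.0, 1.0, size=(100_000, 4))  # noisy Q(s, a) samples

# max of the (unbiased) per-action means: close to the true value, 0.
max_of_means = estimates.mean(axis=0).max()

# mean of the per-sample max: systematically above 0 — the overestimation bias.
mean_of_max = estimates.max(axis=1).mean()

print(f"max of means: {max_of_means:+.3f}")  # near 0
print(f"mean of max:  {mean_of_max:+.3f}")   # clearly positive
```

With four standard-normal estimates the bias is roughly +1, even though every individual estimate is unbiased.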
Double DQN Solution
Double DQN separates:
- Action selection (policy network)
- Action evaluation (target network)
The target becomes:
y = r + γ Q(s', argmax_a Q(s',a; θ), θ⁻)
This reduces overestimation and improves stability.
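The two targets can be compared on a toy batch. This is a minimal numpy sketch with made-up Q-values (not the repository's training code); `q_next_online` plays the role of Q(s', ·; θ) and `q_next_target` of Q(s', ·; θ⁻):

```python
import numpy as np

gamma = 0.99
rewards = np.array([1.0, -0.5])
dones = np.array([0.0, 1.0])  # 1.0 disables bootstrapping at terminal states

q_next_online = np.array([[0.2, 0.8, 0.1, 0.4],    # Q(s', ·; θ)
                          [0.5, 0.3, 0.9, 0.0]])
q_next_target = np.array([[0.3, 0.4, 0.2, 0.7],    # Q(s', ·; θ⁻)
                          [0.4, 0.2, 0.7, 0.1]])

# Standard DQN: select AND evaluate with the target network.
y_dqn = rewards + gamma * (1 - dones) * q_next_target.max(axis=1)

# Double DQN: select with the online network, evaluate with the target network.
best_actions = q_next_online.argmax(axis=1)        # argmax_a Q(s', a; θ)
y_ddqn = rewards + gamma * (1 - dones) * q_next_target[np.arange(2), best_actions]

print(y_dqn)   # [ 1.693 -0.5  ] — uses the target net's own (possibly inflated) max
print(y_ddqn)  # [ 1.396 -0.5  ] — evaluates the online net's choice, a lower target
```

In the first transition the two networks disagree about the best action, and the Double DQN target comes out lower: exactly the overestimation being removed.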
Training Details
- Learning rate: 1e-4
- Discount factor: 0.99
- Batch size: 64
- Replay buffer: 50,000
- Target update frequency: 500 steps
- Episodes: 2000
A lower learning rate trades training speed for stability, which pays off in complex environments like this one.
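The hyperparameters above can be collected into a config, with the usual hard-update cadence for the target network. This is a sketch of the standard pattern, not code taken from this repository; the helper name is hypothetical:

```python
# Hyperparameters from the list above.
CONFIG = {
    "learning_rate": 1e-4,
    "gamma": 0.99,
    "batch_size": 64,
    "buffer_size": 50_000,
    "target_update_every": 500,  # gradient steps between hard target updates
    "episodes": 2000,
}

def should_update_target(step: int, every: int = CONFIG["target_update_every"]) -> bool:
    """Hard update: copy online weights into the target network every `every` steps."""
    return step > 0 and step % every == 0

# With PyTorch networks, the update itself is typically:
#   target_net.load_state_dict(policy_net.state_dict())
```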
Performance
The agent learns to perform controlled landings and achieves a high average episodic return.
Double DQN converges more stably on this task than standard DQN.
Visualization
Below is the trained Double DQN agent landing:
Files
- lunarlander_ddqn.pt → trained PyTorch model weights
- lunarlander_ddqn.gif → agent demonstration
Summary
This project demonstrates:
- Overestimation bias in DQN and why it arises
- The Double DQN fix: decoupling action selection from action evaluation
- Stabilization techniques for deep RL training
- A step up from simple control tasks such as CartPole
It represents a progression toward modern deep reinforcement learning techniques.
