Deep Q-Network (DQN) – CartPole-v1
This repository contains a trained Deep Q-Network (DQN) agent for the Gymnasium environment CartPole-v1.
Environment
- Environment: CartPole-v1
- State Space: 4-dimensional continuous vector
- Action Space: 2 discrete actions
- Goal: Balance the pole for as long as possible
CartPole has a continuous state space, making tabular Q-learning infeasible.
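As a quick illustration, these spaces can be inspected directly with the Gymnasium API (a minimal sketch; the seed and prints are purely for demonstration):

```python
# Minimal sketch: inspect CartPole-v1's spaces with Gymnasium.
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)  # 4-dim Box: cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space)       # Discrete(2): push cart left (0) or right (1)

state, info = env.reset(seed=0)  # seed chosen only for reproducibility of this demo
print(state.shape)               # (4,)
```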
Algorithm – Deep Q-Network (DQN)
DQN approximates Q-values using a neural network:
Q(s, a; θ)
Training target:
y = r + γ · max_a' Q(s', a'; θ⁻)
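In code, the target for a sampled batch might be computed like this (a sketch; `target_net` is a hypothetical stand-in for the frozen network θ⁻):

```python
# Sketch of the DQN training target y = r + γ · max_a' Q(s', a'; θ⁻).
import torch

def td_target(rewards, next_states, dones, target_net, gamma=0.99):
    # No gradients flow through the frozen target network θ⁻.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    # Terminal transitions bootstrap nothing: y = r when done.
    return rewards + gamma * max_next_q * (1.0 - dones.float())
```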
Key components (see the sketch after this list):
- Policy Network
- Target Network
- Experience Replay Buffer
- MSE loss optimization
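A minimal sketch of how these components interact in one gradient step; every name here (`policy_net`, `target_net`, `buffer`, `optimizer`) is an illustrative stand-in, not this repository's actual code:

```python
# One DQN gradient step: sample from replay, regress Q(s, a; θ) toward the target.
import torch
import torch.nn.functional as F

def train_step(policy_net, target_net, buffer, optimizer, batch_size=64, gamma=0.99):
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    # Q(s, a; θ) for the actions that were actually taken.
    q_pred = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target y = r + γ · max_a' Q(s', a'; θ⁻), held fixed during the update.
    with torch.no_grad():
        q_target = rewards + gamma * target_net(next_states).max(dim=1).values * (1.0 - dones.float())
    loss = F.mse_loss(q_pred, q_target)  # the MSE objective listed above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```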
Why DQN?
Because CartPole has continuous states:
s ∈ ℝ⁴
A Q-table cannot represent infinitely many distinct states.
A neural network is used for function approximation.
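A plausible architecture for such an approximator (an assumption for illustration; the actual layer sizes in cartpole_dqn.pt may differ):

```python
# Hypothetical Q-network: maps a 4-dimensional state to one Q-value per action.
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)  # shape: (batch, n_actions)
```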
Training Details
- Learning rate: 1e-3
- Discount factor: 0.99
- Batch size: 64
- Target update frequency: 100 steps
- Episodes: 500
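Wiring these values up might look like the following sketch; the Adam optimizer and the QNetwork class from the earlier sketch are assumptions, not confirmed details of this repository:

```python
# Hyperparameters as listed above; the optimizer choice is an assumption.
import torch

LR = 1e-3
GAMMA = 0.99
BATCH_SIZE = 64
TARGET_UPDATE = 100   # hard-sync θ⁻ ← θ every 100 steps
NUM_EPISODES = 500

policy_net = QNetwork()                              # hypothetical class, sketched earlier
target_net = QNetwork()
target_net.load_state_dict(policy_net.state_dict())  # start with θ⁻ = θ
target_net.eval()
optimizer = torch.optim.Adam(policy_net.parameters(), lr=LR)
```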
The experience replay buffer improves stability and data efficiency by letting the agent learn from reused, decorrelated past transitions rather than only the most recent one.
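A uniform replay buffer along these lines (a sketch; the transition tuple layout is an assumption):

```python
# Minimal uniform-sampling replay buffer.
import random
from collections import deque

import numpy as np
import torch

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        states, actions, rewards, next_states, dones = zip(*random.sample(self.memory, batch_size))
        return (torch.as_tensor(np.array(states), dtype=torch.float32),
                torch.as_tensor(actions, dtype=torch.int64),
                torch.as_tensor(rewards, dtype=torch.float32),
                torch.as_tensor(np.array(next_states), dtype=torch.float32),
                torch.as_tensor(dones, dtype=torch.float32))

    def __len__(self):
        return len(self.memory)
```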
Performance
The agent learns to consistently balance the pole for long durations.
Training reward improves steadily over episodes.
Visualization
Below is the trained DQN agent in action:

![Trained DQN agent on CartPole-v1](cartpole_dqn.gif)
Files
- cartpole_dqn.pt – Trained PyTorch model
- cartpole_dqn.gif – Agent demonstration
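If the .pt file stores a state_dict (an assumption, not confirmed by this README), loading it for evaluation might look like:

```python
# Sketch: load the trained weights, assuming cartpole_dqn.pt holds a state_dict
# compatible with the QNetwork class sketched earlier in this README.
import torch

policy_net = QNetwork()  # hypothetical architecture from the sketch above
policy_net.load_state_dict(torch.load("cartpole_dqn.pt", map_location="cpu"))
policy_net.eval()
```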
Summary
This project demonstrates:
- Function approximation in reinforcement learning
- Experience replay
- Target networks for stability
- Deep reinforcement learning fundamentals
It represents a transition from tabular RL to deep RL.
