# Q-Learning Agent — FrozenLake-v1
This repository contains a trained Q-Learning agent for the Gymnasium environment FrozenLake-v1.
## Environment
- Environment: FrozenLake-v1
- State Space: 16 discrete states
- Action Space: 4 discrete actions
- Type: Stochastic grid-world (slippery surface)
FrozenLake is a small Markov Decision Process (MDP) where the agent must reach a goal while avoiding holes.
## Algorithm
This model uses Tabular Q-Learning, a model-free off-policy reinforcement learning algorithm.
Update rule:
Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]
Where:
- α = learning rate
- γ = discount factor
Because the environment is discrete and small, Q-values are stored in a Q-table of shape (16 × 4).
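The update rule above can be sketched as a single function over the Q-table. This is an illustrative sketch, not the repository's training code; the constants `ALPHA` and `GAMMA` take the values listed under Training Details below.

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 4   # FrozenLake-v1 4x4 grid
ALPHA, GAMMA = 0.1, 0.99      # learning rate and discount factor

def q_update(Q, s, a, r, s_next, done):
    """One Q-Learning step: Q(s,a) += α [r + γ max_a' Q(s',a') − Q(s,a)].

    When the episode terminates there is no successor state, so the
    bootstrap term γ max_a' Q(s',a') is dropped.
    """
    target = r if done else r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

Q = np.zeros((N_STATES, N_ACTIONS))  # the (16 × 4) Q-table, initialized to zero
```

Because the table starts at zero, the first rewarding transition moves `Q[s, a]` by exactly `α · r`.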
## Training Details
- Learning rate (α): 0.1
- Discount factor (γ): 0.99
- Episodes: 5000
- Epsilon-greedy exploration with decay
The agent learns to maximize expected long-term reward despite stochastic transitions.
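The exploration loop can be sketched as follows. The README only states "epsilon-greedy with decay", so the schedule constants (`EPS_START`, `EPS_MIN`, `EPS_DECAY`) are illustrative assumptions, not the values used in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, state, epsilon):
    """With probability ε take a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

# Multiplicative decay schedule (assumed values for illustration).
EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.999
epsilon = EPS_START
for episode in range(5000):
    # ... run one episode, calling epsilon_greedy(Q, state, epsilon)
    #     and q-updating after each transition ...
    epsilon = max(EPS_MIN, epsilon * EPS_DECAY)
```

Early episodes are almost entirely random (exploration); as ε decays toward its floor, the agent increasingly exploits the learned Q-values.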
## Performance
Episode reward was tracked across training.
The trained agent learns a policy that reliably reaches the goal despite the slippery, stochastic transitions.
## Visualization
Below is the trained agent interacting with the environment:

![Trained agent demonstration](frozenlake_trained_agent.gif)
## Files
- `frozenlake_q_table.npy` → Trained Q-table
- `frozenlake_trained_agent.gif` → Agent demonstration
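The saved table can be used directly to act greedily, with no further learning. A minimal sketch, assuming `frozenlake_q_table.npy` sits in the working directory:

```python
import numpy as np

def greedy_policy(path="frozenlake_q_table.npy"):
    """Load the saved Q-table and return the greedy action index per state."""
    Q = np.load(path)            # shape (16, 4): one row per state
    return np.argmax(Q, axis=1)  # best action for each of the 16 states

# policy = greedy_policy()
# print(policy.reshape(4, 4))   # view actions laid out on the 4x4 grid
```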
## Summary
This project demonstrates:
- Tabular reinforcement learning
- Bellman optimality updates
- Exploration vs exploitation trade-off
- Convergence in finite MDPs
It serves as a foundational reinforcement learning example.
