Q-Learning Agent — FrozenLake-v1

This repository contains a trained Q-Learning agent for the Gymnasium environment FrozenLake-v1.


Environment

  • Environment: FrozenLake-v1
  • State Space: 16 discrete states
  • Action Space: 4 discrete actions
  • Type: Stochastic grid-world (slippery surface)

FrozenLake is a small Markov Decision Process (MDP) where the agent must reach a goal while avoiding holes.


Algorithm

This model uses Tabular Q-Learning, a model-free off-policy reinforcement learning algorithm.

Update rule:

Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]

Where:

  • α = learning rate
  • γ = discount factor
  • r = immediate reward
  • s′ = next state

Because the environment is discrete and small, Q-values are stored in a Q-table of shape (16 × 4).
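The update rule translates directly into a single NumPy operation on that table. A minimal sketch (the state/action indices in the example call are illustrative, not from the trained agent):

```python
import numpy as np

n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))  # one row per state, one column per action

def q_update(Q, s, a, r, s_next):
    # Off-policy TD target: bootstrap from the greedy value of the next state
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# One illustrative update: reward 1.0 for a transition into the goal state
q_update(Q, s=14, a=2, r=1.0, s_next=15)
print(Q[14, 2])  # 0.1 = alpha * (1.0 + gamma * 0 - 0)
```

Because max over the next state's actions is taken regardless of which action the agent actually follows next, the algorithm is off-policy.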


Training Details

  • Learning rate (α): 0.1
  • Discount factor (γ): 0.99
  • Episodes: 5000
  • Epsilon-greedy exploration with decay

The agent learns to maximize expected long-term reward despite stochastic transitions.


Performance

Training reward was tracked across episodes.

Because transitions are slippery, even an optimal policy cannot reach the goal on every episode; the agent learns a policy that reliably steers toward the goal while avoiding holes.


Visualization

Below is the trained agent interacting with the environment:

(Animation: frozenlake_trained_agent.gif — the trained agent navigating the lake.)


Files

  • frozenlake_q_table.npy → Trained Q-table
  • frozenlake_trained_agent.gif → Agent demonstration
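The Q-table is a plain NumPy array, so persisting and reloading it is a save/load round trip. A sketch (the zero array stands in for the trained table shipped in this repo):

```python
import numpy as np

Q = np.zeros((16, 4))                  # stand-in for the trained table
np.save("frozenlake_q_table.npy", Q)   # written once training finishes

Q_loaded = np.load("frozenlake_q_table.npy")
print(Q_loaded.shape)  # (16, 4): one row per state, one column per action
```

Loading the shipped frozenlake_q_table.npy and passing it to a greedy rollout is all that is needed to reproduce the demonstration GIF.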

Summary

This project demonstrates:

  • Tabular reinforcement learning
  • Bellman optimality updates
  • Exploration vs exploitation trade-off
  • Convergence in finite MDPs

It serves as a foundational reinforcement learning example.
