# Q-Learning Agent playing FrozenLake-v1
This is a trained Q-Learning agent playing FrozenLake-v1 (4x4 map, non-slippery version). The agent was trained using a custom implementation of the Q-Learning algorithm.
## Environment
- Environment: FrozenLake-v1
- Map Name: 4x4
- Is Slippery: False (deterministic)
## Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 1.00 +/- 0.00 |
| Evaluation Episodes | 100 |
## Hyperparameters
The agent was trained using the following hyperparameters:
- Total Training Episodes: 10,000
- Learning Rate: 0.7
- Gamma (Discount Factor): 0.95
- Max Steps per Episode: 99
- Epsilon (Exploration) Start: 1.0
- Epsilon (Exploration) Min: 0.05
- Decay Rate: 0.0005
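
These hyperparameters plug into the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α · [r + γ · maxₐ′ Q(s′, a′) − Q(s, a)]. The sketch below shows that update with the values listed above; the exponential epsilon-decay schedule is a common choice and an assumption here, not something this card specifies:

```python
import numpy as np

# Hyperparameters from the card
learning_rate = 0.7                          # alpha
gamma = 0.95                                 # discount factor
eps_start, eps_min, decay_rate = 1.0, 0.05, 0.0005

def q_update(qtable, state, action, reward, next_state, done):
    """One tabular Q-learning step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    # No bootstrapping from next_state when the episode has ended
    target = reward + gamma * np.max(qtable[next_state]) * (not done)
    qtable[state, action] += learning_rate * (target - qtable[state, action])

def epsilon(episode):
    """Assumed exponential decay from eps_start toward eps_min."""
    return eps_min + (eps_start - eps_min) * np.exp(-decay_rate * episode)

# Example: a 16-state x 4-action Q-table (FrozenLake 4x4)
qtable = np.zeros((16, 4))
q_update(qtable, state=14, action=2, reward=1.0, next_state=15, done=True)
```

With an all-zero table and a terminal reward of 1.0, this first update moves Q(14, 2) to `learning_rate * 1.0`, i.e. 0.7.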
## Usage
To use this model, you need `gymnasium` and `pickle5` installed (on Python 3.8+, the built-in `pickle` module also works). You can load the model and evaluate it using the code below:
```python
import gymnasium as gym
import pickle5 as pickle
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Download the model file from the Hub
repo_id = "Tejas-Anvekar/q-FrozenLake-v1-4x4-noSlippery"
filename = "q-learning.pkl"
pickle_model = hf_hub_download(repo_id=repo_id, filename=filename)

# 2. Load the model configuration and Q-table
with open(pickle_model, "rb") as f:
    model = pickle.load(f)

# 3. Create the environment
# IMPORTANT: Ensure is_slippery is set to False to match the training configuration
env = gym.make(model["env_id"], map_name="4x4", is_slippery=False, render_mode="rgb_array")

# 4. Define the greedy policy: always take the highest-value action
def greedy_policy(Qtable, state):
    action = np.argmax(Qtable[state][:])
    return action

# 5. Evaluate the agent over one episode
state, info = env.reset()
terminated = False
truncated = False
total_reward = 0

print("Agent is playing...")
while not terminated and not truncated:
    action = greedy_policy(model["qtable"], state)
    next_state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    state = next_state

print(f"Game Finished! Total Reward: {total_reward}")
env.close()
```
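
The reported mean reward of 1.00 +/- 0.00 comes from repeated greedy rollouts. A sketch of such an evaluation loop (the function name and structure are illustrative, not the exact script used for the reported numbers):

```python
import numpy as np

def evaluate_agent(env, qtable, n_episodes=100, max_steps=99):
    """Run greedy rollouts and return mean and std of episode rewards."""
    rewards = []
    for _ in range(n_episodes):
        state, info = env.reset()
        total = 0.0
        for _ in range(max_steps):
            # Greedy action: highest Q-value for the current state
            action = int(np.argmax(qtable[state]))
            state, reward, terminated, truncated, info = env.step(action)
            total += reward
            if terminated or truncated:
                break
        rewards.append(total)
    return np.mean(rewards), np.std(rewards)
```

Because the map is deterministic (`is_slippery=False`), every greedy rollout follows the same path, which is why the standard deviation is 0.00.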