""" Introduction & Quick Start ========================== **Part 1 of 5** in the OpenEnv Getting Started Series This notebook introduces OpenEnv, explains why it exists, and gets you running your first environment. .. note:: **Time**: ~10 minutes | **Difficulty**: Beginner | **GPU Required**: No What You'll Learn ----------------- - **What is OpenEnv**: The unified framework for RL environments - **Why OpenEnv**: How it compares to traditional solutions like Gym - **RL Basics**: The observe-act-reward loop in 60 seconds - **Quick Start**: Connect to and interact with your first environment """ # %% # Setup: Enable nested async event loops # -------------------------------------- # # This is needed when running in environments like Sphinx-Gallery or Jupyter # that already have an event loop running. import nest_asyncio nest_asyncio.apply() # %% # What is OpenEnv? # ---------------- # # OpenEnv is a **unified framework for building, sharing, and interacting with # reinforcement learning environments**. It's a collaborative effort between # Meta, Hugging Face, Unsloth, GPU Mode, and other industry leaders. # # **The Goal**: Make environment creation as easy and standardized as model # sharing on Hugging Face. # # Key Features # ~~~~~~~~~~~~ # # - **Standardized API**: Gymnasium-style ``reset()``, ``step()``, ``state()`` # - **Type-Safe**: Full IDE autocomplete and error checking # - **Containerized**: Environments run in Docker for isolation and reproducibility # - **Shareable**: Push to Hugging Face Hub with one command # - **Language-Agnostic**: HTTP/WebSocket API works from any language # %% # RL in 60 Seconds # ---------------- # # Reinforcement Learning is simpler than you think. It's just a loop: # # .. 
code-block:: text # # ┌─────────────────────────────────────────────────────────────┐ # │ THE RL LOOP │ # │ │ # │ ┌─────────┐ ┌─────────────┐ │ # │ │ AGENT │─action─▶│ ENVIRONMENT │ │ # │ │ │◀─reward─│ │ │ # │ │ │◀──obs───│ │ │ # │ └─────────┘ └─────────────┘ │ # │ │ # │ 1. Agent observes the environment │ # │ 2. Agent chooses an action │ # │ 3. Environment returns reward + new observation │ # │ 4. Repeat until done │ # └─────────────────────────────────────────────────────────────┘ # # In code, it looks like this: # # .. code-block:: python # # result = env.reset() # Start episode # while not result.done: # action = agent.choose(result.observation) # result = env.step(action) # Take action, get reward # agent.learn(result.reward) # # That's it. That's RL! # %% # Why OpenEnv? (vs. Traditional Solutions) # ---------------------------------------- # # Traditional RL environments (like OpenAI Gym/Gymnasium) have been the backbone # of RL research for years. They provide a simple API for interacting with # environments, and the community has built thousands of environments on top of them. # # However, as RL moves from research to production, several challenges emerge: # # The Problem with Traditional Approaches # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # # 1. **No Type Safety**: Observations are numpy arrays like ``obs[0][3]``. What does # index 3 mean? You have to read documentation or source code to find out. # # 2. **Same-Process Execution**: The environment runs in your training process. # A bug in the environment can crash your entire training run. # # 3. **Dependency Hell**: Sharing environments means copying files and hoping # the recipient has the same dependencies installed. # # 4. **Python Lock-in**: Want to use Rust or C++ for your agent? Too bad—Gym is Python-only. # # 5. **"Works on My Machine"**: Environments behave differently on different systems # due to floating-point differences, library versions, or OS quirks. 
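#
# To make the first point concrete, here is a minimal, self-contained sketch
# of the difference between index-based and named access. ``CartPoleObs`` is a
# hypothetical class written for this illustration; it is not part of Gym,
# Gymnasium, or OpenEnv. The field order follows the documented CartPole
# observation layout, and the values are the example array used in this notebook.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CartPoleObs:
    """Hypothetical typed view of a 4-element CartPole observation."""

    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_angular_velocity: float

    @classmethod
    def from_array(cls, raw: List[float]) -> "CartPoleObs":
        # Fail early instead of silently mislabeling values.
        if len(raw) != 4:
            raise ValueError(f"expected 4 values, got {len(raw)}")
        return cls(*raw)


raw_obs = [0.01, 0.02, -0.03, 0.01]

# Untyped access: the reader has to know that index 2 is the pole angle.
angle_by_index = raw_obs[2]

# Typed access: the field name carries the meaning, and an IDE can complete it.
typed_obs = CartPoleObs.from_array(raw_obs)
angle_by_name = typed_obs.pole_angle

print(angle_by_index == angle_by_name)
```

# %%
# Both lines read the same number, but only the second one says what it is.
# A misspelled field name raises ``AttributeError`` immediately, whereas a
# wrong index quietly returns a different value.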
#
# How OpenEnv Solves These Problems
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# +------------------+----------------------------------+----------------------------------+
# | Challenge        | Traditional (Gym)                | OpenEnv                          |
# +==================+==================================+==================================+
# | **Type Safety**  | ``obs[0][3]`` - what is it?      | ``obs.info_state`` - IDE knows!  |
# +------------------+----------------------------------+----------------------------------+
# | **Isolation**    | Same process (can crash)         | Docker container (isolated)      |
# +------------------+----------------------------------+----------------------------------+
# | **Deployment**   | "Works on my machine"            | Same container everywhere        |
# +------------------+----------------------------------+----------------------------------+
# | **Sharing**      | Copy files, manage deps          | ``openenv push`` to Hub          |
# +------------------+----------------------------------+----------------------------------+
# | **Language**     | Python only                      | Any language (HTTP/WebSocket)    |
# +------------------+----------------------------------+----------------------------------+
# | **Scaling**      | Single machine                   | Deploy to Kubernetes             |
# +------------------+----------------------------------+----------------------------------+
# | **Debugging**    | Cryptic numpy index errors       | Clear, typed error messages      |
# +------------------+----------------------------------+----------------------------------+
#
# Side-by-Side Code Comparison
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Let's compare the same workflow in both approaches:
#
# **Traditional Gym approach:**
#
# .. code-block:: python
#
#     import gymnasium as gym
#
#     # Create environment - runs in your process
#     env = gym.make("CartPole-v1")
#
#     # Reset returns a numpy array plus an info dict
#     obs, info = env.reset()
#     # obs = array([0.01, 0.02, -0.03, 0.01])
#     # What do these numbers mean? You have to check docs!
#
#     # Step returns a five-element tuple
#     action = env.action_space.sample()
#     obs, reward, terminated, truncated, info = env.step(action)
#     # No IDE autocomplete, easy to mix up return values
#
#     # If env crashes, your whole training crashes
#     # Sharing requires: pip install "gymnasium[atari]", hope versions match
#
# **OpenEnv approach:**
#
# .. code-block:: python
#
#     from openenv import AutoEnv, AutoAction
#
#     # Load environment and action classes via auto-discovery
#     OpenSpielEnv = AutoEnv.get_env_class("openspiel")
#     OpenSpielAction = AutoAction.from_env("openspiel")
#
#     # Connect to containerized environment
#     with OpenSpielEnv(base_url="http://localhost:8000") as env:
#         # Reset returns typed StepResult
#         result = env.reset()
#         # result.observation.legal_actions - IDE autocompletes!
#         # result.observation.info_state - you know exactly what this is
#
#         # Step with typed action
#         action = OpenSpielAction(action_id=1, game_name="catch")
#         result = env.step(action)
#         # result.reward, result.done - all typed
#
#     # Environment runs in Docker - isolated from your code
#     # Share via: openenv push my-env (one command!)

# %%
# Part 1: Environment Setup
# -------------------------
#
# Let's set up our environment. This works in Google Colab, locally, or
# anywhere Python runs.

import subprocess
import sys
from pathlib import Path

# Detect environment
try:
    import google.colab  # noqa: F401

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print("=" * 70)
    print("  GOOGLE COLAB DETECTED - Installing OpenEnv...")
    print("=" * 70)

    # Install OpenEnv and report the outcome instead of assuming success
    install = subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "openenv-core"],
        capture_output=True,
        text=True,
    )
    if install.returncode == 0:
        print("  OpenEnv installed!")
    else:
        print("  OpenEnv installation failed:")
        print(install.stderr)
    print("=" * 70)
else:
    print("=" * 70)
    print("  RUNNING LOCALLY")
    print("=" * 70)
    print()
    print("If you haven't installed OpenEnv yet:")
    print("  pip install openenv-core")
    print()

    # Add src to path for local development (when running from docs folder)
    src_path = Path.cwd().parent.parent.parent / "src"
    if src_path.exists():
        sys.path.insert(0, str(src_path))

    # Add envs to path
    envs_path = Path.cwd().parent.parent.parent / "envs"
    if envs_path.exists():
        sys.path.insert(0, str(envs_path.parent))

    print("=" * 70)

print()
print("Ready to explore OpenEnv!")

# %%
# Part 2: Your First Environment - OpenSpiel
# -------------------------------------------
#
# What is OpenSpiel?
# ~~~~~~~~~~~~~~~~~~
#
# OpenSpiel is an open-source collection of **70+ game environments**
# developed by DeepMind for research in reinforcement learning, game theory,
# and multi-agent systems.
#
# It includes:
#
# - **Classic board games**: Chess, Go, Backgammon, Tic-Tac-Toe
# - **Card games**: Poker variants, Blackjack, Bridge
# - **Simple RL benchmarks**: Catch, Cliff Walking, 2048
# - **Multi-agent games**: Hanabi, Kuhn Poker, Negotiation games
#
# OpenSpiel is widely used in RL research because it provides consistent,
# well-tested implementations with support for both single-player and multi-player
# scenarios.
#
# How OpenSpiel Connects to OpenEnv
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# OpenEnv wraps OpenSpiel games as **containerized, type-safe environments**.
# This means:
#
# 1. You get all the benefits of OpenSpiel's game library
# 2. Plus type-safe Python clients with IDE autocomplete
# 3. Plus Docker isolation for reproducibility
# 4. Plus easy sharing via Hugging Face Hub
#
# Currently, OpenEnv includes wrappers for 6 OpenSpiel games:
#
# +------------------+-------------+------------------------------------------+
# | Game             | Players     | Description                              |
# +==================+=============+==========================================+
# | **Catch**        | 1           | Catch a falling ball with a paddle       |
# +------------------+-------------+------------------------------------------+
# | **2048**         | 1           | Slide tiles to combine numbers           |
# +------------------+-------------+------------------------------------------+
# | **Blackjack**    | 1           | Classic card game against dealer         |
# +------------------+-------------+------------------------------------------+
# | **Cliff Walking**| 1           | Navigate a grid while avoiding cliffs    |
# +------------------+-------------+------------------------------------------+
# | **Tic-Tac-Toe**  | 2           | Classic 3×3 grid game                    |
# +------------------+-------------+------------------------------------------+
# | **Kuhn Poker**   | 2           | Simplified 3-card poker                  |
# +------------------+-------------+------------------------------------------+
#
# The Catch Game
# ~~~~~~~~~~~~~~
#
# For this tutorial, we'll use **Catch**, one of the simplest RL environments.
# It's perfect for learning because:
#
# - Simple rules (easy to understand)
# - Fast episodes (10 steps each)
# - Clear success metric (did you catch the ball?)
# - Optimal strategy is learnable (move toward the ball)
#
# **Game Rules:**
#
# .. code-block:: text
#
#     ⬜ ⬜ 🔴 ⬜ ⬜   <- Ball starts at random column (row 0)
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ ⬜ ⬜ ⬜      The ball falls down one row
#     ⬜ ⬜ ⬜ ⬜ ⬜      each time step
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ ⬜ ⬜ ⬜
#     ⬜ ⬜ 🏓 ⬜ ⬜   <- Paddle at bottom (row 9)
#
# - **Grid Size**: 10 rows × 5 columns
# - **Ball**: Starts at a random column in row 0, falls one row per step
# - **Paddle**: Starts at center column, you control it
# - **Episode Length**: 10 steps (ball reaches bottom)
#
# **Actions:**
#
# +------------+------------------+
# | Action ID  | Movement         |
# +============+==================+
# | 0          | Move LEFT        |
# +------------+------------------+
# | 1          | STAY (no move)   |
# +------------+------------------+
# | 2          | Move RIGHT       |
# +------------+------------------+
#
# **Rewards:**
#
# - **+1.0** if the paddle is in the same column as the ball when it lands
# - **0.0** if you miss the ball
#
# **Optimal Strategy**: Track the ball's column and move toward it. A perfect
# policy wins 100% of the time: the paddle starts in the center of a grid that
# is only 5 columns wide, so it is never more than 2 moves away from the
# ball's column and has 10 steps to get there.
#
# Importing OpenEnv
# ~~~~~~~~~~~~~~~~~
#
# First, let's import the OpenSpiel environment client and models:

# Real imports from OpenEnv
try:
    # Direct imports from the openspiel_env package
    from openspiel_env.client import OpenSpielEnv
    from openspiel_env.models import (
        OpenSpielAction,
        OpenSpielObservation,
        OpenSpielState,
    )

    OPENENV_AVAILABLE = True
    print("✓ OpenEnv imports successful!")
    print(f"  - OpenSpielEnv: {OpenSpielEnv}")
    print(f"  - OpenSpielAction: {OpenSpielAction}")
except ImportError as e:
    OPENENV_AVAILABLE = False
    print(f"✗ OpenEnv not fully installed: {e}")
    print("  Run: pip install openenv-core")
    print("  And: pip install -e ./envs/openspiel_env")

# %%
# Connecting to an Environment
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# OpenEnv provides three ways to connect to environments:
#
# 1. **From Hugging Face Hub** (auto-downloads and starts container)
# 2. **From Docker image** (uses local image)
# 3. **From URL** (connects to running server)
#
# Let's examine the actual methods available on the client class:

print("=" * 70)
print("  THREE WAYS TO CONNECT")
print("=" * 70)
print()

if OPENENV_AVAILABLE:
    # Show actual method signatures from the class
    import inspect

    print("Connection methods available on OpenSpielEnv:")
    print()

    # Method 1: from_hub
    if hasattr(OpenSpielEnv, "from_hub"):
        sig = inspect.signature(OpenSpielEnv.from_hub)
        print(f"1. OpenSpielEnv.from_hub{sig}")
        print("   → Auto-downloads from Hugging Face, starts container, connects")
        print("   Example: env = OpenSpielEnv.from_hub('openenv/openspiel-env')")
        print()

    # Method 2: from_docker_image
    if hasattr(OpenSpielEnv, "from_docker_image"):
        sig = inspect.signature(OpenSpielEnv.from_docker_image)
        print(f"2. OpenSpielEnv.from_docker_image{sig}")
        print("   → Starts container from local image, connects")
        print("   Example: env = OpenSpielEnv.from_docker_image('openspiel-env:latest')")
        print()

    # Method 3: Direct connection
    sig = inspect.signature(OpenSpielEnv.__init__)
    print(f"3. OpenSpielEnv.__init__{sig}")
    print("   → Connects to already-running server")
    print("   Example: env = OpenSpielEnv(base_url='http://localhost:8000')")
    print()

    print("-" * 70)
    print("All three give you the same API - just different ways to start!")
else:
    print("(OpenEnv not installed - showing expected methods)")
    print()
    print("1. OpenSpielEnv.from_hub(repo_id, *, use_docker=True, ...)")
    print("   → Auto-downloads from Hugging Face, starts container, connects")
    print()
    print("2. OpenSpielEnv.from_docker_image(image, provider=None, ...)")
    print("   → Starts container from local image, connects")
    print()
    print("3. OpenSpielEnv(base_url, connect_timeout_s=10.0, ...)")
    print("   → Connects to already-running server")

# %%
# Part 3: Playing the Catch Game
# ------------------------------
#
# Now let's actually play! This code attempts to connect to a real server.
# If no server is running, we'll show what the interaction looks like.

import random

# Check if we can connect to a server
SERVER_URL = "http://localhost:8000"
SERVER_AVAILABLE = False

# Human-readable names for the three Catch actions
ACTION_NAMES = {0: "LEFT", 1: "STAY", 2: "RIGHT"}

if OPENENV_AVAILABLE:
    try:
        # Try to connect using sync wrapper
        env = OpenSpielEnv(base_url=SERVER_URL)
        with env.sync() as client:
            # Quick test to verify connection
            pass
        SERVER_AVAILABLE = True
        print(f"✓ Connected to server at {SERVER_URL}")
    except Exception as e:
        print(f"✗ No server running at {SERVER_URL}")
        print(f"  Error: {e}")
        print()
        print("To start a server, run one of these:")
        print("  docker run -p 8000:8000 openenv/openspiel-env:latest")
        print("  # OR")
        print("  cd envs/openspiel_env && openenv serve")

# %%
# Playing with a Real Server
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# When connected to a real server, here's how the interaction works:

if OPENENV_AVAILABLE and SERVER_AVAILABLE:
    print("=" * 70)
    print("  PLAYING CATCH - LIVE!")
    print("=" * 70)

    env = OpenSpielEnv(base_url=SERVER_URL)
    with env.sync() as client:
        # Reset to start a new episode
        result = client.reset()
        print("\nEpisode started!")
        print(f"  Observation type: {type(result.observation).__name__}")
        print(f"  Legal actions: {result.observation.legal_actions}")
        print(f"  Done: {result.done}")

        # Play until the episode ends
        step_count = 0
        while not result.done:
            # Choose a random action from legal actions
            action_id = random.choice(result.observation.legal_actions)
            action = OpenSpielAction(action_id=action_id, game_name="catch")

            # Take the action
            result = client.step(action)
            step_count += 1

            print(f"\nStep {step_count}:")
            print(f"  Action: {action_id} ({ACTION_NAMES.get(action_id, 'UNKNOWN')})")
            print(f"  Reward: {result.reward}")
            print(f"  Done: {result.done}")

        # Get final state
        state = client.state()
        print("\nEpisode complete!")
        print(f"  Total steps: {state.step_count}")
        print(f"  Final reward: {result.reward}")
        print(f"  Result: {'CAUGHT!' if result.reward > 0 else 'MISSED!'}")
else:
    # Run a local simulation to demonstrate the gameplay
    print("=" * 70)
    print("  PLAYING CATCH - LOCAL SIMULATION")
    print("=" * 70)
    print()
    print("No server running - demonstrating with local simulation.")
    print("(This follows the Catch rules described above)")
    print()

    # Simulate the Catch game locally
    GRID_HEIGHT = 10
    GRID_WIDTH = 5

    # Initialize game state
    ball_col = random.randint(0, GRID_WIDTH - 1)
    paddle_col = GRID_WIDTH // 2  # Start in center

    print("Game initialized:")
    print(f"  Ball starting column: {ball_col}")
    print(f"  Paddle starting column: {paddle_col}")
    print(f"  Grid size: {GRID_HEIGHT} rows × {GRID_WIDTH} columns")
    print()

    # Simulate episode
    for step in range(GRID_HEIGHT):
        # Create observation (a flattened grid with the ball and paddle marked)
        info_state = [0.0] * (GRID_HEIGHT * GRID_WIDTH)
        info_state[step * GRID_WIDTH + ball_col] = 1.0  # Ball position
        info_state[(GRID_HEIGHT - 1) * GRID_WIDTH + paddle_col] = 1.0  # Paddle
        legal_actions = [0, 1, 2]  # LEFT, STAY, RIGHT

        # Choose random action
        action_id = random.choice(legal_actions)
        action_name = ACTION_NAMES[action_id]

        # Execute action, keeping the paddle inside the grid
        old_paddle = paddle_col
        if action_id == 0:  # LEFT
            paddle_col = max(0, paddle_col - 1)
        elif action_id == 2:  # RIGHT
            paddle_col = min(GRID_WIDTH - 1, paddle_col + 1)

        print(
            f"Step {step + 1}: Ball at row {step}, col {ball_col} | "
            f"Paddle: {old_paddle}→{paddle_col} ({action_name})"
        )

    # Determine result
    caught = paddle_col == ball_col
    reward = 1.0 if caught else 0.0

    print()
    print("Episode complete!")
    print(f"  Ball landed at column: {ball_col}")
    print(f"  Paddle final column: {paddle_col}")
    print(f"  Reward: {reward}")
    print(f"  Result: {'CAUGHT! 🎉' if caught else 'MISSED! 😢'}")
    print()
    print("-" * 70)
    print("This mirrors the interaction with the real OpenSpielEnv,")
    print("just running locally instead of via WebSocket to a server.")

# %%
# Part 4: Understanding the Response Types
# ----------------------------------------
#
# OpenEnv uses type-safe models for all interactions. Let's create actual
# instances and examine their attributes:

print("=" * 70)
print("  OPENENV TYPE SYSTEM - ACTUAL INSTANCES")
print("=" * 70)

# When OpenEnv is installed, the classes imported earlier are the actual
# Pydantic models. Otherwise, define dataclass stand-ins with the same fields
# so the rest of this cell still runs.
if not OPENENV_AVAILABLE:
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class OpenSpielObservation:
        info_state: List[float]
        legal_actions: List[int]
        game_phase: str = "playing"
        current_player_id: int = 0
        opponent_last_action: Optional[int] = None

    @dataclass
    class OpenSpielState:
        game_name: str = "catch"
        agent_player: int = 0
        opponent_policy: str = "random"
        game_params: Optional[dict] = None
        num_players: int = 1

    @dataclass
    class OpenSpielAction:
        action_id: int
        game_name: str = "catch"
        game_params: Optional[dict] = None

# 1. OpenSpielObservation - what the agent receives after each step
print("\n📦 OpenSpielObservation (returned in StepResult)")
print("-" * 50)

# Create a sample observation like what the Catch game returns
sample_observation = OpenSpielObservation(
    info_state=[0.0, 0.0, 1.0, 0.0, 0.0] + [0.0] * 45,  # Ball at col 2, row 0
    legal_actions=[0, 1, 2],  # LEFT, STAY, RIGHT
    game_phase="playing",
    current_player_id=0,
    opponent_last_action=None,
)
print(
    f"  info_state: {sample_observation.info_state[:10]}... "
    f"(length: {len(sample_observation.info_state)})"
)
print(f"  legal_actions: {sample_observation.legal_actions}")
print(f"  game_phase: {sample_observation.game_phase!r}")
print(f"  current_player_id: {sample_observation.current_player_id}")
print(f"  opponent_last_action: {sample_observation.opponent_last_action}")

# 2. OpenSpielState - the environment's internal state
print("\n📊 OpenSpielState (returned by state())")
print("-" * 50)

sample_state = OpenSpielState(
    game_name="catch",
    agent_player=0,
    opponent_policy="random",
    game_params={"rows": 10, "columns": 5},
    num_players=1,
)
print(f"  game_name: {sample_state.game_name!r}")
print(f"  agent_player: {sample_state.agent_player}")
print(f"  opponent_policy: {sample_state.opponent_policy!r}")
print(f"  game_params: {sample_state.game_params}")
print(f"  num_players: {sample_state.num_players}")

# 3. OpenSpielAction - what you send to step()
print("\n🎮 OpenSpielAction (what you send to step())")
print("-" * 50)

sample_action = OpenSpielAction(
    action_id=1,  # STAY
    game_name="catch",
    game_params={"rows": 10, "columns": 5},
)
print(f"  action_id: {sample_action.action_id}  # 0=LEFT, 1=STAY, 2=RIGHT")
print(f"  game_name: {sample_action.game_name!r}")
print(f"  game_params: {sample_action.game_params}")

print("\n" + "=" * 70)
if OPENENV_AVAILABLE:
    print("These are the actual Pydantic models used by OpenEnv.")
else:
    print("These dataclasses stand in for the Pydantic models used by OpenEnv.")
print("Type safety helps catch errors before they reach the environment!")
print("=" * 70)

# %%
# Part 5: The Architecture
# ------------------------
#
# OpenEnv uses a client-server architecture:
#
# .. code-block:: text
#
#     ┌─────────────────────────────────────────────────────────────┐
#     │                          YOUR CODE                          │
#     │                                                             │
#     │  from openenv import AutoEnv                                │
#     │  OpenSpielEnv = AutoEnv.get_env_class("openspiel")          │
#     │  env = OpenSpielEnv(base_url="http://localhost:8000")       │
#     │  result = env.reset()       # Sends WebSocket message       │
#     │  result = env.step(action)  # Sends WebSocket message       │
#     │                                                             │
#     └────────────────────────┬────────────────────────────────────┘
#                              │
#                              │  WebSocket (persistent connection)
#                              │
#     ┌────────────────────────▼────────────────────────────────────┐
#     │                      DOCKER CONTAINER                       │
#     │                                                             │
#     │   ┌─────────────────────────────────────────────────────┐   │
#     │   │  FastAPI Server + Environment Logic                 │   │
#     │   │  - /ws (WebSocket endpoint)                         │   │
#     │   │  - Handles reset(), step(), state()                 │   │
#     │   │  - Runs the actual game simulation                  │   │
#     │   └─────────────────────────────────────────────────────┘   │
#     │                                                             │
#     │             Isolated • Reproducible • Scalable              │
#     └─────────────────────────────────────────────────────────────┘
#
# **Key insight**: You never deal with HTTP/WebSocket directly.
# The OpenEnv client handles all the networking!

# %%
# Summary
# -------
#
# In this notebook, you learned:
#
# **What OpenEnv Is:**
#
# - A unified framework for RL environments
# - Containerized, type-safe, and shareable
#
# **Why Use OpenEnv:**
#
# - Type safety with IDE autocomplete
# - Isolated Docker containers
# - Easy sharing via Hugging Face Hub
#
# **How to Use It:**
#
# - ``env.reset()`` - Start a new episode
# - ``env.step(action)`` - Take an action
# - ``env.state()`` - Get current state
#
# Next Steps
# ----------
#
# **Continue to Notebook 2: Using Environments**
#
# In the next notebook, you'll:
#
# - Explore all available OpenEnv environments
# - Create different AI policies
# - Run evaluations and compare performance
# - Work with multi-player games
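#
# As a warm-up for writing policies, the sketch below checks the claim made
# for Catch earlier in this notebook: a policy that always moves toward the
# ball's column should never miss. It needs no server. ``TinyCatch``,
# ``StepResult``, ``track_ball``, and ``run_episode`` are hypothetical names
# defined here for illustration only; they follow the rules described in this
# notebook (10 rows, 5 columns, paddle starts in the center, 10 steps per
# episode) and are not the OpenEnv or OpenSpiel implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    """Hypothetical stand-in for the typed result of reset()/step()."""

    ball_col: int
    paddle_col: int
    reward: float
    done: bool


class TinyCatch:
    """Local stand-in for Catch using the rules described in this notebook."""

    def __init__(self, rows: int = 10, cols: int = 5, seed: int = 0) -> None:
        self.rows = rows
        self.cols = cols
        self._rng = random.Random(seed)
        self.ball_col = 0
        self.paddle_col = cols // 2
        self.step_count = 0

    def reset(self) -> StepResult:
        self.ball_col = self._rng.randrange(self.cols)
        self.paddle_col = self.cols // 2
        self.step_count = 0
        return StepResult(self.ball_col, self.paddle_col, reward=0.0, done=False)

    def step(self, action_id: int) -> StepResult:
        # 0 = LEFT, 1 = STAY, 2 = RIGHT; the paddle is clamped to the grid.
        moved = self.paddle_col + (action_id - 1)
        self.paddle_col = min(self.cols - 1, max(0, moved))
        self.step_count += 1
        done = self.step_count >= self.rows
        caught = done and self.paddle_col == self.ball_col
        return StepResult(self.ball_col, self.paddle_col, 1.0 if caught else 0.0, done)


def track_ball(result: StepResult) -> int:
    """Move one column toward the ball, or stay when already aligned."""
    if result.paddle_col > result.ball_col:
        return 0
    if result.paddle_col < result.ball_col:
        return 2
    return 1


def run_episode(env: TinyCatch, policy: Callable[[StepResult], int]) -> float:
    """Play one episode with the reset/step loop and return the final reward."""
    result = env.reset()
    while not result.done:
        result = env.step(policy(result))
    return result.reward


episodes = 200
env = TinyCatch(seed=0)
wins = sum(run_episode(env, track_ball) for _ in range(episodes))
print(f"track_ball caught {int(wins)}/{episodes} balls")
```

# %%
# Swapping ``track_ball`` for a function that returns ``random.choice([0, 1, 2])``
# gives a quick baseline to compare against.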