Initial upload of Dualist Othello AI (Iteration 652)

Files changed (8)

README.md +64 -41
bitboard.py +81 -0
dtypes.py +23 -0
dualist_model.pth +3 -0
game.py +88 -0
inference.py +86 -0
model.py +72 -0
requirements.txt +3 -0

README.md CHANGED

@@ -1,41 +1,64 @@
----
-license: apache-2.0
-language:
-- en
-- sv
-metrics:
-- accuracy
-tags:
-- othello
-- reinforcement-learning
-- alphazero
-- edax
-- board-games
----
-# Dualist: Hybrid Othello AI
-Dualist is a high-performance Othello agent utilizing a hybrid architecture that integrates **PyTorch** with the world-class engine **Edax**
-## Architecture
-The system is built around a triad of interacting components[cite: 10]:
-**The Student (PyTorch):** A deep neural network (ResNet) featuring a dual-head structure for Policy and Value prediction.
-**The Teacher (Edax):** Functions as an "Oracle" by providing ground-truth evaluations via an optimized C-based bitboard engine.
-**Experience Replay Buffer:** Stores millions of positions in LMDB format to break correlation and stabilize training.
-## Technical Specifications
-**Input:** A (3, 8, 8) tensor encoding the current player's pieces, the opponent's pieces, and the current turn.
-**Training Methodology:** A Teacher-Student Curriculum transitioning from Supervised Bootstrapping to Reinforcement Learning with dynamic search depth.
-**Integration:** High-performance Python bridge via `ctypes` to call Edax functions directly in memory without CLI overhead.
-## Deployment & Usage
-The model is designed to operate within a modern stack including:
-* **FastAPI** for the inference API.
-* **PostgreSQL** for match history and analytical storage.
-* **Vite / React Native** for cross-platform frontend interaction.
-https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/pR5AEfGMhjljsPQK5VEVG.mp4
-![unnamed (8)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/-idirPB4DuZ9BqmV0vbSv.png)
-![unnamed (9)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/ijPJ0Q7luuTXBLFCug3l8.png)

+# Dualist Othello AI
+Dualist is a high-performance Othello (Reversi) AI model trained using a **Deep Residual Neural Network** architecture. It was developed as part of a hybrid learning project where a bitboard-based engine (Edax) acted as the "Grandmaster Teacher" to train the neural network via curriculum learning.
+## Features
+- **Architecture**: 10 Residual Blocks with 256 channels.
+- **Input**: 3x8x8 planes (Player bits, Opponent bits, Turn/Constant).
+- **Heuristics**: Trained to emulate professional-level Othello gameplay and strategic positioning.
+- **Teacher**: Supervised and Reinforcement Learning against the Edax engine (Depth 1-30).
+## Model Details
+- **Model File**: `dualist_model.pth`
+- **Total Parameters**: About 11.8 million trainable parameters for the default 10-block, 256-channel configuration, sized to balance inference speed and strategic depth.
+- **Architecture Class**: `OthelloNet` in `model.py`.
+## Installation & Usage
+### Prerequisites
+- Python 3.8+
+- PyTorch
+- NumPy
+### Quick Start (Inference)
+The model can be loaded and used for move prediction. Make sure `model.py`, `bitboard.py`, and `dualist_model.pth` are in your working directory.
+```python
+import torch
+from model import OthelloNet
+from bitboard import get_bit, make_input_planes
+# Load model
+model = OthelloNet(num_res_blocks=10, num_channels=256)
+checkpoint = torch.load("dualist_model.pth", map_location="cpu")
+model.load_state_dict(checkpoint["model_state_dict"])
+model.eval()
+# Example input (Bitboards)
+black_bb = 0x0000000810000000
+white_bb = 0x0000001008000000
+# Get prediction
+input_planes = make_input_planes(black_bb, white_bb)
+with torch.no_grad():
+    policy, value = model(input_planes)
+# 'policy' contains log-probabilities (log_softmax output) over 65 actions:
+# 64 squares + pass. Use torch.exp(policy) to get probabilities.
+# 'value' is the predicted game outcome [-1, 1]
+```
+### Files Description
+- `dualist_model.pth`: Pre-trained weights for the OthelloNet.
+- `model.py`: Neural Network architecture definition.
+- `game.py`: Core Othello logic and move generation.
+- `bitboard.py`: Bit manipulation and input plane processing.
+- `inference.py`: Example script to run the model on a board state.
+## Hugging Face Integration
+To push this to your Hugging Face account:
+1. Install `huggingface_hub`: `pip install huggingface_hub`
+2. Login: `huggingface-cli login`
+3. Push files to `brandonlanexyz/dualist`.
+---
+*Created by Brandon | Part of the AntiGravity AI-LAB Othello Project*

bitboard.py ADDED

	@@ -0,0 +1,81 @@

+import numpy as np
+# Bitboard Constants
+BOARD_SIZE = 8
+FULL_MASK = 0xFFFFFFFFFFFFFFFF
+def popcount(x):
+    """Counts set bits in a 64-bit integer."""
+    return bin(x).count('1')
+def bit_to_row_col(bit_mask):
+    """Converts a single bit mask to (row, col) coordinates."""
+    if bit_mask == 0:
+        return -1, -1
+    # Find the index of the set bit (0-63)
+    # Assumes only one bit is set
+    idx = bit_mask.bit_length() - 1
+    # Convention used in this repo: bit 63 (MSB) is A1 -> (0, 0) and
+    # bit 0 (LSB) is H8 -> (7, 7).
+    # This has not yet been checked against Edax's own square numbering,
+    # so verify it before exchanging bitboards with the engine.
+    row = (63 - idx) // 8
+    col = (63 - idx) % 8
+    return row, col
+def get_bit(row, col):
+    """Returns a bitmask with a single bit set at (row, col)."""
+    shift = 63 - (row * 8 + col)
+    return 1 << shift
+def make_input_planes(player_bb, opponent_bb):
+    """
+    Converts bitboards into a 3x8x8 input tensor for the Neural Network.
+    Plane 0: Player pieces (1 if present, 0 otherwise)
+    Plane 1: Opponent pieces (1 if present, 0 otherwise)
+    Plane 2: Constant 1.0 (the board is always encoded from the perspective of
+             the player to move). Some implementations put a valid-moves mask
+             here instead; this one keeps the constant plane.
+    """
+    planes = np.zeros((3, 8, 8), dtype=np.float32)
+    # Fill Plane 0 (Player)
+    for r in range(8):
+        for c in range(8):
+            mask = get_bit(r, c)
+            if player_bb & mask:
+                planes[0, r, c] = 1.0
+    # Fill Plane 1 (Opponent)
+    for r in range(8):
+        for c in range(8):
+            mask = get_bit(r, c)
+            if opponent_bb & mask:
+                planes[1, r, c] = 1.0
+    # Fill Plane 2 (Constant / Color)
+    # Often for single-network (canonical form), this might just be 1s.
+    planes[2, :, :] = 1.0
+    import torch
+    return torch.tensor(planes).unsqueeze(0) # Add batch dimension: (1, 3, 8, 8)
+def print_board(black_bb, white_bb):
+    """Prints the board state using B/W symbols."""
+    print("  A B C D E F G H")
+    for r in range(8):
+        line = f"{r+1} "
+        for c in range(8):
+            mask = get_bit(r, c)
+            if black_bb & mask:
+                line += "B "
+            elif white_bb & mask:
+                line += "W "
+            else:
+                line += ". "
+        print(line)
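
The bit layout used by `get_bit` and `bit_to_row_col` is easy to get backwards, so here is a minimal standalone sketch that round-trips the starting position through that layout. It assumes the convention stated in `bitboard.py` (bit 63 is A1, bit 0 is H8); `square_name` and `squares` are hypothetical helpers added for illustration and are not part of the commit.

```python
# Convention assumed from bitboard.py: bit 63 (MSB) is A1, bit 0 (LSB) is H8.

def get_bit(row: int, col: int) -> int:
    """Single-bit mask for (row, col); row 0 is rank 1, col 0 is file A."""
    return 1 << (63 - (row * 8 + col))

def bit_to_row_col(bit_mask: int) -> tuple:
    """Inverse of get_bit for a mask with exactly one bit set."""
    idx = bit_mask.bit_length() - 1
    return (63 - idx) // 8, (63 - idx) % 8

def square_name(bit_mask: int) -> str:
    """Hypothetical helper: algebraic name such as 'E4' for a single-bit mask."""
    row, col = bit_to_row_col(bit_mask)
    return "ABCDEFGH"[col] + str(row + 1)

def squares(bitboard: int) -> list:
    """Hypothetical helper: sorted names of every set square in a bitboard."""
    names = []
    while bitboard:
        low = bitboard & -bitboard  # isolate the lowest set bit
        names.append(square_name(low))
        bitboard ^= low             # clear it and continue
    return sorted(names)

black = squares(0x0000000810000000)
white = squares(0x0000001008000000)
print("Black:", black, "White:", white)
```

Under this convention the two constants decode to Black on D5/E4 and White on D4/E5, which is the standard Othello starting position.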

dtypes.py ADDED

	@@ -0,0 +1,23 @@

+from typing import NamedTuple
+import numpy as np
+class Experience(NamedTuple):
+    """
+    Represents a single training example from self-play.
+    Attributes:
+        state (np.ndarray): The board state (canonical form), typically 3x8x8 (Player, Opponent, Valid/Turn).
+        policy (np.ndarray): The MCTS visit counts or probability distribution (size 65).
+        value (float): The final game outcome from the perspective of the player (1 for win, -1 for loss, 0 for draw).
+    """
+    state: np.ndarray
+    policy: np.ndarray
+    value: float
+class GameResult(NamedTuple):
+    """
+    Represents the final outcome of a game.
+    """
+    final_board: np.ndarray
+    winner: int # 1 for Black, -1 for White, 0 for Draw
+    score_diff: int # Black score - White score
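
As a rough illustration of how an `Experience` record might be filled in, the sketch below turns search visit counts into the 65-entry policy target described in the docstring. It uses plain lists instead of NumPy arrays so it runs without dependencies. `ExperienceSketch` and `visits_to_policy` are hypothetical names, and the assumption that action indices equal bitboard bit indices (with 64 reserved for pass) is taken from `inference.py`, not from any training code shown in this commit.

```python
from typing import Dict, List, NamedTuple

class ExperienceSketch(NamedTuple):
    """Plain-list stand-in for Experience (hypothetical, no NumPy needed)."""
    state: List[List[List[float]]]  # 3 x 8 x 8 planes
    policy: List[float]             # 65 entries: 64 squares + pass
    value: float                    # +1 win, -1 loss, 0 draw for the side to move

def visits_to_policy(visit_counts: Dict[int, int]) -> List[float]:
    """Normalise {action_index: visits} into a 65-entry probability vector."""
    total = sum(visit_counts.values())
    policy = [0.0] * 65
    for action, visits in visit_counts.items():
        policy[action] = visits / total
    return policy

# Made-up visit counts over Black's four opening moves (bits 19, 26, 37, 44).
empty_state = [[[0.0] * 8 for _ in range(8)] for _ in range(3)]
example = ExperienceSketch(
    state=empty_state,
    policy=visits_to_policy({19: 10, 26: 30, 37: 40, 44: 20}),
    value=1.0,
)
print(len(example.policy), sum(example.policy))
```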

dualist_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f2b4cfc68e08a211dbe1c95841d3cca181e0f66f1b80e9f7dc06ebc3e9bdaa3
+size 47452382
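
The three lines above are a Git LFS pointer: they record the SHA-256 digest and byte size of the real weights file rather than the weights themselves. A minimal sketch for checking a downloaded file against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local path in the usage note is an assumption.

```python
import hashlib
import os

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(path: str, pointer: dict) -> bool:
    """True if the file at `path` has the size and hash recorded in the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4f2b4cfc68e08a211dbe1c95841d3cca181e0f66f1b80e9f7dc06ebc3e9bdaa3\n"
    "size 47452382\n"
)
print(pointer["algo"], pointer["size"])
```

With the weights downloaded next to the script, `matches_pointer("dualist_model.pth", pointer)` would be expected to return `True`. A `False` result usually means the download is incomplete or the pointer file itself was fetched instead of the LFS object.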

game.py ADDED

	@@ -0,0 +1,88 @@

+import numpy as np
+# bitboard.py is uploaded alongside this file (no src/ package in this repo)
+from bitboard import get_bit, bit_to_row_col, popcount
+class OthelloGame:
+    def __init__(self):
+        # Initial Board Setup (A1 = MSB, H8 = LSB)
+        # Black pieces: D5 (28), E4 (35) -> 0x0000000810000000
+        # White pieces: D4 (36), E5 (27) -> 0x0000001008000000
+        self.player_bb = 0x0000000810000000 # Black starts
+        self.opponent_bb = 0x0000001008000000
+        self.turn = 1 # 1: Black, -1: White
+    def get_valid_moves(self, player, opponent):
+        """Calculates valid moves for 'player' against 'opponent'."""
+        empty = ~(player | opponent) & 0xFFFFFFFFFFFFFFFF
+        # Consistent with MSB=A1:
+        # North: << 8. South: >> 8.
+        # West: << 1 (mask A). East: >> 1 (mask H).
+        mask_h = 0x0101010101010101
+        mask_a = 0x8080808080808080
+        # Directions
+        shifts = [
+             (lambda x: (x & ~mask_h) >> 1), # East
+             (lambda x: (x & ~mask_a) << 1), # West
+             (lambda x: (x << 8) & 0xFFFFFFFFFFFFFFFF), # North
+             (lambda x: (x >> 8) & 0xFFFFFFFFFFFFFFFF), # South
+             (lambda x: (x & ~mask_h) << 7), # NE (N+E -> <<8 + >>1 = <<7)
+             (lambda x: (x & ~mask_a) << 9), # NW (N+W -> <<8 + <<1 = <<9)
+             (lambda x: (x & ~mask_h) >> 9), # SE (S+E -> >>8 + >>1 = >>9)
+             (lambda x: (x & ~mask_a) >> 7)  # SW (S+W -> >>8 + <<1 = >>7)
+        ]
+        valid_moves = 0
+        for shift_func in shifts:
+            candidates = shift_func(player) & opponent
+            for _ in range(6): # Max 6 opponent pieces can be in between
+                candidates |= shift_func(candidates) & opponent
+            valid_moves |= shift_func(candidates) & empty
+        return valid_moves
+    def apply_move(self, player, opponent, move_bit):
+        """Calculates new boards after move_bit."""
+        if move_bit == 0:
+            return player, opponent
+        flipped = 0
+        mask_h = 0x0101010101010101
+        mask_a = 0x8080808080808080
+        shifts = [
+             (lambda x: (x & ~mask_h) >> 1), # East
+             (lambda x: (x & ~mask_a) << 1), # West
+             (lambda x: (x << 8) & 0xFFFFFFFFFFFFFFFF), # North
+             (lambda x: (x >> 8) & 0xFFFFFFFFFFFFFFFF), # South
+             (lambda x: (x & ~mask_h) << 7), # NE
+             (lambda x: (x & ~mask_a) << 9), # NW
+             (lambda x: (x & ~mask_h) >> 9), # SE
+             (lambda x: (x & ~mask_a) >> 7)  # SW
+        ]
+        for shift_func in shifts:
+            mask = shift_func(move_bit)
+            potential_flips = 0
+            while mask & opponent:
+                potential_flips |= mask
+                mask = shift_func(mask)
+            if mask & player:
+                flipped |= potential_flips
+        new_player = player | move_bit | flipped
+        new_opponent = opponent & ~flipped
+        return new_player, new_opponent
+    def play_move(self, move_bit):
+        if move_bit != 0:
+            self.player_bb, self.opponent_bb = self.apply_move(self.player_bb, self.opponent_bb, move_bit)
+        # Turn always swaps (even on pass)
+        self.player_bb, self.opponent_bb = self.opponent_bb, self.player_bb
+        self.turn *= -1
+    def is_terminal(self):
+        p_moves = self.get_valid_moves(self.player_bb, self.opponent_bb)
+        o_moves = self.get_valid_moves(self.opponent_bb, self.player_bb)
+        return (p_moves == 0) and (o_moves == 0)
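
To sanity-check the shift directions and edge masks, the following standalone sketch re-implements the same flood-fill move generation and flipping logic and runs it on the starting position. It is a condensed rewrite for illustration, not the class from the commit: the module-level `SHIFTS` table and the explicit 64-bit masking on left shifts are choices made here, and the bit 63 = A1 convention is assumed from `bitboard.py`.

```python
FULL = 0xFFFFFFFFFFFFFFFF
MASK_H = 0x0101010101010101  # file H (bits 0, 8, ..., 56)
MASK_A = 0x8080808080808080  # file A (bits 7, 15, ..., 63)

# One shift per compass direction, assuming bit 63 = A1 and bit 0 = H8.
SHIFTS = [
    lambda x: (x & ~MASK_H) >> 1,           # East
    lambda x: ((x & ~MASK_A) << 1) & FULL,  # West
    lambda x: (x << 8) & FULL,              # North
    lambda x: x >> 8,                       # South
    lambda x: ((x & ~MASK_H) << 7) & FULL,  # North-East
    lambda x: ((x & ~MASK_A) << 9) & FULL,  # North-West
    lambda x: (x & ~MASK_H) >> 9,           # South-East
    lambda x: (x & ~MASK_A) >> 7,           # South-West
]

def valid_moves(player: int, opponent: int) -> int:
    """Bitmask of empty squares where `player` brackets at least one disc."""
    empty = ~(player | opponent) & FULL
    moves = 0
    for shift in SHIFTS:
        run = shift(player) & opponent
        for _ in range(5):  # first step + 5 extensions covers up to 6 discs
            run |= shift(run) & opponent
        moves |= shift(run) & empty
    return moves

def apply_move(player: int, opponent: int, move_bit: int) -> tuple:
    """Place a disc on move_bit and flip every bracketed opponent disc."""
    flipped = 0
    for shift in SHIFTS:
        cursor, line = shift(move_bit), 0
        while cursor & opponent:
            line |= cursor
            cursor = shift(cursor)
        if cursor & player:  # the run is closed by one of our own discs
            flipped |= line
    return player | move_bit | flipped, opponent & ~flipped

black, white = 0x0000000810000000, 0x0000001008000000
moves = valid_moves(black, white)
new_black, new_white = apply_move(black, white, 1 << 44)  # Black plays D3
print(hex(moves), hex(new_black), hex(new_white))
```

The move mask comes out as `0x102004080000` (D3, C4, F5, E6), matching the `legal_moves` constant hard-coded in `inference.py`. After D3 the disc on D4 flips, leaving Black with four discs and White with one.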

inference.py ADDED

	@@ -0,0 +1,86 @@

+import torch
+import torch.nn.functional as F
+from model import OthelloNet
+from bitboard import get_bit, make_input_planes
+import numpy as np
+def load_dualist(model_path="dualist_model.pth", device="cpu"):
+    """
+    Loads the Dualist Othello model.
+    """
+    model = OthelloNet(num_res_blocks=10, num_channels=256)
+    checkpoint = torch.load(model_path, map_location=device)
+    # Accept either a training checkpoint dict or a bare state_dict
+    if "model_state_dict" in checkpoint:
+        model.load_state_dict(checkpoint["model_state_dict"])
+    else:
+        model.load_state_dict(checkpoint)
+    model.to(device)
+    model.eval()
+    return model
+def get_best_move(model, player_bb, opponent_bb, legal_moves_bb, device="cpu"):
+    """
+    Given the current board state and legal moves, returns the best move (bitmask).
+    """
+    # 1. Prepare input planes (3x8x8)
+    input_tensor = make_input_planes(player_bb, opponent_bb).to(device)
+    # 2. Forward pass
+    with torch.no_grad():
+        policy_logits, value = model(input_tensor)
+    # 3. Filter legal moves and find best
+    # The policy head outputs 65 log-probabilities (64 squares + 1 pass);
+    # despite the variable name, the model has already applied log_softmax.
+    # The pass entry (index 64) is ignored here; a pass is returned only
+    # when no legal square is available.
+    probs = torch.exp(policy_logits).squeeze(0).cpu().numpy()
+    best_move_idx = -1
+    max_prob = -1.0
+    for i in range(64):
+        # Convert index back to (row, col)
+        row, col = (63 - i) // 8, (63 - i) % 8
+        mask = get_bit(row, col)
+        if legal_moves_bb & mask:
+            if probs[i] > max_prob:
+                max_prob = probs[i]
+                best_move_idx = i
+    if best_move_idx == -1:
+        # Check if pass (idx 64) is the only option or if something is wrong
+        return 0 # Pass/No move
+    row, col = (63 - best_move_idx) // 8, (63 - best_move_idx) % 8
+    return get_bit(row, col)
+if __name__ == "__main__":
+    # Quick example: Starting position
+    # Black: bit 28 and 35
+    # White: bit 27 and 36
+    # (Simplified for demonstration)
+    print("Dualist Inference Test")
+    try:
+        model = load_dualist()
+        print("Model loaded successfully!")
+        # Starting position (Black pieces, White pieces)
+        # B: (3,4), (4,3) -> bits 35, 28 (MSB = A1 indexing)
+        # Using bits from Othello standard starting board
+        black_bb = 0x0000000810000000
+        white_bb = 0x0000001008000000
+        legal_moves = 0x0000102004080000 # Standard opening moves for Black
+        best = get_best_move(model, black_bb, white_bb, legal_moves)
+        print(f"Best move found: {hex(best)}")
+    except FileNotFoundError:
+        print("Error: dualist_model.pth not found. Ensure it's in the same directory.")
+    except Exception as e:
+        print(f"An error occurred: {e}")
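
The index arithmetic in `get_best_move` reduces to "policy index `i` corresponds to bit `i`", because `get_bit((63 - i) // 8, (63 - i) % 8)` equals `1 << i`. Under that assumption, the masked argmax can be sketched without PyTorch as below. `pick_move` and `PASS_INDEX` are hypothetical names, and the log-probabilities are dummy values rather than model output.

```python
import math

PASS_INDEX = 64  # assumed position of the pass action in the 65-way policy

def pick_move(log_probs, legal_moves_bb):
    """Return (action_index, bitmask) of the best legal move; bitmask 0 means pass.

    Log-probabilities are compared directly, since exp() is monotonic and
    does not change which entry is largest.
    """
    best_idx = None
    for i in range(64):
        if (legal_moves_bb >> i) & 1:
            if best_idx is None or log_probs[i] > log_probs[best_idx]:
                best_idx = i
    if best_idx is None:
        return PASS_INDEX, 0
    return best_idx, 1 << best_idx

# Dummy log-probabilities: uniform, except bit 37 (C4) is slightly preferred.
log_probs = [math.log(1 / 65)] * 65
log_probs[37] = math.log(2 / 65)

idx, move = pick_move(log_probs, 0x0000102004080000)
print(idx, hex(move))
```

Whether the trained policy head really uses this index order depends on how the training targets were built, which this commit does not show.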

model.py ADDED

	@@ -0,0 +1,72 @@

+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+class ResidualBlock(nn.Module):
+    def __init__(self, channels):
+        super(ResidualBlock, self).__init__()
+        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+        self.bn1 = nn.BatchNorm2d(channels)
+        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+        self.bn2 = nn.BatchNorm2d(channels)
+    def forward(self, x):
+        residual = x
+        out = F.relu(self.bn1(self.conv1(x)))
+        out = self.bn2(self.conv2(out))
+        out += residual
+        out = F.relu(out)
+        return out
+class OthelloNet(nn.Module):
+    def __init__(self, num_res_blocks=10, num_channels=256):
+        super(OthelloNet, self).__init__()
+        # Input: 3 channels (Player pieces, Opponent pieces, Legal moves/Constant plane)
+        self.conv_input = nn.Conv2d(3, num_channels, kernel_size=3, padding=1, bias=False)
+        self.bn_input = nn.BatchNorm2d(num_channels)
+        # Residual Tower
+        self.res_blocks = nn.ModuleList([
+            ResidualBlock(num_channels) for _ in range(num_res_blocks)
+        ])
+        # Policy Head
+        self.policy_conv = nn.Conv2d(num_channels, 2, kernel_size=1, bias=False)
+        self.policy_bn = nn.BatchNorm2d(2)
+        # 2 channels * 8 * 8 = 128
+        self.policy_fc = nn.Linear(128, 65) # 64 squares + pass
+        # Value Head
+        self.value_conv = nn.Conv2d(num_channels, 1, kernel_size=1, bias=False)
+        self.value_bn = nn.BatchNorm2d(1)
+        # 1 channel * 8 * 8 = 64
+        self.value_fc1 = nn.Linear(64, 256)
+        self.value_fc2 = nn.Linear(256, 1)
+    def forward(self, x):
+        # Input Convolution
+        x = F.relu(self.bn_input(self.conv_input(x)))
+        # Residual Tower
+        for block in self.res_blocks:
+            x = block(x)
+        # Policy Head
+        p = F.relu(self.policy_bn(self.policy_conv(x)))
+        p = p.view(p.size(0), -1) # Flatten
+        p = self.policy_fc(p)
+        # Return log-probabilities (log_softmax) rather than raw logits:
+        # this is numerically stable and can be fed directly to NLLLoss or
+        # KLDivLoss. Apply torch.exp() to recover probabilities.
+        p = F.log_softmax(p, dim=1)
+        # Value Head
+        v = F.relu(self.value_bn(self.value_conv(x)))
+        v = v.view(v.size(0), -1) # Flatten
+        v = F.relu(self.value_fc1(v))
+        v = torch.tanh(self.value_fc2(v))
+        return p, v
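
The size of the network follows directly from the layer shapes above. The sketch below does the arithmetic in plain Python, counting only learnable parameters: convolutions have no bias, each BatchNorm contributes a weight and a bias per channel, and running statistics are excluded as buffers. The helper names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k   # bias=False in every conv layer

def bn_params(c: int) -> int:
    return 2 * c                  # learnable weight and bias per channel

def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out   # weight matrix plus bias vector

def othello_net_params(num_res_blocks: int = 10, num_channels: int = 256) -> int:
    c = num_channels
    stem = conv_params(3, c, 3) + bn_params(c)
    block = 2 * (conv_params(c, c, 3) + bn_params(c))
    policy = conv_params(c, 2, 1) + bn_params(2) + linear_params(128, 65)
    value = (conv_params(c, 1, 1) + bn_params(1)
             + linear_params(64, 256) + linear_params(256, 1))
    return stem + num_res_blocks * block + policy + value

total = othello_net_params()
print(f"{total:,} trainable parameters")
```

For the default configuration this gives 11,840,200 parameters, or about 47.4 MB at 4 bytes each, which is consistent with the 47,452,382-byte LFS object once BatchNorm buffers and serialization overhead are added.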

requirements.txt ADDED

	@@ -0,0 +1,3 @@

+torch>=1.8.0
+numpy>=1.19.0
+huggingface_hub