brandonlanexyz committed on
Commit cf2aacd · verified · 1 Parent(s): e98c898

Initial upload of Dualist Othello AI (Iteration 652)

Files changed (8)
  1. README.md +64 -41
  2. bitboard.py +81 -0
  3. dtypes.py +23 -0
  4. dualist_model.pth +3 -0
  5. game.py +88 -0
  6. inference.py +86 -0
  7. model.py +72 -0
  8. requirements.txt +3 -0
README.md CHANGED
@@ -1,41 +1,64 @@
- ---
- license: apache-2.0
- language:
- - en
- - sv
- metrics:
- - accuracy
- tags:
- - othello
- - reinforcement-learning
- - alphazero
- - edax
- - board-games
- ---
-
- # Dualist: Hybrid Othello AI
-
- Dualist is a high-performance Othello agent utilizing a hybrid architecture that integrates **PyTorch** with the world-class engine **Edax**
-
- ## Architecture
- The system is built around a triad of interacting components[cite: 10]:
- **The Student (PyTorch):** A deep neural network (ResNet) featuring a dual-head structure for Policy and Value prediction.
- **The Teacher (Edax):** Functions as an "Oracle" by providing ground-truth evaluations via an optimized C-based bitboard engine.
- **Experience Replay Buffer:** Stores millions of positions in LMDB format to break correlation and stabilize training.
-
- ## Technical Specifications
- **Input:** A (3, 8, 8) tensor encoding the current player's pieces, the opponent's pieces, and the current turn.
- **Training Methodology:** A Teacher-Student Curriculum transitioning from Supervised Bootstrapping to Reinforcement Learning with dynamic search depth.
- **Integration:** High-performance Python bridge via `ctypes` to call Edax functions directly in memory without CLI overhead.
-
- ## Deployment & Usage
- The model is designed to operate within a modern stack including:
- * **FastAPI** for the inference API.
- * **PostgreSQL** for match history and analytical storage.
- * **Vite / React Native** for cross-platform frontend interaction.
-
- https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/pR5AEfGMhjljsPQK5VEVG.mp4
-
- ![unnamed (8)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/-idirPB4DuZ9BqmV0vbSv.png)
-
- ![unnamed (9)](https://cdn-uploads.huggingface.co/production/uploads/65fc3d2c2ba04e5ae4f1c1c6/ijPJ0Q7luuTXBLFCug3l8.png)
+ # Dualist Othello AI
+
+ Dualist is a high-performance Othello (Reversi) AI model built on a **Deep Residual Neural Network** architecture. It was developed as part of a hybrid learning project in which a bitboard-based engine (Edax) acted as the "Grandmaster Teacher", training the neural network via curriculum learning.
+
+ ## Features
+ - **Architecture**: 10 Residual Blocks with 256 channels.
+ - **Input**: 3x8x8 planes (Player discs, Opponent discs, constant plane of ones).
+ - **Heuristics**: Trained to emulate professional-level Othello gameplay and strategic positioning.
+ - **Teacher**: Supervised and Reinforcement Learning against the Edax engine (Depth 1-30).
+
+ ## Model Details
+ - **Model File**: `dualist_model.pth`
+ - **Total Parameters**: About 11.8 million with the default configuration (10 residual blocks, 256 channels).
+ - **Architecture Class**: `OthelloNet` in `model.py`.
+
+ ## Installation & Usage
+
+ ### Prerequisites
+ - Python 3.8+
+ - PyTorch
+ - NumPy
+
+ ### Quick Start (Inference)
+ The model can be loaded and used for move prediction. Make sure `model.py`, `bitboard.py`, and `dualist_model.pth` are in your working directory.
+
+ ```python
+ import torch
+ from model import OthelloNet
+ from bitboard import get_bit, make_input_planes
+
+ # Load model
+ model = OthelloNet(num_res_blocks=10, num_channels=256)
+ checkpoint = torch.load("dualist_model.pth", map_location="cpu")
+ model.load_state_dict(checkpoint["model_state_dict"])
+ model.eval()
+
+ # Example input (bitboards, MSB = A1): the standard starting position
+ black_bb = 0x0000000810000000
+ white_bb = 0x0000001008000000
+
+ # Get prediction
+ input_planes = make_input_planes(black_bb, white_bb)
+ with torch.no_grad():
+     policy, value = model(input_planes)
+
+ # 'policy' holds log-probabilities over 65 moves (64 squares + pass); use torch.exp(policy) for probabilities
+ # 'value' is the predicted game outcome in [-1, 1]
+ ```
+
+ ### Files Description
+ - `dualist_model.pth`: Pre-trained weights for the OthelloNet.
+ - `model.py`: Neural Network architecture definition.
+ - `game.py`: Core Othello logic and move generation.
+ - `bitboard.py`: Bit manipulation and input plane processing.
+ - `inference.py`: Example script to run the model on a board state.
+
+ ## Hugging Face Integration
+ To push this to your Hugging Face account:
+ 1. Install `huggingface_hub`: `pip install huggingface_hub`
+ 2. Login: `huggingface-cli login`
+ 3. Push files to `brandonlanexyz/dualist`.
+
+ ---
+ *Created by Brandon | Part of the AntiGravity AI-LAB Othello Project*
bitboard.py ADDED
@@ -0,0 +1,81 @@
+ import numpy as np
+
+
+ # Bitboard Constants
+ BOARD_SIZE = 8
+ FULL_MASK = 0xFFFFFFFFFFFFFFFF
+
+ def popcount(x):
+     """Counts set bits in a 64-bit integer."""
+     return bin(x).count('1')
+
+ def bit_to_row_col(bit_mask):
+     """Converts a single bit mask to (row, col) coordinates."""
+     if bit_mask == 0:
+         return -1, -1
+     # Find the index of the set bit (0-63)
+     # Assumes only one bit is set
+     idx = bit_mask.bit_length() - 1
+     # This repo maps the MSB to A1 (0, 0) and the LSB to H8 (7, 7):
+     # bit index 63 is A1 and bit index 0 is H8, matching get_bit() below
+     # and the shift directions in game.py.
+     # Many bitboard engines use the opposite order (LSB = A1), and Edax's
+     # own convention has not been checked against this code, so convert
+     # explicitly before exchanging bitboards with another engine.
+     # With MSB = A1 the square index is 63 - idx:
+     row = (63 - idx) // 8
+     col = (63 - idx) % 8
+     return row, col
+
+ def get_bit(row, col):
+     """Returns a bitmask with a single bit set at (row, col)."""
+     shift = 63 - (row * 8 + col)
+     return 1 << shift
+
+ def make_input_planes(player_bb, opponent_bb):
+     """
+     Converts bitboards into a 3x8x8 input tensor for the Neural Network.
+     Plane 0: Player pieces (1 if present, 0 otherwise)
+     Plane 1: Opponent pieces (1 if present, 0 otherwise)
+     Plane 2: Constant 1.0 everywhere (the board is always encoded from the
+     perspective of the side to move).
+     Some implementations put the legal-move mask in this plane instead;
+     this function always writes the constant plane.
+     """
+     planes = np.zeros((3, 8, 8), dtype=np.float32)
+
+     # Fill Plane 0 (Player)
+     for r in range(8):
+         for c in range(8):
+             mask = get_bit(r, c)
+             if player_bb & mask:
+                 planes[0, r, c] = 1.0
+
+     # Fill Plane 1 (Opponent)
+     for r in range(8):
+         for c in range(8):
+             mask = get_bit(r, c)
+             if opponent_bb & mask:
+                 planes[1, r, c] = 1.0
+
+     # Fill Plane 2 (Constant / Color)
+     # Often for single-network (canonical form), this might just be 1s.
+     planes[2, :, :] = 1.0
+
+     import torch
+     return torch.tensor(planes).unsqueeze(0)  # Add batch dimension: (1, 3, 8, 8)
+
+ def print_board(black_bb, white_bb):
+     """Prints the board state using B/W symbols."""
+     print("  A B C D E F G H")
+     for r in range(8):
+         line = f"{r+1} "
+         for c in range(8):
+             mask = get_bit(r, c)
+             if black_bb & mask:
+                 line += "B "
+             elif white_bb & mask:
+                 line += "W "
+             else:
+                 line += ". "
+         print(line)
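The coordinate convention in `bitboard.py` (bit 63 = A1, bit 0 = H8) is easy to get backwards, so a quick round-trip check may help. The following is a minimal, standard-library-only sketch: `get_bit` and `bit_to_row_col` mirror the functions in the diff above, while `square_name` is a hypothetical helper added here purely for illustration.

```python
def get_bit(row, col):
    """Bitmask with a single bit set at (row, col); bit 63 is A1."""
    return 1 << (63 - (row * 8 + col))


def bit_to_row_col(bit_mask):
    """Inverse of get_bit for a mask with exactly one bit set."""
    if bit_mask == 0:
        return -1, -1
    idx = bit_mask.bit_length() - 1
    return (63 - idx) // 8, (63 - idx) % 8


def square_name(bit_mask):
    """Hypothetical helper: algebraic name such as 'E4' for a one-bit mask."""
    row, col = bit_to_row_col(bit_mask)
    return "ABCDEFGH"[col] + str(row + 1)


# Every square should survive the (row, col) -> bit -> (row, col) round trip.
for r in range(8):
    for c in range(8):
        assert bit_to_row_col(get_bit(r, c)) == (r, c)

# Black's starting discs as used in game.py and inference.py.
black_bb = 0x0000000810000000
black_squares = sorted(
    square_name(1 << i) for i in range(64) if black_bb & (1 << i)
)
print(black_squares)  # ['D5', 'E4']
```

Under this layout a lower bit index means a square further right and further down the board, which is why the East and South shifts in `game.py` are right shifts.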
dtypes.py ADDED
@@ -0,0 +1,23 @@
+ from typing import NamedTuple
+ import numpy as np
+
+ class Experience(NamedTuple):
+     """
+     Represents a single training example from self-play.
+
+     Attributes:
+         state (np.ndarray): The board state (canonical form), typically 3x8x8 (Player, Opponent, Valid/Turn).
+         policy (np.ndarray): The MCTS visit counts or probability distribution (size 65).
+         value (float): The final game outcome from the perspective of the player (1 for win, -1 for loss, 0 for draw).
+     """
+     state: np.ndarray
+     policy: np.ndarray
+     value: float
+
+ class GameResult(NamedTuple):
+     """
+     Represents the final outcome of a game.
+     """
+     final_board: np.ndarray
+     winner: int  # 1 for Black, -1 for White, 0 for Draw
+     score_diff: int  # Black score - White score
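`GameResult` stores the winner and the disc difference from Black's point of view, but this commit does not show how those fields are filled in. The sketch below is one plausible way to derive them from two bitboards. Both `result_from_bitboards` and the reduced `SimpleResult` tuple (which drops `final_board` to avoid a NumPy dependency) are assumptions made for illustration, not code from the repository.

```python
from typing import NamedTuple


class SimpleResult(NamedTuple):
    """Illustrative stand-in for GameResult, minus the final_board array."""
    winner: int      # 1 for Black, -1 for White, 0 for Draw
    score_diff: int  # Black score - White score


def result_from_bitboards(black_bb: int, white_bb: int) -> SimpleResult:
    """Count discs on each bitboard and derive winner and score difference."""
    black = bin(black_bb).count("1")
    white = bin(white_bb).count("1")
    diff = black - white
    winner = (diff > 0) - (diff < 0)  # sign of diff: 1, -1 or 0
    return SimpleResult(winner=winner, score_diff=diff)


# Starting position: two discs each, so the raw disc count is level.
print(result_from_bitboards(0x0000000810000000, 0x0000001008000000))
```

This counts discs only. Some rule sets award the remaining empty squares to the winner, which would change `score_diff` but not `winner`.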
dualist_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f2b4cfc68e08a211dbe1c95841d3cca181e0f66f1b80e9f7dc06ebc3e9bdaa3
+ size 47452382
game.py ADDED
@@ -0,0 +1,88 @@
+ import numpy as np
+ from bitboard import get_bit, bit_to_row_col, popcount
+
+ class OthelloGame:
+     def __init__(self):
+         # Initial Board Setup (A1 = MSB, H8 = LSB)
+         # Black pieces: E4 (bit 35), D5 (bit 28) -> 0x0000000810000000
+         # White pieces: D4 (bit 36), E5 (bit 27) -> 0x0000001008000000
+         self.player_bb = 0x0000000810000000  # Black starts
+         self.opponent_bb = 0x0000001008000000
+         self.turn = 1  # 1: Black, -1: White
+
+     def get_valid_moves(self, player, opponent):
+         """Calculates valid moves for 'player' against 'opponent'."""
+         empty = ~(player | opponent) & 0xFFFFFFFFFFFFFFFF
+
+         # Consistent with MSB=A1:
+         # North: << 8. South: >> 8.
+         # West: << 1 (mask A). East: >> 1 (mask H).
+         mask_h = 0x0101010101010101
+         mask_a = 0x8080808080808080
+
+         # Directions
+         shifts = [
+             (lambda x: (x & ~mask_h) >> 1),  # East
+             (lambda x: (x & ~mask_a) << 1),  # West
+             (lambda x: (x << 8) & 0xFFFFFFFFFFFFFFFF),  # North
+             (lambda x: (x >> 8) & 0xFFFFFFFFFFFFFFFF),  # South
+             (lambda x: (x & ~mask_h) << 7),  # NE (N+E -> <<8 + >>1 = <<7)
+             (lambda x: (x & ~mask_a) << 9),  # NW (N+W -> <<8 + <<1 = <<9)
+             (lambda x: (x & ~mask_h) >> 9),  # SE (S+E -> >>8 + >>1 = >>9)
+             (lambda x: (x & ~mask_a) >> 7)  # SW (S+W -> >>8 + <<1 = >>7)
+         ]
+
+         valid_moves = 0
+         for shift_func in shifts:
+             candidates = shift_func(player) & opponent
+             for _ in range(6):  # Max 6 opponent pieces can be in between
+                 candidates |= shift_func(candidates) & opponent
+             valid_moves |= shift_func(candidates) & empty
+
+         return valid_moves
+
+     def apply_move(self, player, opponent, move_bit):
+         """Calculates new boards after move_bit."""
+         if move_bit == 0:
+             return player, opponent
+
+         flipped = 0
+         mask_h = 0x0101010101010101
+         mask_a = 0x8080808080808080
+
+         shifts = [
+             (lambda x: (x & ~mask_h) >> 1),  # East
+             (lambda x: (x & ~mask_a) << 1),  # West
+             (lambda x: (x << 8) & 0xFFFFFFFFFFFFFFFF),  # North
+             (lambda x: (x >> 8) & 0xFFFFFFFFFFFFFFFF),  # South
+             (lambda x: (x & ~mask_h) << 7),  # NE
+             (lambda x: (x & ~mask_a) << 9),  # NW
+             (lambda x: (x & ~mask_h) >> 9),  # SE
+             (lambda x: (x & ~mask_a) >> 7)  # SW
+         ]
+
+         for shift_func in shifts:
+             mask = shift_func(move_bit)
+             potential_flips = 0
+             while mask & opponent:
+                 potential_flips |= mask
+                 mask = shift_func(mask)
+             if mask & player:
+                 flipped |= potential_flips
+
+         new_player = player | move_bit | flipped
+         new_opponent = opponent & ~flipped
+         return new_player, new_opponent
+
+     def play_move(self, move_bit):
+         if move_bit != 0:
+             self.player_bb, self.opponent_bb = self.apply_move(self.player_bb, self.opponent_bb, move_bit)
+
+         # Turn always swaps (even on pass)
+         self.player_bb, self.opponent_bb = self.opponent_bb, self.player_bb
+         self.turn *= -1
+
+     def is_terminal(self):
+         p_moves = self.get_valid_moves(self.player_bb, self.opponent_bb)
+         o_moves = self.get_valid_moves(self.opponent_bb, self.player_bb)
+         return (p_moves == 0) and (o_moves == 0)
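The shift-and-mask move generation in `game.py` can be checked in isolation against the opening position. The sketch below restates the same technique as free functions using only the standard library. The names `SHIFTS`, `valid_moves` and `apply_move` are chosen here for illustration, and the explicit 64-bit masks on the left shifts are an addition that the original does not need, because every result is later ANDed with a 64-bit board.

```python
FULL = 0xFFFFFFFFFFFFFFFF
MASK_H = 0x0101010101010101  # column H (bit index % 8 == 0)
MASK_A = 0x8080808080808080  # column A (bit index % 8 == 7)

# One shift function per direction, using the MSB = A1 layout.
SHIFTS = [
    lambda x: (x & ~MASK_H) >> 1,           # East
    lambda x: (x & ~MASK_A) << 1,           # West
    lambda x: (x << 8) & FULL,              # North
    lambda x: x >> 8,                       # South
    lambda x: ((x & ~MASK_H) << 7) & FULL,  # North-East
    lambda x: ((x & ~MASK_A) << 9) & FULL,  # North-West
    lambda x: (x & ~MASK_H) >> 9,           # South-East
    lambda x: (x & ~MASK_A) >> 7,           # South-West
]


def valid_moves(player, opponent):
    """Bitboard of empty squares where 'player' brackets opponent discs."""
    empty = ~(player | opponent) & FULL
    moves = 0
    for shift in SHIFTS:
        run = shift(player) & opponent      # adjacent opponent discs
        for _ in range(5):                  # extend the run (6 discs at most)
            run |= shift(run) & opponent
        moves |= shift(run) & empty         # empty square just past the run
    return moves


def apply_move(player, opponent, move_bit):
    """Place move_bit for 'player' and flip every bracketed opponent disc."""
    flipped = 0
    for shift in SHIFTS:
        cursor, line = shift(move_bit), 0
        while cursor & opponent:
            line |= cursor
            cursor = shift(cursor)
        if cursor & player:                 # run is closed by a player disc
            flipped |= line
    return player | move_bit | flipped, opponent & ~flipped


black, white = 0x0000000810000000, 0x0000001008000000
moves = valid_moves(black, white)
print(hex(moves))  # 0x102004080000 (D3, C4, F5, E6)

# Black plays D3 (bit 44), which flips the white disc on D4.
black, white = apply_move(black, white, 1 << 44)
print(bin(black).count("1"), bin(white).count("1"))  # 4 1
```

The opening mask agrees with the `legal_moves` constant hard-coded in `inference.py`.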
inference.py ADDED
@@ -0,0 +1,86 @@
+ import torch
+ import torch.nn.functional as F
+ from model import OthelloNet
+ from bitboard import get_bit, make_input_planes
+ import numpy as np
+
+ def load_dualist(model_path="dualist_model.pth", device="cpu"):
+     """
+     Loads the Dualist Othello model.
+     """
+     model = OthelloNet(num_res_blocks=10, num_channels=256)
+     checkpoint = torch.load(model_path, map_location=device)
+
+     # Accept either a checkpoint dict with "model_state_dict" or a bare state dict
+     if "model_state_dict" in checkpoint:
+         model.load_state_dict(checkpoint["model_state_dict"])
+     else:
+         model.load_state_dict(checkpoint)
+
+     model.to(device)
+     model.eval()
+     return model
+
+ def get_best_move(model, player_bb, opponent_bb, legal_moves_bb, device="cpu"):
+     """
+     Given the current board state and legal moves, returns the best move (bitmask).
+     """
+     # 1. Prepare input planes (3x8x8)
+     input_tensor = make_input_planes(player_bb, opponent_bb).to(device)
+
+     # 2. Forward pass
+     with torch.no_grad():
+         policy_logits, value = model(input_tensor)
+
+     # 3. Filter legal moves and find best
+     # The policy head outputs 65 log-probabilities (64 squares + 1 pass)
+     # We ignore the pass move for now unless no other moves are possible
+     # We'll map back to bitmask
+
+     probs = torch.exp(policy_logits).squeeze(0).cpu().numpy()
+
+     best_move_idx = -1
+     max_prob = -1.0
+
+     for i in range(64):
+         # Policy index i maps to bit i: get_bit(row, col) below equals 1 << i
+         row, col = (63 - i) // 8, (63 - i) % 8
+         mask = get_bit(row, col)
+
+         if legal_moves_bb & mask:
+             if probs[i] > max_prob:
+                 max_prob = probs[i]
+                 best_move_idx = i
+
+     if best_move_idx == -1:
+         # Check if pass (idx 64) is the only option or if something is wrong
+         return 0  # Pass/No move
+
+     row, col = (63 - best_move_idx) // 8, (63 - best_move_idx) % 8
+     return get_bit(row, col)
+
+ if __name__ == "__main__":
+     # Quick example: Starting position
+     # Black: bit 28 and 35
+     # White: bit 27 and 36
+     # (Simplified for demonstration)
+
+     print("Dualist Inference Test")
+     try:
+         model = load_dualist()
+         print("Model loaded successfully!")
+
+         # Starting position (Black pieces, White pieces)
+         # B: (row, col) = (3, 4) and (4, 3) -> bits 35 and 28 with MSB = A1
+         # Using bits from Othello standard starting board
+         black_bb = 0x0000000810000000
+         white_bb = 0x0000001008000000
+         legal_moves = 0x0000102004080000  # Standard opening moves for Black
+
+         best = get_best_move(model, black_bb, white_bb, legal_moves)
+         print(f"Best move found: {hex(best)}")
+
+     except FileNotFoundError:
+         print("Error: dualist_model.pth not found. Ensure it's in the same directory.")
+     except Exception as e:
+         print(f"An error occurred: {e}")
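The selection step in `get_best_move` amounts to a masked argmax over the 64 square entries of the policy. A dependency-free sketch of that step follows; `pick_move` is a hypothetical name, and the renormalisation over legal squares is an addition for reporting purposes that the original code does not perform. It keeps the mapping used by `inference.py` (policy index `i` corresponds to bit `i`); whether that matches the layout used during training is not confirmed by this commit.

```python
import math


def pick_move(log_probs, legal_moves_bb):
    """Return (move_bit, probability) for the most likely legal square.

    log_probs: sequence of 65 log-probabilities (64 squares + pass).
    Index i is treated as bit i, the mapping inference.py uses.
    """
    legal = [i for i in range(64) if legal_moves_bb & (1 << i)]
    if not legal:
        return 0, 1.0  # no legal square: the only option is to pass
    # Renormalise over legal squares so the reported probability ignores
    # any mass the network placed on illegal moves.
    weights = [math.exp(log_probs[i]) for i in legal]
    total = sum(weights)
    best = max(range(len(legal)), key=lambda k: weights[k])
    return 1 << legal[best], weights[best] / total


# Toy input: uniform over 65 entries, with extra weight on index 19.
raw = [1.0] * 65
raw[19] = 4.0
norm = sum(raw)
log_probs = [math.log(v / norm) for v in raw]

move, prob = pick_move(log_probs, 0x0000102004080000)
print(hex(move), round(prob, 3))  # 0x80000 0.571
```

Returning `0` for a pass follows the convention of `get_best_move`, where a zero bitmask means no move was played.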
model.py ADDED
@@ -0,0 +1,72 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class ResidualBlock(nn.Module):
+     def __init__(self, channels):
+         super(ResidualBlock, self).__init__()
+         self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+         self.bn1 = nn.BatchNorm2d(channels)
+         self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+         self.bn2 = nn.BatchNorm2d(channels)
+
+     def forward(self, x):
+         residual = x
+         out = F.relu(self.bn1(self.conv1(x)))
+         out = self.bn2(self.conv2(out))
+         out += residual
+         out = F.relu(out)
+         return out
+
+ class OthelloNet(nn.Module):
+     def __init__(self, num_res_blocks=10, num_channels=256):
+         super(OthelloNet, self).__init__()
+
+         # Input: 3 channels (Player pieces, Opponent pieces, Legal moves/Constant plane)
+         self.conv_input = nn.Conv2d(3, num_channels, kernel_size=3, padding=1, bias=False)
+         self.bn_input = nn.BatchNorm2d(num_channels)
+
+         # Residual Tower
+         self.res_blocks = nn.ModuleList([
+             ResidualBlock(num_channels) for _ in range(num_res_blocks)
+         ])
+
+         # Policy Head
+         self.policy_conv = nn.Conv2d(num_channels, 2, kernel_size=1, bias=False)
+         self.policy_bn = nn.BatchNorm2d(2)
+         # 2 channels * 8 * 8 = 128
+         self.policy_fc = nn.Linear(128, 65)  # 64 squares + pass
+
+         # Value Head
+         self.value_conv = nn.Conv2d(num_channels, 1, kernel_size=1, bias=False)
+         self.value_bn = nn.BatchNorm2d(1)
+         # 1 channel * 8 * 8 = 64
+         self.value_fc1 = nn.Linear(64, 256)
+         self.value_fc2 = nn.Linear(256, 1)
+
+     def forward(self, x):
+         # Input Convolution
+         x = F.relu(self.bn_input(self.conv_input(x)))
+
+         # Residual Tower
+         for block in self.res_blocks:
+             x = block(x)
+
+         # Policy Head
+         p = F.relu(self.policy_bn(self.policy_conv(x)))
+         p = p.view(p.size(0), -1)  # Flatten
+         p = self.policy_fc(p)
+         # Return log-probabilities rather than raw logits.
+         # log_softmax is numerically stable and is the input format that
+         # NLLLoss and KLDivLoss expect.
+         # Callers that need probabilities apply torch.exp() to this output,
+         # as inference.py does.
+         p = F.log_softmax(p, dim=1)
+
+         # Value Head
+         v = F.relu(self.value_bn(self.value_conv(x)))
+         v = v.view(v.size(0), -1)  # Flatten
+         v = F.relu(self.value_fc1(v))
+         v = torch.tanh(self.value_fc2(v))
+
+         return p, v
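The layer shapes in `model.py` are enough to work out the number of learnable parameters by hand. The sketch below does that arithmetic in plain Python, with no PyTorch required. The helper names are hypothetical, and the count covers learnable weights only, so BatchNorm running statistics (which are buffers) are excluded.

```python
def conv_params(c_in, c_out, k):
    """Weights of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of BatchNorm2d (running stats excluded)."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def othello_net_params(num_res_blocks=10, num_channels=256):
    """Learnable parameter count for the OthelloNet layout in model.py."""
    c = num_channels
    stem = conv_params(3, c, 3) + bn_params(c)
    block = 2 * (conv_params(c, c, 3) + bn_params(c))
    policy = conv_params(c, 2, 1) + bn_params(2) + linear_params(128, 65)
    value = (conv_params(c, 1, 1) + bn_params(1)
             + linear_params(64, 256) + linear_params(256, 1))
    return stem + num_res_blocks * block + policy + value


total = othello_net_params()
print(total)                       # 11840200
print(round(total * 4 / 1e6, 1))   # 47.4 (megabytes if stored as float32)
```

At four bytes per parameter this comes to roughly 47.4 MB, close to the 47,452,382-byte size recorded in the Git LFS pointer for `dualist_model.pth`. That suggests the file holds float32 weights without optimizer state, although this is an inference from the sizes rather than something the commit states.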
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ torch>=1.8.0
+ numpy>=1.19.0
+ huggingface_hub