---
license: other
license_name: nova-chess-engine-license
license_link: https://github.com/novachessai/novachess-engine/blob/main/LICENSE
language: en
tags:
- chess
- transformer
- human-move-prediction
- style-conditioned
pipeline_tag: other
library_name: onnx
---
# Nova Chess Engine
A style-conditioned transformer that predicts human chess moves. Given
a board position, a target player rating, and two optional style parameters
(classical–hypermodern preference and aggression), Nova returns a
probability distribution over all legal moves calibrated to how a
player of that rating and style would play.
**Inference is a single forward pass of the neural network.** Nova
does not use Monte Carlo tree search, minimax, alpha-beta pruning,
or any form of engine-style position evaluation. There is no value
head and no lookahead. Move selection comes entirely from learned
patterns over the training corpus – fast on CPU (~35–50 ms per
position) and categorically different from search-based engines
like Stockfish or Leela.
**Play Nova directly at [novachess.ai](https://novachess.ai)** –
Nova powers the *Play* and *Train with Nova* features on the site,
where you can face Nova at any rating/style setting with built-in
analysis and post-game review.
- Developed by Nova Chess – <https://novachess.ai>
- Model type: pure-policy neural network over chess moves,
conditioned on position + rating + two style parameters.
Single forward pass per position; no search, no value head, no
game history.
- Language(s): not applicable – input is chess positions (18-channel
  plane encoding), output is a distribution over 16,384 move indices
- License: custom non-commercial (see `LICENSE`)
Full results and reproducibility: [`RESULTS.md`](RESULTS.md).
Source code and docs: <https://github.com/novachessai/novachess-engine>
## Model Details
### Inputs
- `positions` – `float32` tensor of shape `(B, 18, 8, 8)`
  - Planes 0–5: white pieces (P, N, B, R, Q, K), one-hot per square
  - Planes 6–11: black pieces (P, N, B, R, Q, K)
  - Plane 12: side to move (1s if white to move, 0s otherwise)
  - Planes 13–16: castling rights (white-kingside, white-queenside,
    black-kingside, black-queenside)
  - Plane 17: en-passant file indicator
- `conditioning` – `float32` tensor of shape `(B, 3)`
  - `rating_norm` = `(rating - 800) / (2700 - 800)`, clipped to `[0, 1]`
  - `classical` – `[0, 1]` – opening preference (higher = more
    classical mainlines)
  - `aggression` – `[0, 1]` – tactical/sacrificial tendency
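For example, building the conditioning tensor for a given rating and style is a one-liner plus the normalization above (the helper name `make_conditioning` is ours, not part of the release):

```python
import numpy as np

def make_conditioning(rating, classical=0.5, aggression=0.5):
    """Build the (1, 3) conditioning tensor described above.
    rating_norm maps 800 -> 0.0 and 2700 -> 1.0, clipped to [0, 1]."""
    rating_norm = np.clip((rating - 800) / (2700 - 800), 0.0, 1.0)
    return np.array([[rating_norm, classical, aggression]], dtype=np.float32)
```

`make_conditioning(1750)` yields a rating_norm of exactly 0.5, the midpoint of the supported range.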
### Outputs
- `logits` – `float32` tensor of shape `(B, 16384)`. Raw logits over
the 16,384-index move space. Caller is responsible for masking
illegal moves and applying softmax.
Move index encoding:
```
move_index = promotion_offset + from_square * 64 + to_square
promotion_offset:
0 no promotion (also queen promotion)
4096 knight promotion
8192 bishop promotion
12288 rook promotion
```
where `from_square` and `to_square` are standard 0–63 indices
(`a1 = 0, h8 = 63`).
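The encoding can be sketched as a pair of helpers (names are ours; note that because offset 0 is shared, a decoded promotion of `None` may in fact be a queen promotion):

```python
PROMO_OFFSET = {None: 0, "q": 0, "n": 4096, "b": 8192, "r": 12288}
OFFSET_PROMO = {0: None, 1: "n", 2: "b", 3: "r"}  # block 0 covers none/queen

def square_index(name):
    """'a1' -> 0, 'h8' -> 63 (file + 8 * rank)."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)

def encode_move(from_sq, to_sq, promotion=None):
    """Map a move in algebraic squares to its 0..16383 index."""
    return PROMO_OFFSET[promotion] + square_index(from_sq) * 64 + square_index(to_sq)

def decode_move(index):
    """Inverse mapping: (from_square, to_square, promotion-or-None)."""
    promotion = OFFSET_PROMO[index // 4096]
    return (index % 4096) // 64, index % 64, promotion
```

For instance, `encode_move("e2", "e4")` lands in the no-promotion block, while `encode_move("e7", "e8", "n")` lands 4096 indices higher.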
### Architectural distinction from prior work
Nova is **single-head pure-policy**: the network's only output is the
move distribution. There is no value head (no game-outcome
prediction), no auxiliary head (no side-task supervision on captures,
checks, etc.), no search at inference, no lookahead, and no use of
any position evaluator before, during, or after the forward pass.
Move selection comes entirely from the policy distribution the
network learned by predicting actual human moves at the conditioned
rating and style.
This contrasts with every published comparable model:
| Model | Heads at inference | Search | Notes |
|---|---|---|---|
| **Nova** (this release) | 1 (policy) | none | single forward pass, ~35–50 ms CPU |
| Maia-2 (NeurIPS 2024) | 3 (policy + value + auxiliary) | none | value head regresses W/D/L; auxiliary head predicts legal moves, captures, check delivery |
| Maia-3 (`maia3_simplified.onnx`) | 2 (policy + value) | none | drops Maia-2's auxiliary head; retains W/D/L value head |
| Allie (ICLR 2025) | 1 (policy) + value via search | adaptive MCTS at inference | policy is decoder-only over move sequences; MCTS provides per-position evaluation at runtime |
| Leela (LC0) | 2 (policy + value) | MCTS | engine-strength playing model |
| Stockfish (NNUE) | evaluation only | alpha-beta | not a human-move predictor |
The pure-policy stance is a deliberate design choice. It keeps the
model fast (one forward pass, no tree expansion), simple to deploy
(just the ONNX file – no MCTS implementation, no auxiliary supervision
data at training time, no value-head calibration to maintain), and
forces the network to learn move quality entirely from move-selection
patterns rather than offloading it to a parallel evaluator. The
benchmarks in `RESULTS.md` show this is competitive with multi-head
architectures on the move-prediction task itself.
### Files in this repository
- `nova_v3b.onnx` + `nova_v3b.onnx.data` – ONNX export with external
  data. Both files are required at inference time and **must keep their
  exact filenames** – the `.onnx` file embeds a reference to the
  `.data` sidecar by name. Place them in the same directory before loading.
- `nova_v3b.pt` – PyTorch checkpoint (weights only) for research
  and fine-tuning.
- `unified_sample_600k.pkl` – the 600K-position out-of-sample
  evaluation set used in the results reported below. Schema:
  `{fen, actual, rating, ply, min_clock, piece_count, band,
  player_id, result, ...}`.
- `nova_neutral_600k.pkl`, `nova_actual_600k.pkl`,
  `maia2_600k.pkl`, `maia3_600k.pkl` – per-position predictions from
  each model on the 600K sample, used for the paired significance
  tests in `RESULTS.md`.
## Uses
### Direct use
- Predict the probability distribution over legal moves that a human
of a specified rating and style would play from a given position.
- Sample moves to run as a human-like opponent or study partner.
- Score actual human moves by `P(actual_move | position, rating,
style)` for humanness analysis, move difficulty assessment, or
anti-cheat signals.
- Benchmark other human-move predictors against Nova on shared
evaluation sets.
- Fine-tune on specialized data (specific player corpora, specific
opening systems, specific time controls) for personal or research
use.
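For the humanness-scoring use case above, the per-move signals one might extract from a masked, softmaxed policy vector can be sketched as follows (the helper and its field names are illustrative, not part of the release):

```python
import numpy as np

def humanness_signals(probs, actual_idx, k=5):
    """Per-move signals for humanness analysis, given a masked and
    softmaxed policy vector plus the index of the move actually played."""
    order = np.argsort(probs)[::-1]                      # moves by descending prob
    rank = int(np.where(order == actual_idx)[0][0]) + 1  # 1-based rank of the move
    p = float(probs[actual_idx])
    return {
        "p_actual": p,                                    # P(actual | position, rating, style)
        "rank": rank,
        "in_top_k": rank <= k,
        "surprisal_bits": -np.log2(max(p, 1e-12)),        # high = surprising for this profile
    }
```

High surprisal at the conditioned rating is the kind of per-move signal that could feed a difficulty or anti-cheat pipeline, subject to the out-of-scope caveats below.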
### Downstream use
The model is used internally by Nova Chess to power the
*Play Nova* and *Train with Nova* features at https://novachess.ai,
where end users play or practice against the model at chosen rating
and style settings. The same weights published in this repository are
the ones served by the application.
The in-app version additionally wraps Nova's policy output with a
small calibration layer used to tune playing strength across rating
tiers. The primary lever is a **per-tier temperature schedule**. On
top of that, Nova's sampled candidates pass through a probabilistic
quality check: high-confidence picks (where Nova's policy concentrates
significant probability mass on a single move) are played directly,
while lower-confidence picks may be sent to Stockfish for a low-depth
evaluation. If the evaluation falls below a tier-dependent quality
threshold, the move *may* be replaced by re-sampling from Nova's
distribution. Both the rate at which positions are evaluated and the
rate at which sub-threshold moves are actually replaced vary by tier,
so the bot's mistake profile matches the empirical chess.com CP-loss
profile at that level – at lower tiers, more sub-optimal moves slip
through (because players at that level make them); at higher tiers,
far fewer do. Additional calibration components layer on top of this
base flow.
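The base flow described above might be sketched roughly as follows. Everything concrete here (function names, the temperature exponent form, the mock `evaluate_fn`, thresholds) is our illustration, not the released calibration layer:

```python
import numpy as np

def sample_with_quality_check(probs, tier_temperature, conf_threshold,
                              check_rate, evaluate_fn, eval_floor, rng):
    """Illustrative sketch: sample from a tempered policy, optionally
    send low-confidence picks to an external evaluator, and re-sample
    once if the evaluation falls below the tier's quality floor."""
    # Per-tier temperature: T > 1 flattens the policy, T < 1 sharpens it
    tempered = probs ** (1.0 / tier_temperature)
    tempered /= tempered.sum()
    move = rng.choice(len(tempered), p=tempered)
    # High-confidence picks (mass concentrated on one move) play directly
    if tempered[move] < conf_threshold and rng.random() < check_rate:
        if evaluate_fn(move) < eval_floor:
            move = rng.choice(len(tempered), p=tempered)  # re-sample once
    return int(move)
```

Both `check_rate` and `eval_floor` would vary by tier, matching the description above of tier-dependent evaluation and replacement rates.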
**Every move the in-app bot plays still originates from Nova's policy
distribution.** Stockfish is never used to suggest, generate, or
select moves – only to evaluate moves Nova has already proposed, so
that obvious blunders at higher tiers can be probabilistically caught.
The model weights are never touched. The calibration layer is **not**
part of this release; the released checkpoint is the bare policy
model, exactly the surface that benchmarks and downstream research /
fine-tuning should target. See the README's "In-app behavior vs the
released model" section for the full distinction.
### Out-of-scope use
- Not suitable as a chess-playing engine for maximum-strength
competition against search-based engines. Nova is trained on
human-move prediction – it maximizes `P(move | human of rating R)`,
not `P(best move)` – and uses no search, no lookahead, and no
position evaluation beyond the single forward pass of the policy
network. For engine-quality play, a search-based evaluator such as
Stockfish remains the correct choice.
- Not intended for cheating detection as a standalone verdict. Nova
probabilities can inform a cheat-detection pipeline but should not
be used as sole evidence for accusations.
- Not validated on chess variants (Chess960, King of the Hill, etc.)
– trained only on standard chess.
- Not a replacement for human coaching β move probabilities are not
explanations, and the model does not produce commentary or verbal
analysis.
## Bias, Risks, and Limitations
- **Training-data distribution.** Nova is trained on ~520M positions
from Lichess rapid games played Apr–Nov 2025. The player population
is self-selected (online rapid players on one platform), skews
toward active users in rating bands 1100–2300, and may not
represent the full distribution of human chess play. Inferences
about moves at extreme ratings (particularly below 800 and above
2500) have less training-data support.
- **Style axis limitations.** The classical and aggression axes
capture specific operational definitions (opening move choices for
classical; captures + territorial control + king pressure for
aggression). They do not capture all dimensions of human chess
style (combinational richness, prophylaxis, time management, etc.).
- **Rating conditioning is a scalar.** Nova receives a single number
for rating, not a distribution. The model has learned a continuous
interpolation of playing strength, but at the high end of the
rating axis the playing strength it produces may saturate below the
conditioned rating.
- **No game history.** Nova conditions on the current position only,
not on the preceding move sequence. Two positions with identical
FENs are indistinguishable to the model even if reached through
very different games.
- **No check for illegal moves.** The raw logits include mass on
illegal move indices. Callers must apply a legal-move mask before
sampling. See the README quickstart and `docs/serving.md` for the
reference masking pattern.
- **Value / result prediction is not supported.** This checkpoint is
policy-only; it does not output win/draw/loss probabilities.
## Training Details
### Training data
Nova was trained on a large corpus of Lichess rapid games, balanced
across six rating bands from 800 to 2700+. Position sampling and
filtering were tuned to keep all skill levels and all game phases
well-represented in training. Details of the data pipeline and
cohort balancing are not published.
### Training procedure
Nova is trained end-to-end with a cross-entropy objective over the
16,384-index move space. The output is the policy distribution over
legal moves, with no auxiliary value head. Specific architectural
dimensions and training hyperparameters are not published.
### Inference cost
- CPU (ONNX fp32): 35–50 ms per position on a modern x86 core
- GPU (batched, H100): ~1 ms per position
- Inference memory: approximately 500 MB RAM per worker (fp32 weights
with external-data sidecar)
## Evaluation
### Evaluation data
A single held-out evaluation sample of **600,000 positions** drawn
from Lichess rapid games played in **March 2026**, stratified at
100,000 positions per rating band. This sample is temporally held out
from Nova's training data and is shipped as `unified_sample_600k.pkl`
on Hugging Face.
### Metrics
- **hit1** – fraction of positions where the model's top prediction
  matches the human's actual move (top-1 accuracy)
- **hit5** – fraction of positions where the human's move is in the
  model's top-5 predictions
- **Mean P(actual)** – mean probability mass that the model assigned
  to the move the human actually played
- **Mean top-5 mass** – mean total probability mass assigned to the
  top-5 predictions
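Given per-position policy vectors and played-move indices, these four metrics reduce to a few lines of numpy (a sketch; array and function names are ours):

```python
import numpy as np

def summarize(probs, actual_idx):
    """probs: (N, M) masked+softmaxed policies; actual_idx: (N,) played moves.
    Returns (hit1, hit5, mean P(actual), mean top-5 mass)."""
    n = len(actual_idx)
    order = np.argsort(probs, axis=1)[:, ::-1]   # moves by descending probability
    hit1 = np.mean(order[:, 0] == actual_idx)
    hit5 = np.mean((order[:, :5] == actual_idx[:, None]).any(axis=1))
    p_actual = probs[np.arange(n), actual_idx].mean()
    top5_mass = np.take_along_axis(probs, order[:, :5], axis=1).sum(axis=1).mean()
    return hit1, hit5, p_actual, top5_mass
```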
### Results
On the 600,000-position sample, comparing Nova against the publicly
available Maia-3 checkpoint (`maia3_simplified.onnx` from
https://maiachess.com) and the Maia-2 rapid checkpoint:
| Metric | Maia-2 | Maia-3 | Nova (neutral style) |
|---|---|---|---|
| Top-1 hit rate | 50.27 % | **54.83 %** | 54.60 % |
| Top-5 hit rate | 88.38 % | **91.23 %** | 91.10 % |
| Mean P(actual) | 38.44 % | 42.10 % | **42.51 %** |
| Mean top-5 mass | 89.33 % | 91.96 % | **92.26 %** |
All four Nova-vs-Maia-3 deltas are statistically significant under
paired McNemar tests (for hit rates) and paired t-tests (for
probability-mass metrics). Both probability-mass deltas remain
significant under a player-clustered bootstrap (95% CIs reported in
RESULTS.md).
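For the hit-rate comparisons, a paired McNemar test over discordant pairs can be sketched as follows (a standard continuity-corrected χ² version; not necessarily the exact variant used in `RESULTS.md`):

```python
import math

def mcnemar(model_a_correct, model_b_correct):
    """Paired McNemar test on two boolean sequences of per-position hits.
    Only discordant pairs (one model right, the other wrong) matter."""
    b = sum(x and not y for x, y in zip(model_a_correct, model_b_correct))
    c = sum(y and not x for x, y in zip(model_a_correct, model_b_correct))
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)     # continuity-corrected statistic
    # two-sided p-value via the chi-square(1) survival function
    return math.erfc(math.sqrt(chi2 / 2.0))
```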
Full breakdown by rating band, Maia tier (Skilled / Advanced /
Master), game phase, piece count, and three filter variants (all
positions; `ply ≥ 10`; `ply ≥ 10 + clock ≥ 30 s`) is in
[`RESULTS.md`](RESULTS.md).
## How to use
Minimum-dependency inference example (CPU):
```bash
pip install onnxruntime python-chess numpy
```
```python
import chess
import numpy as np
import onnxruntime as ort
PIECE = {"P":0,"N":1,"B":2,"R":3,"Q":4,"K":5,
"p":6,"n":7,"b":8,"r":9,"q":10,"k":11}
def fen_to_planes(fen):
planes = np.zeros((18, 8, 8), dtype=np.float32)
parts = fen.split()
board, turn, castling, ep = parts[0], parts[1], parts[2], parts[3]
for ri, rank_str in enumerate(board.split("/")):
rank_idx, file_idx = 7 - ri, 0
for ch in rank_str:
if ch.isdigit():
file_idx += int(ch)
else:
planes[PIECE[ch], rank_idx, file_idx] = 1.0
file_idx += 1
if turn == "w": planes[12].fill(1.0)
if "K" in castling: planes[13].fill(1.0)
if "Q" in castling: planes[14].fill(1.0)
if "k" in castling: planes[15].fill(1.0)
if "q" in castling: planes[16].fill(1.0)
if ep != "-" and len(ep) == 2:
planes[17, 0, ord(ep[0]) - ord("a")] = 1.0
return planes
session = ort.InferenceSession("nova_v3b.onnx",
providers=["CPUExecutionProvider"])
board = chess.Board()
positions = fen_to_planes(board.fen())[np.newaxis]
# rating=1600, neutral classical + aggression
conditioning = np.array([[(1600 - 800) / (2700 - 800), 0.5, 0.5]],
dtype=np.float32)
logits = session.run(None, {"positions": positions,
"conditioning": conditioning})[0][0]
# Mask illegal indices, then softmax over legal moves only
legal = np.zeros(16384, dtype=bool)
idx_to_move = {}
for mv in board.legal_moves:
    idx = mv.from_square * 64 + mv.to_square
    if mv.promotion == chess.KNIGHT: idx += 4096
    elif mv.promotion == chess.BISHOP: idx += 4096 * 2
    elif mv.promotion == chess.ROOK: idx += 4096 * 3
    # queen promotions share offset 0 with non-promotion moves
    legal[idx] = True
    idx_to_move[idx] = mv
masked = np.where(legal, logits, -1e9)
probs = np.exp(masked - masked.max())
probs *= legal
probs /= probs.sum()
top = np.argsort(probs)[::-1][:5]
for i in top:
    print(f"  {idx_to_move[int(i)].uci():6s} p = {probs[i]*100:.2f}%")
```
For production deployment notes (multi-worker setup, rate limiting,
temperature schedules, observability), see `docs/serving.md` in the
GitHub repository.
## Citation
```
Nova Chess Engine. Nova Chess, 2026.
https://github.com/novachessai/novachess-engine
https://huggingface.co/novachess/novachess-engine
```
BibTeX:
```bibtex
@misc{novachess_2026,
title = {Nova Chess Engine},
author = {Nova Chess},
year = {2026},
url = {https://github.com/novachessai/novachess-engine}
}
```
## Acknowledgments
Nova builds on prior work in human-move prediction. The evaluation
methodology (rating-band stratification, tier definitions, ply-based
filters, Lichess rapid data) follows conventions established by the
Maia project.
- Maia-1 – McIlroy-Young, Sen, Kleinberg & Anderson, *Aligning
  Superhuman AI with Human Behavior: Chess as a Model System*,
  KDD 2020. [arXiv:2006.01855](https://arxiv.org/abs/2006.01855)
- Maia-2 – Tang, Jiao, McIlroy-Young, Kleinberg, Sen & Anderson,
  *Maia-2: A Unified Model for Human-AI Alignment in Chess*,
  NeurIPS 2024. [arXiv:2409.20553](https://arxiv.org/abs/2409.20553)
- Maia-3 – Maia project, <https://maiachess.com>. The specific
  checkpoint evaluated here is `maia3_simplified.onnx` published there.
- Allie – Khoshneviszadeh, Chi, Sheller et al., *Allie: Emergent
  Human-Like Play Through Adaptive MCTS with a Decoder-Only
  Transformer*, ICLR 2025.
  [arXiv:2410.03893](https://arxiv.org/abs/2410.03893)
## Contact
- Product website: <https://novachess.ai>
- GitHub issues: <https://github.com/novachessai/novachess-engine/issues>
- Commercial licensing inquiries: support@novachess.ai