---
license: other
license_name: nova-chess-engine-license
license_link: https://github.com/novachessai/novachess-engine/blob/main/LICENSE
language: en
tags:
- chess
- transformer
- human-move-prediction
- style-conditioned
pipeline_tag: other
library_name: onnx
---

# Nova Chess Engine

A style-conditioned transformer that predicts human chess moves. Given
a board position, a target player rating, and two optional style parameters
(classical–hypermodern preference and aggression), Nova returns a
probability distribution over all legal moves, calibrated to how a
player of that rating and style would play.

**Inference is a single forward pass of the neural network.** Nova
does not use Monte Carlo tree search, minimax, alpha-beta pruning,
or any form of engine-style position evaluation. There is no value
head and no lookahead. Move selection comes entirely from learned
patterns over the training corpus — fast on CPU (~35–50 ms per
position) and categorically different from search-based engines
like Stockfish or Leela.

**Play Nova directly at [novachess.ai](https://novachess.ai)** —
Nova powers the *Play* and *Train with Nova* features on the site,
where you can face Nova at any rating/style setting with built-in
analysis and post-game review.

- Developed by Nova Chess — <https://novachess.ai>
- Model type: pure-policy neural network over chess moves,
  conditioned on position + rating + two style parameters.
  Single forward pass per position; no search, no value head, no
  game history.
- Language(s): not applicable — input is chess positions (18-channel
  plane encoding); output is a distribution over 16,384 move indices.
- License: custom non-commercial (see `LICENSE`)

Full results and reproducibility: [`RESULTS.md`](RESULTS.md).
Source code and docs: <https://github.com/novachessai/novachess-engine>

## Model Details

### Inputs

- `positions` — `float32` tensor of shape `(B, 18, 8, 8)`
  - Planes 0–5: white pieces (P, N, B, R, Q, K), one-hot per square
  - Planes 6–11: black pieces (P, N, B, R, Q, K)
  - Plane 12: side to move (1s if white to move, 0s otherwise)
  - Planes 13–16: castling rights (white-kingside, white-queenside,
    black-kingside, black-queenside)
  - Plane 17: en-passant file indicator
- `conditioning` — `float32` tensor of shape `(B, 3)`
  - `rating_norm` = `(rating − 800) / (2700 − 800)`, clipped to `[0, 1]`
  - `classical` ∈ `[0, 1]` — opening preference (higher = more
    classical mainlines)
  - `aggression` ∈ `[0, 1]` — tactical/sacrificial tendency
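
The conditioning vector follows directly from the formula above. A minimal sketch of assembling it with NumPy (the helper name `make_conditioning` is ours, not part of the release):

```python
import numpy as np

def make_conditioning(rating, classical=0.5, aggression=0.5):
    """Build the (1, 3) conditioning tensor: normalized rating + two style axes."""
    rating_norm = min(max((rating - 800) / (2700 - 800), 0.0), 1.0)
    return np.array([[rating_norm, classical, aggression]], dtype=np.float32)

make_conditioning(1600)   # rating_norm ~ 0.421, neutral style
make_conditioning(3000)   # clipped: rating_norm == 1.0
```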

### Outputs

- `logits` — `float32` tensor of shape `(B, 16384)`. Raw logits over
  the 16,384-index move space. The caller is responsible for masking
  illegal moves and applying softmax.

Move index encoding:

```
move_index = promotion_offset + from_square * 64 + to_square

promotion_offset:
      0   no promotion (also queen promotion)
   4096   knight promotion
   8192   bishop promotion
  12288   rook promotion
```

where `from_square` and `to_square` are standard 0–63 indices
(`a1 = 0`, `h8 = 63`).
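
The encoding is invertible by integer arithmetic. A minimal sketch (these helper names are ours, not part of the release):

```python
def encode_move(from_sq, to_sq, promotion=None):
    """Squares are 0-63 (a1 = 0, h8 = 63); promotion is None, 'q', 'n', 'b', or 'r'.
    Queen promotions share offset 0 with non-promotion moves."""
    offset = {None: 0, "q": 0, "n": 4096, "b": 8192, "r": 12288}[promotion]
    return offset + from_sq * 64 + to_sq

def decode_move(index):
    """Inverse mapping back to (from_sq, to_sq, promotion)."""
    promotion = [None, "n", "b", "r"][index // 4096]
    return (index % 4096) // 64, index % 64, promotion

encode_move(12, 28)   # e2 -> e4 gives index 796
decode_move(796)      # -> (12, 28, None)
```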

### Architectural distinction from prior work

Nova is **single-head pure-policy**: the network's only output is the
move distribution. There is no value head (no game-outcome
prediction), no auxiliary head (no side-task supervision on captures,
checks, etc.), no search at inference, no lookahead, and no use of
any position evaluator before, during, or after the forward pass.
Move selection comes entirely from the policy distribution the
network learned by predicting actual human moves at the conditioned
rating and style.

This contrasts with every published comparable model:

| Model | Heads at inference | Search | Notes |
|---|---|---|---|
| **Nova** (this release) | 1 (policy) | none | single forward pass, ~35–50 ms CPU |
| Maia-2 (NeurIPS 2024) | 3 (policy + value + auxiliary) | none | value head regresses W/D/L; auxiliary head predicts legal moves, captures, check delivery |
| Maia-3 (`maia3_simplified.onnx`) | 2 (policy + value) | none | drops Maia-2's auxiliary head; retains W/D/L value head |
| Allie (ICLR 2025) | 1 (policy) + value via search | adaptive MCTS at inference | policy is decoder-only over move sequences; MCTS provides per-position evaluation at runtime |
| Leela (LC0) | 2 (policy + value) | MCTS | engine-strength playing model |
| Stockfish (NNUE) | evaluation only | alpha-beta | not a human-move predictor |

The pure-policy stance is a deliberate design choice. It keeps the
model fast (one forward pass, no tree expansion), simple to deploy
(just the ONNX file — no MCTS implementation, no auxiliary supervision
data at training time, no value-head calibration to maintain), and
forces the network to learn move quality entirely from move-selection
patterns rather than offloading it to a parallel evaluator. The
benchmarks in `RESULTS.md` show this approach is competitive with
multi-head architectures on the move-prediction task itself.

### Files in this repository

- `nova_v3b.onnx` + `nova_v3b.onnx.data` — ONNX export with external
  data. Both files are required at inference time and **must keep their
  exact filenames** — the `.onnx` file embeds a reference to the
  `.data` sidecar by name. Place both in the same directory before loading.
- `nova_v3b.pt` — PyTorch checkpoint (weights only) for research
  and fine-tuning.
- `unified_sample_600k.pkl` — the 600K-position out-of-sample
  evaluation set used in the results reported below. Schema:
  `{fen, actual, rating, ply, min_clock, piece_count, band,
  player_id, result, ...}`.
- `nova_neutral_600k.pkl`, `nova_actual_600k.pkl`,
  `maia2_600k.pkl`, `maia3_600k.pkl` — per-position predictions from
  each model on the 600K sample, used for the paired significance
  tests in `RESULTS.md`.

## Uses

### Direct use

- Predict the probability distribution over legal moves that a human
  of a specified rating and style would play from a given position.
- Sample moves to run as a human-like opponent or study partner.
- Score actual human moves by `P(actual_move | position, rating,
  style)` for humanness analysis, move difficulty assessment, or
  anti-cheat signals.
- Benchmark other human-move predictors against Nova on shared
  evaluation sets.
- Fine-tune on specialized data (specific player corpora, specific
  opening systems, specific time controls) for personal or research
  use.

### Downstream use

The model is used internally by Nova Chess to power the
*Play Nova* and *Train with Nova* features at <https://novachess.ai>,
where end users play or practice against the model at chosen rating
and style settings. The same weights published in this repository are
the ones served by the application.

The in-app version additionally wraps Nova's policy output with a
small calibration layer used to tune playing strength across rating
tiers. The primary lever is a **per-tier temperature schedule**. On
top of that, Nova's sampled candidates pass through a probabilistic
quality check: high-confidence picks (where Nova's policy concentrates
significant probability mass on a single move) are played directly,
while lower-confidence picks may be sent to Stockfish for a low-depth
evaluation. If the evaluation falls below a tier-dependent quality
threshold, the move *may* be replaced by re-sampling from Nova's
distribution. Both the rate at which positions are evaluated and the
rate at which sub-threshold moves are actually replaced vary by tier,
so the bot's mistake profile matches the empirical chess.com CP-loss
profile at that level — at lower tiers, more sub-optimal moves slip
through (because players at that level make them); at higher tiers,
far fewer do. Additional calibration components layer on top of this
base flow.
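
The flow above can be sketched in a few lines. This is purely illustrative — the function, the tier parameters, and every number below are invented placeholders, not the values the Nova application actually uses:

```python
import random

def pick_move(policy, evaluate_cp, tier):
    """Sketch of the calibration flow. `policy` maps move -> probability
    (already legal-masked and normalized); `evaluate_cp` is a stand-in
    for a low-depth Stockfish evaluation in centipawns."""
    # 1. Per-tier temperature reshapes the sampling distribution.
    w = {m: p ** (1.0 / tier["temperature"]) for m, p in policy.items()}
    total = sum(w.values())
    dist = {m: x / total for m, x in w.items()}
    sample = lambda: random.choices(list(dist), weights=list(dist.values()))[0]

    move = sample()
    # 2. High-confidence picks are played directly.
    if policy[move] >= tier["confidence_cutoff"]:
        return move
    # 3. Lower-confidence picks are spot-checked at a tier-dependent rate...
    if random.random() < tier["check_rate"] and evaluate_cp(move) < tier["cp_threshold"]:
        # 4. ...and sub-threshold moves are only *sometimes* replaced, so
        # human-like mistakes still slip through at lower tiers.
        if random.random() < tier["replace_rate"]:
            move = sample()
    return move

# Hypothetical tier configuration (all values invented for illustration).
tier_1200 = {"temperature": 1.3, "confidence_cutoff": 0.6,
             "check_rate": 0.4, "replace_rate": 0.5, "cp_threshold": -150}
```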

**Every move the in-app bot plays still originates from Nova's policy
distribution.** Stockfish is never used to suggest, generate, or
select moves — only to evaluate moves Nova has already proposed, so
that obvious blunders at higher tiers can be probabilistically caught.
The model weights are never touched. The calibration layer is **not**
part of this release; the released checkpoint is the bare policy
model, exactly the surface that benchmarks and downstream research /
fine-tuning should target. See the README's "In-app behavior vs the
released model" section for the full distinction.

### Out-of-scope use

- Not suitable as a maximum-strength engine for competition against
  search-based engines. Nova is trained for human-move
  prediction — it maximizes `P(move | human of rating R)`,
  not `P(best move)` — and uses no search, no lookahead, and no
  position evaluation beyond the single forward pass of the policy
  network. For engine-quality play, a search-based evaluator such as
  Stockfish remains the correct choice.
- Not intended to deliver standalone cheating verdicts. Nova's
  probabilities can inform a cheat-detection pipeline but should not
  be used as sole evidence for accusations.
- Not validated on chess variants (Chess960, King of the Hill, etc.)
  — trained only on standard chess.
- Not a replacement for human coaching — move probabilities are not
  explanations, and the model does not produce commentary or verbal
  analysis.

## Bias, Risks, and Limitations

- **Training-data distribution.** Nova is trained on ~520M positions
  from Lichess rapid games played Apr–Nov 2025. The player population
  is self-selected (online rapid players on one platform), skews
  toward active users in rating bands 1100–2300, and may not
  represent the full distribution of human chess play. Inferences
  about moves at extreme ratings (particularly below 800 and above
  2500) have less training-data support.
- **Style axis limitations.** The classical and aggression axes
  capture specific operational definitions (opening move choices for
  classical; captures + territorial control + king pressure for
  aggression). They do not capture all dimensions of human chess
  style (combinational richness, prophylaxis, time management, etc.).
- **Rating conditioning is a scalar.** Nova receives a single number
  for rating, not a distribution. The model has learned a continuous
  interpolation of playing strength, but at the high end of the
  rating axis the playing strength it produces may saturate below the
  conditioned rating.
- **No game history.** Nova conditions on the current position only,
  not on the preceding move sequence. Two positions with identical
  FENs are indistinguishable to the model even if reached through
  very different games.
- **No legality masking.** The raw logits include mass on
  illegal move indices. Callers must apply a legal-move mask before
  sampling. See the README quickstart and `docs/serving.md` for the
  reference masking pattern.
- **Value / result prediction is not supported.** This checkpoint is
  policy-only; it does not output win/draw/loss probabilities.

## Training Details

### Training data

Nova was trained on a large corpus of Lichess rapid games, balanced
across six rating bands from 800 to 2700+. Position sampling and
filtering were tuned to keep all skill levels and all game phases
well represented in training. Details of the data pipeline and
cohort balancing are not published.

### Training procedure

Nova is trained end-to-end with a cross-entropy objective over the
16,384-index move space. The output is the policy distribution over
legal moves, with no auxiliary value head. Specific architectural
dimensions and training hyperparameters are not published.

### Inference cost

- CPU (ONNX fp32): 35–50 ms per position on a modern x86 core
- GPU (batched, H100): ~1 ms per position
- Inference memory: approximately 500 MB RAM per worker (fp32 weights
  with external-data sidecar)

## Evaluation

### Evaluation data

A single held-out evaluation sample of **600,000 positions** drawn
from Lichess rapid games played in **March 2026**, stratified at
100,000 positions per rating band. This sample is temporally held out
from Nova's training data and is shipped as `unified_sample_600k.pkl`
on Hugging Face.

### Metrics

- **hit1** — fraction of positions where the model's top prediction
  matches the human's actual move (top-1 accuracy)
- **hit5** — fraction of positions where the human's move is in the
  model's top-5 predictions
- **Mean P(actual)** — mean probability mass that the model assigned
  to the move the human actually played
- **Mean top-5 mass** — mean total probability mass assigned to the
  model's top-5 predictions
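
All four metrics reduce to simple NumPy operations over a matrix of legal-masked distributions. A minimal sketch (the function name is ours; this is not the evaluation script shipped with `RESULTS.md`):

```python
import numpy as np

def move_prediction_metrics(probs, actual):
    """probs: (N, 16384) legal-masked distributions; actual: (N,) played-move indices."""
    rows = np.arange(len(actual))
    top5 = np.argsort(probs, axis=1)[:, -5:]   # indices of the 5 largest probabilities
    return {
        "hit1": float((probs.argmax(axis=1) == actual).mean()),
        "hit5": float((top5 == actual[:, None]).any(axis=1).mean()),
        "mean_p_actual": float(probs[rows, actual].mean()),
        "mean_top5_mass": float(np.take_along_axis(probs, top5, axis=1).sum(axis=1).mean()),
    }
```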

### Results

On the 600,000-position sample, comparing Nova against the publicly
available Maia-3 checkpoint (`maia3_simplified.onnx` from
<https://maiachess.com>) and the Maia-2 rapid checkpoint:

| Metric | Maia-2 | Maia-3 | Nova (neutral style) |
|---|---|---|---|
| Top-1 hit rate | 50.27 % | **54.83 %** | 54.60 % |
| Top-5 hit rate | 88.38 % | **91.23 %** | 91.10 % |
| Mean P(actual) | 38.44 % | 42.10 % | **42.51 %** |
| Mean top-5 mass | 89.33 % | 91.96 % | **92.26 %** |

All four Nova-vs-Maia-3 deltas are statistically significant under
paired McNemar tests (for hit rates) and paired t-tests (for
probability-mass metrics). Both probability-mass deltas remain
significant under a player-clustered bootstrap (95% CIs reported in
[`RESULTS.md`](RESULTS.md)).

Full breakdown by rating band, Maia tier (Skilled / Advanced /
Master), game phase, piece count, and three filter variants (all
positions; `ply ≥ 10`; `ply ≥ 10 + clock ≥ 30 s`) is in
[`RESULTS.md`](RESULTS.md).

## How to use

Minimum-dependency inference example (CPU):

```bash
pip install onnxruntime python-chess numpy
```

```python
import chess
import numpy as np
import onnxruntime as ort

PIECE = {"P": 0, "N": 1, "B": 2, "R": 3, "Q": 4, "K": 5,
         "p": 6, "n": 7, "b": 8, "r": 9, "q": 10, "k": 11}

def fen_to_planes(fen):
    """Encode a FEN string as Nova's (18, 8, 8) input planes."""
    planes = np.zeros((18, 8, 8), dtype=np.float32)
    board_part, turn, castling, ep = fen.split()[:4]
    for ri, rank_str in enumerate(board_part.split("/")):
        rank_idx, file_idx = 7 - ri, 0
        for ch in rank_str:
            if ch.isdigit():
                file_idx += int(ch)
            else:
                planes[PIECE[ch], rank_idx, file_idx] = 1.0
                file_idx += 1
    if turn == "w": planes[12].fill(1.0)
    if "K" in castling: planes[13].fill(1.0)
    if "Q" in castling: planes[14].fill(1.0)
    if "k" in castling: planes[15].fill(1.0)
    if "q" in castling: planes[16].fill(1.0)
    if ep != "-" and len(ep) == 2:
        planes[17, 0, ord(ep[0]) - ord("a")] = 1.0
    return planes

session = ort.InferenceSession("nova_v3b.onnx",
                               providers=["CPUExecutionProvider"])

board = chess.Board()
positions = fen_to_planes(board.fen())[np.newaxis]
# rating=1600, neutral classical + aggression
conditioning = np.array([[(1600 - 800) / (2700 - 800), 0.5, 0.5]],
                        dtype=np.float32)
logits = session.run(None, {"positions": positions,
                            "conditioning": conditioning})[0][0]

# Mask illegal indices, then softmax over the legal moves only
legal = np.zeros(16384, dtype=bool)
idx_to_move = {}
for mv in board.legal_moves:
    idx = mv.from_square * 64 + mv.to_square
    if mv.promotion == chess.KNIGHT:   idx += 4096
    elif mv.promotion == chess.BISHOP: idx += 8192
    elif mv.promotion == chess.ROOK:   idx += 12288
    legal[idx] = True
    idx_to_move[idx] = mv
masked = np.where(legal, logits, -1e9)
probs = np.exp(masked - masked.max())
probs[~legal] = 0.0
probs /= probs.sum()

# Print the top-5 predictions with their UCI move names
for i in np.argsort(probs)[::-1][:5]:
    print(f"  {idx_to_move[int(i)].uci():6s} p = {probs[i]*100:.2f}%")
```

For production deployment notes (multi-worker setup, rate limiting,
temperature schedules, observability), see `docs/serving.md` in the
GitHub repository.

## Citation

```
Nova Chess Engine. Nova Chess, 2026.
https://github.com/novachessai/novachess-engine
https://huggingface.co/novachess/novachess-engine
```

BibTeX:

```bibtex
@misc{novachess_2026,
  title  = {Nova Chess Engine},
  author = {Nova Chess},
  year   = {2026},
  url    = {https://github.com/novachessai/novachess-engine}
}
```

## Acknowledgments

Nova builds on prior work in human-move prediction. The evaluation
methodology (rating-band stratification, tier definitions, ply-based
filters, Lichess rapid data) follows conventions established by the
Maia project.

- Maia-1 — McIlroy-Young, Sen, Kleinberg & Anderson, *Aligning
  Superhuman AI with Human Behavior: Chess as a Model System*,
  KDD 2020. [arXiv:2006.01855](https://arxiv.org/abs/2006.01855)
- Maia-2 — Tang, Jiao, McIlroy-Young, Kleinberg, Sen & Anderson,
  *Maia-2: A Unified Model for Human-AI Alignment in Chess*,
  NeurIPS 2024. [arXiv:2409.20553](https://arxiv.org/abs/2409.20553)
- Maia-3 — Maia project, <https://maiachess.com>. The specific
  checkpoint evaluated here is `maia3_simplified.onnx` published there.
- Allie — Khoshneviszadeh, Chi, Sheller et al., *Allie: Emergent
  Human-Like Play Through Adaptive MCTS with a Decoder-Only
  Transformer*, ICLR 2025.
  [arXiv:2410.03893](https://arxiv.org/abs/2410.03893)

## Contact

- Product website: <https://novachess.ai>
- GitHub issues: <https://github.com/novachessai/novachess-engine/issues>
- Commercial licensing inquiries: support@novachess.ai