---
license: other
license_name: nova-chess-engine-license
license_link: https://github.com/novachessai/novachess-engine/blob/main/LICENSE
language: en
tags:
- chess
- transformer
- human-move-prediction
- style-conditioned
pipeline_tag: other
library_name: onnx
---

# Nova Chess Engine

A style-conditioned transformer that predicts human chess moves. Given
a board position, a target player rating, and two optional style parameters
(classical–hypermodern preference and aggression), Nova returns a
probability distribution over all legal moves, calibrated to how a
player of that rating and style would play.

**Inference is a single forward pass of the neural network.** Nova
does not use Monte Carlo tree search, minimax, alpha-beta pruning,
or any form of engine-style position evaluation. There is no value
head and no lookahead. Move selection comes entirely from learned
patterns over the training corpus — fast on CPU (~35–50 ms per
position) and categorically different from search-based engines
like Stockfish or Leela.

**Play Nova directly at [novachess.ai](https://novachess.ai)** —
Nova powers the *Play* and *Train with Nova* features on the site,
where you can face Nova at any rating/style setting with built-in
analysis and post-game review.

- Developed by Nova Chess — <https://novachess.ai>
- Model type: pure-policy neural network over chess moves,
  conditioned on position + rating + two style parameters.
  Single forward pass per position; no search, no value head, no
  game history.
- Language(s): not applicable — input is chess positions (18-channel
  plane encoding); output is a distribution over 16,384 move indices.
- License: custom non-commercial (see `LICENSE`)

Full results and reproducibility: [`RESULTS.md`](RESULTS.md).
Source code and docs: <https://github.com/novachessai/novachess-engine>

## Model Details

### Inputs

- `positions` — `float32` tensor of shape `(B, 18, 8, 8)`
  - Planes 0–5: white pieces (P, N, B, R, Q, K), one-hot per square
  - Planes 6–11: black pieces (P, N, B, R, Q, K)
  - Plane 12: side to move (1s if white to move, 0s otherwise)
  - Planes 13–16: castling rights (white-kingside, white-queenside,
    black-kingside, black-queenside)
  - Plane 17: en-passant file indicator
- `conditioning` — `float32` tensor of shape `(B, 3)`
  - `rating_norm` = `(rating − 800) / (2700 − 800)`, clipped to `[0, 1]`
  - `classical` ∈ `[0, 1]` — opening preference (higher = more
    classical mainlines)
  - `aggression` ∈ `[0, 1]` — tactical/sacrificial tendency
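
The conditioning vector follows directly from the formula above. A minimal sketch of assembling it with NumPy (the helper name `make_conditioning` is ours, not part of the release):

```python
import numpy as np

def make_conditioning(rating, classical=0.5, aggression=0.5):
    """Build the (1, 3) conditioning tensor: normalized rating + two style axes."""
    rating_norm = min(max((rating - 800) / (2700 - 800), 0.0), 1.0)
    return np.array([[rating_norm, classical, aggression]], dtype=np.float32)

make_conditioning(1600)   # rating_norm ~ 0.421, neutral style
make_conditioning(3000)   # clipped: rating_norm == 1.0
```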

### Outputs

- `logits` — `float32` tensor of shape `(B, 16384)`. Raw logits over
  the 16,384-index move space. The caller is responsible for masking
  illegal moves and applying softmax.

Move index encoding:

```
move_index = promotion_offset + from_square * 64 + to_square

promotion_offset:
      0   no promotion (also queen promotion)
   4096   knight promotion
   8192   bishop promotion
  12288   rook promotion
```

where `from_square` and `to_square` are standard 0–63 indices
(`a1 = 0`, `h8 = 63`).
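
The encoding is invertible by integer arithmetic. A minimal sketch (these helper names are ours, not part of the release):

```python
def encode_move(from_sq, to_sq, promotion=None):
    """Squares are 0-63 (a1 = 0, h8 = 63); promotion is None, 'q', 'n', 'b', or 'r'.
    Queen promotions share offset 0 with non-promotion moves."""
    offset = {None: 0, "q": 0, "n": 4096, "b": 8192, "r": 12288}[promotion]
    return offset + from_sq * 64 + to_sq

def decode_move(index):
    """Inverse mapping back to (from_sq, to_sq, promotion)."""
    promotion = [None, "n", "b", "r"][index // 4096]
    return (index % 4096) // 64, index % 64, promotion

encode_move(12, 28)   # e2 -> e4 gives index 796
decode_move(796)      # -> (12, 28, None)
```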

### Architectural distinction from prior work

Nova is **single-head pure-policy**: the network's only output is the
move distribution. There is no value head (no game-outcome
prediction), no auxiliary head (no side-task supervision on captures,
checks, etc.), no search at inference, no lookahead, and no use of
any position evaluator before, during, or after the forward pass.
Move selection comes entirely from the policy distribution the
network learned by predicting actual human moves at the conditioned
rating and style.

This contrasts with every published comparable model:

| Model | Heads at inference | Search | Notes |
|---|---|---|---|
| **Nova** (this release) | 1 (policy) | none | single forward pass, ~35–50 ms CPU |
| Maia-2 (NeurIPS 2024) | 3 (policy + value + auxiliary) | none | value head regresses W/D/L; auxiliary head predicts legal moves, captures, check delivery |
| Maia-3 (`maia3_simplified.onnx`) | 2 (policy + value) | none | drops Maia-2's auxiliary head; retains W/D/L value head |
| Allie (ICLR 2025) | 1 (policy) + value via search | adaptive MCTS at inference | policy is decoder-only over move sequences; MCTS provides per-position evaluation at runtime |
| Leela (LC0) | 2 (policy + value) | MCTS | engine-strength playing model |
| Stockfish (NNUE) | evaluation only | alpha-beta | not a human-move predictor |

The pure-policy stance is a deliberate design choice. It keeps the
model fast (one forward pass, no tree expansion), simple to deploy
(just the ONNX file — no MCTS implementation, no auxiliary supervision
data at training time, no value-head calibration to maintain), and
forces the network to learn move quality entirely from move-selection
patterns rather than offloading it to a parallel evaluator. The
benchmarks in `RESULTS.md` show this approach is competitive with
multi-head architectures on the move-prediction task itself.

### Files in this repository

- `nova_v3b.onnx` + `nova_v3b.onnx.data` — ONNX export with external
  data. Both files are required at inference time and **must keep their
  exact filenames** — the `.onnx` file embeds a reference to the
  `.data` sidecar by name. Place both in the same directory before loading.
- `nova_v3b.pt` — PyTorch checkpoint (weights only) for research
  and fine-tuning.
- `unified_sample_600k.pkl` — the 600K-position out-of-sample
  evaluation set used in the results reported below. Schema:
  `{fen, actual, rating, ply, min_clock, piece_count, band,
  player_id, result, ...}`.
- `nova_neutral_600k.pkl`, `nova_actual_600k.pkl`,
  `maia2_600k.pkl`, `maia3_600k.pkl` — per-position predictions from
  each model on the 600K sample, used for the paired significance
  tests in `RESULTS.md`.

## Uses

### Direct use

- Predict the probability distribution over legal moves that a human
  of a specified rating and style would play from a given position.
- Sample moves to run as a human-like opponent or study partner.
- Score actual human moves by `P(actual_move | position, rating,
  style)` for humanness analysis, move difficulty assessment, or
  anti-cheat signals.
- Benchmark other human-move predictors against Nova on shared
  evaluation sets.
- Fine-tune on specialized data (specific player corpora, specific
  opening systems, specific time controls) for personal or research
  use.

### Downstream use

The model is used internally by Nova Chess to power the
*Play Nova* and *Train with Nova* features at <https://novachess.ai>,
where end users play or practice against the model at chosen rating
and style settings. The same weights published in this repository are
the ones served by the application.

The in-app version additionally wraps Nova's policy output with a
small calibration layer used to tune playing strength across rating
tiers. The primary lever is a **per-tier temperature schedule**. On
top of that, Nova's sampled candidates pass through a probabilistic
quality check: high-confidence picks (where Nova's policy concentrates
significant probability mass on a single move) are played directly,
while lower-confidence picks may be sent to Stockfish for a low-depth
evaluation. If the evaluation falls below a tier-dependent quality
threshold, the move *may* be replaced by re-sampling from Nova's
distribution. Both the rate at which positions are evaluated and the
rate at which sub-threshold moves are actually replaced vary by tier,
so the bot's mistake profile matches the empirical chess.com CP-loss
profile at that level — at lower tiers, more sub-optimal moves slip
through (because players at that level make them); at higher tiers,
far fewer do. Additional calibration components layer on top of this
base flow.
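
The flow above can be sketched in a few lines. This is purely illustrative — the function, the tier parameters, and every number below are invented placeholders, not the values the Nova application actually uses:

```python
import random

def pick_move(policy, evaluate_cp, tier):
    """Sketch of the calibration flow. `policy` maps move -> probability
    (already legal-masked and normalized); `evaluate_cp` is a stand-in
    for a low-depth Stockfish evaluation in centipawns."""
    # 1. Per-tier temperature reshapes the sampling distribution.
    w = {m: p ** (1.0 / tier["temperature"]) for m, p in policy.items()}
    total = sum(w.values())
    dist = {m: x / total for m, x in w.items()}
    sample = lambda: random.choices(list(dist), weights=list(dist.values()))[0]

    move = sample()
    # 2. High-confidence picks are played directly.
    if policy[move] >= tier["confidence_cutoff"]:
        return move
    # 3. Lower-confidence picks are spot-checked at a tier-dependent rate...
    if random.random() < tier["check_rate"] and evaluate_cp(move) < tier["cp_threshold"]:
        # 4. ...and sub-threshold moves are only *sometimes* replaced, so
        # human-like mistakes still slip through at lower tiers.
        if random.random() < tier["replace_rate"]:
            move = sample()
    return move

# Hypothetical tier configuration (all values invented for illustration).
tier_1200 = {"temperature": 1.3, "confidence_cutoff": 0.6,
             "check_rate": 0.4, "replace_rate": 0.5, "cp_threshold": -150}
```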

**Every move the in-app bot plays still originates from Nova's policy
distribution.** Stockfish is never used to suggest, generate, or
select moves — only to evaluate moves Nova has already proposed, so
that obvious blunders at higher tiers can be probabilistically caught.
The model weights are never touched. The calibration layer is **not**
part of this release; the released checkpoint is the bare policy
model, exactly the surface that benchmarks and downstream research /
fine-tuning should target. See the README's "In-app behavior vs the
released model" section for the full distinction.

### Out-of-scope use

- Not suitable as a maximum-strength engine for competition against
  search-based engines. Nova is trained for human-move
  prediction — it maximizes `P(move | human of rating R)`,
  not `P(best move)` — and uses no search, no lookahead, and no
  position evaluation beyond the single forward pass of the policy
  network. For engine-quality play, a search-based evaluator such as
  Stockfish remains the correct choice.
- Not intended to deliver standalone cheating verdicts. Nova's
  probabilities can inform a cheat-detection pipeline but should not
  be used as sole evidence for accusations.
- Not validated on chess variants (Chess960, King of the Hill, etc.)
  — trained only on standard chess.
- Not a replacement for human coaching — move probabilities are not
  explanations, and the model does not produce commentary or verbal
  analysis.

## Bias, Risks, and Limitations

- **Training-data distribution.** Nova is trained on ~520M positions
  from Lichess rapid games played Apr–Nov 2025. The player population
  is self-selected (online rapid players on one platform), skews
  toward active users in rating bands 1100–2300, and may not
  represent the full distribution of human chess play. Inferences
  about moves at extreme ratings (particularly below 800 and above
  2500) have less training-data support.
- **Style axis limitations.** The classical and aggression axes
  capture specific operational definitions (opening move choices for
  classical; captures + territorial control + king pressure for
  aggression). They do not capture all dimensions of human chess
  style (combinational richness, prophylaxis, time management, etc.).
- **Rating conditioning is a scalar.** Nova receives a single number
  for rating, not a distribution. The model has learned a continuous
  interpolation of playing strength, but at the high end of the
  rating axis the playing strength it produces may saturate below the
  conditioned rating.
- **No game history.** Nova conditions on the current position only,
  not on the preceding move sequence. Two positions with identical
  FENs are indistinguishable to the model even if reached through
  very different games.
- **No legality masking.** The raw logits include mass on
  illegal move indices. Callers must apply a legal-move mask before
  sampling. See the README quickstart and `docs/serving.md` for the
  reference masking pattern.
- **Value / result prediction is not supported.** This checkpoint is
  policy-only; it does not output win/draw/loss probabilities.

## Training Details

### Training data

Nova was trained on a large corpus of Lichess rapid games, balanced
across six rating bands from 800 to 2700+. Position sampling and
filtering were tuned to keep all skill levels and all game phases
well represented in training. Details of the data pipeline and
cohort balancing are not published.

### Training procedure

Nova is trained end-to-end with a cross-entropy objective over the
16,384-index move space. The output is the policy distribution over
legal moves, with no auxiliary value head. Specific architectural
dimensions and training hyperparameters are not published.

### Inference cost

- CPU (ONNX fp32): 35–50 ms per position on a modern x86 core
- GPU (batched, H100): ~1 ms per position
- Inference memory: approximately 500 MB RAM per worker (fp32 weights
  with external-data sidecar)

## Evaluation

### Evaluation data

A single held-out evaluation sample of **600,000 positions** drawn
from Lichess rapid games played in **March 2026**, stratified at
100,000 positions per rating band. This sample is temporally held out
from Nova's training data and is shipped as `unified_sample_600k.pkl`
on Hugging Face.

### Metrics

- **hit1** — fraction of positions where the model's top prediction
  matches the human's actual move (top-1 accuracy)
- **hit5** — fraction of positions where the human's move is in the
  model's top-5 predictions
- **Mean P(actual)** — mean probability mass that the model assigned
  to the move the human actually played
- **Mean top-5 mass** — mean total probability mass assigned to the
  model's top-5 predictions
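
All four metrics reduce to simple NumPy operations over a matrix of legal-masked distributions. A minimal sketch (the function name is ours; this is not the evaluation script shipped with `RESULTS.md`):

```python
import numpy as np

def move_prediction_metrics(probs, actual):
    """probs: (N, 16384) legal-masked distributions; actual: (N,) played-move indices."""
    rows = np.arange(len(actual))
    top5 = np.argsort(probs, axis=1)[:, -5:]   # indices of the 5 largest probabilities
    return {
        "hit1": float((probs.argmax(axis=1) == actual).mean()),
        "hit5": float((top5 == actual[:, None]).any(axis=1).mean()),
        "mean_p_actual": float(probs[rows, actual].mean()),
        "mean_top5_mass": float(np.take_along_axis(probs, top5, axis=1).sum(axis=1).mean()),
    }
```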

### Results

On the 600,000-position sample, comparing Nova against the publicly
available Maia-3 checkpoint (`maia3_simplified.onnx` from
<https://maiachess.com>) and the Maia-2 rapid checkpoint:

| Metric | Maia-2 | Maia-3 | Nova (neutral style) |
|---|---|---|---|
| Top-1 hit rate | 50.27 % | **54.83 %** | 54.60 % |
| Top-5 hit rate | 88.38 % | **91.23 %** | 91.10 % |
| Mean P(actual) | 38.44 % | 42.10 % | **42.51 %** |
| Mean top-5 mass | 89.33 % | 91.96 % | **92.26 %** |

All four Nova-vs-Maia-3 deltas are statistically significant under
paired McNemar tests (for hit rates) and paired t-tests (for
probability-mass metrics). Both probability-mass deltas remain
significant under a player-clustered bootstrap (95% CIs reported in
[`RESULTS.md`](RESULTS.md)).

Full breakdown by rating band, Maia tier (Skilled / Advanced /
Master), game phase, piece count, and three filter variants (all
positions; `ply ≥ 10`; `ply ≥ 10 + clock ≥ 30 s`) is in
[`RESULTS.md`](RESULTS.md).

## How to use

Minimum-dependency inference example (CPU):

```bash
pip install onnxruntime python-chess numpy
```

```python
import chess
import numpy as np
import onnxruntime as ort

PIECE = {"P": 0, "N": 1, "B": 2, "R": 3, "Q": 4, "K": 5,
         "p": 6, "n": 7, "b": 8, "r": 9, "q": 10, "k": 11}

def fen_to_planes(fen):
    """Encode a FEN string as Nova's (18, 8, 8) input planes."""
    planes = np.zeros((18, 8, 8), dtype=np.float32)
    board_part, turn, castling, ep = fen.split()[:4]
    for ri, rank_str in enumerate(board_part.split("/")):
        rank_idx, file_idx = 7 - ri, 0
        for ch in rank_str:
            if ch.isdigit():
                file_idx += int(ch)
            else:
                planes[PIECE[ch], rank_idx, file_idx] = 1.0
                file_idx += 1
    if turn == "w": planes[12].fill(1.0)
    if "K" in castling: planes[13].fill(1.0)
    if "Q" in castling: planes[14].fill(1.0)
    if "k" in castling: planes[15].fill(1.0)
    if "q" in castling: planes[16].fill(1.0)
    if ep != "-" and len(ep) == 2:
        planes[17, 0, ord(ep[0]) - ord("a")] = 1.0
    return planes

session = ort.InferenceSession("nova_v3b.onnx",
                               providers=["CPUExecutionProvider"])

board = chess.Board()
positions = fen_to_planes(board.fen())[np.newaxis]
# rating=1600, neutral classical + aggression
conditioning = np.array([[(1600 - 800) / (2700 - 800), 0.5, 0.5]],
                        dtype=np.float32)
logits = session.run(None, {"positions": positions,
                            "conditioning": conditioning})[0][0]

# Mask illegal indices, then softmax over the legal moves only
legal = np.zeros(16384, dtype=bool)
idx_to_move = {}
for mv in board.legal_moves:
    idx = mv.from_square * 64 + mv.to_square
    if mv.promotion == chess.KNIGHT:   idx += 4096
    elif mv.promotion == chess.BISHOP: idx += 8192
    elif mv.promotion == chess.ROOK:   idx += 12288
    legal[idx] = True
    idx_to_move[idx] = mv
masked = np.where(legal, logits, -1e9)
probs = np.exp(masked - masked.max())
probs[~legal] = 0.0
probs /= probs.sum()

# Print the top-5 predictions with their UCI move names
for i in np.argsort(probs)[::-1][:5]:
    print(f"  {idx_to_move[int(i)].uci():6s} p = {probs[i]*100:.2f}%")
```

For production deployment notes (multi-worker setup, rate limiting,
temperature schedules, observability), see `docs/serving.md` in the
GitHub repository.

## Citation

```
Nova Chess Engine. Nova Chess, 2026.
https://github.com/novachessai/novachess-engine
https://huggingface.co/novachess/novachess-engine
```

BibTeX:

```bibtex
@misc{novachess_2026,
  title  = {Nova Chess Engine},
  author = {Nova Chess},
  year   = {2026},
  url    = {https://github.com/novachessai/novachess-engine}
}
```

## Acknowledgments

Nova builds on prior work in human-move prediction. The evaluation
methodology (rating-band stratification, tier definitions, ply-based
filters, Lichess rapid data) follows conventions established by the
Maia project.

- Maia-1 — McIlroy-Young, Sen, Kleinberg & Anderson, *Aligning
  Superhuman AI with Human Behavior: Chess as a Model System*,
  KDD 2020. [arXiv:2006.01855](https://arxiv.org/abs/2006.01855)
- Maia-2 — Tang, Jiao, McIlroy-Young, Kleinberg, Sen & Anderson,
  *Maia-2: A Unified Model for Human-AI Alignment in Chess*,
  NeurIPS 2024. [arXiv:2409.20553](https://arxiv.org/abs/2409.20553)
- Maia-3 — Maia project, <https://maiachess.com>. The specific
  checkpoint evaluated here is `maia3_simplified.onnx` published there.
- Allie — Khoshneviszadeh, Chi, Sheller et al., *Allie: Emergent
  Human-Like Play Through Adaptive MCTS with a Decoder-Only
  Transformer*, ICLR 2025.
  [arXiv:2410.03893](https://arxiv.org/abs/2410.03893)

## Contact

- Product website: <https://novachess.ai>
- GitHub issues: <https://github.com/novachessai/novachess-engine/issues>
- Commercial licensing inquiries: support@novachess.ai