| --- |
| license: mit |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - chess |
| - puzzles |
| - chess-games |
| - stockfish |
| - fen |
| - best-move |
| - uci |
| - san |
| - text-generation-inference |
| datasets: |
| - ethanjtang/GAMBIT-stockfish18-selfplay |
| - ethanjtang/GAMBIT-lichess-puzzle-positions |
| --- |
| |
| # GAMBIT: <ins>G</ins>ener<ins>a</ins>lization or <ins>M</ins>emorization? <ins>B</ins>r<ins>i</ins>ttleness <ins>T</ins>esting for Chess-Trained Language Models |
|
|
| [Paper (arXiv:2605.17565)](https://arxiv.org/abs/2605.17565) <br> |
| [Code (GitHub: ethanjtang/KinGPT)](https://github.com/ethanjtang/KinGPT) <br> |
| [Dataset: GAMBIT-lichess-puzzle-positions](https://huggingface.co/datasets/ethanjtang/GAMBIT-lichess-puzzle-positions) <br> |
| [Dataset: GAMBIT-stockfish18-selfplay](https://huggingface.co/datasets/ethanjtang/GAMBIT-stockfish18-selfplay) <br> |
|
|
| ## Variants |
|
|
| ### KinGPT-Woodpecker |
|
|
| KinGPT variant trained on 13,341,057 unique puzzle positions (FEN + best move pairs). |
|
|
| Achieved `train loss 0.3590, val loss 0.3704` on the puzzles corpus after training for ~500B tokens. |
|
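The "FEN + best move" pairs mentioned above can be pictured with a small sketch. This is only an illustration: the `PuzzlePosition` class, its field names, and the `parse_fen` helper are hypothetical and not taken from the GAMBIT datasets, and the move is written in UCI form here although the datasets may encode moves differently.

```python
from dataclasses import dataclass


# Hypothetical container for one training example; the field names are
# illustrative and not taken from the GAMBIT datasets.
@dataclass(frozen=True)
class PuzzlePosition:
    fen: str        # position in Forsyth-Edwards Notation
    best_move: str  # move in UCI long algebraic form, e.g. "e2e4"


def parse_fen(fen: str) -> dict:
    """Split a FEN string into its six standard fields."""
    fields = fen.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 FEN fields, got {len(fields)}")
    placement, side, castling, en_passant, halfmove, fullmove = fields
    if len(placement.split("/")) != 8:
        raise ValueError("piece placement must describe 8 ranks")
    if side not in ("w", "b"):
        raise ValueError("side to move must be 'w' or 'b'")
    return {
        "placement": placement,
        "side_to_move": side,
        "castling": castling,
        "en_passant": en_passant,
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }


# The standard starting position paired with an arbitrary legal move,
# used purely as a placeholder (not a claim about any engine's choice).
example = PuzzlePosition(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    best_move="e2e4",
)
info = parse_fen(example.fen)
print(info["side_to_move"], example.best_move)
```

The check in `parse_fen` is deliberately shallow: it validates the field count, the rank count, and the side to move, but not whether the position is legal.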
|
| ### KinGPT-Beaver |
|
|
| KinGPT variant trained on 54,681 unique positions generated from 1,050 Stockfish 18 self-play games. |
|
|
| Achieved `train loss 0.0974, val loss 1.7554` on the self-play corpus after training for ~25B tokens. The large gap between training and validation loss indicates overfitting, which is attributed to the small dataset size. |
|
|
| ### KinGPT-Chimera |
|
|
| KinGPT variant trained on a combined dataset of 13,395,738 positions drawn from the Woodpecker and Beaver training sets. |
|
|
| Achieved `train loss 0.3594, val loss 0.3710` on the combined corpus after training for ~500B tokens. |
|
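As a quick sanity check on the figures reported for the three variants, the sketch below recomputes the combined dataset size and the gap between validation and training loss. The numbers are copied from the descriptions above; the `variants` dictionary and the use of "validation loss minus training loss" as an overfitting indicator are illustrative choices, not part of the original evaluation.

```python
# Figures copied from the variant descriptions above.
variants = {
    "Woodpecker": {"positions": 13_341_057, "train": 0.3590, "val": 0.3704},
    "Beaver": {"positions": 54_681, "train": 0.0974, "val": 1.7554},
    "Chimera": {"positions": 13_395_738, "train": 0.3594, "val": 0.3710},
}

# The Chimera position count equals the sum of the other two counts.
combined = variants["Woodpecker"]["positions"] + variants["Beaver"]["positions"]
assert combined == variants["Chimera"]["positions"]

# Gap between validation and training loss, rounded to four decimals.
gaps = {name: round(v["val"] - v["train"], 4) for name, v in variants.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.4f}")
```

Under these figures the gap is about 0.0114 for Woodpecker, 1.6580 for Beaver, and 0.0116 for Chimera, which is consistent with the overfitting noted for the small self-play corpus.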
|
| ## Citation |
|
|
| ```bibtex |
| @misc{tang2026generalizationmemorizationbrittlenesstesting, |
| title={Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models}, |
| author={Ethan Tang}, |
| year={2026}, |
| eprint={2605.17565}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.AI}, |
| url={https://arxiv.org/abs/2605.17565}, |
| } |
| ``` |