---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- chess
- puzzles
- chess-games
- stockfish
- fen
- best-move
- uci
- san
- text-generation-inference
datasets:
- ethanjtang/GAMBIT-stockfish18-selfplay
- ethanjtang/GAMBIT-lichess-puzzle-positions
---

# GAMBIT: Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

[![arXiv](https://img.shields.io/badge/arXiv-2605.17565-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2605.17565)
[![GitHub](https://img.shields.io/badge/GitHub-KinGPT-black.svg?style=for-the-badge)](https://github.com/ethanjtang/KinGPT)
[![HuggingFace](https://img.shields.io/badge/🤗_HuggingFace-Puzzles-yellow?style=for-the-badge)](https://huggingface.co/datasets/ethanjtang/GAMBIT-lichess-puzzle-positions)
[![HuggingFace](https://img.shields.io/badge/🤗_HuggingFace-SF18%20Selfplay-yellow?style=for-the-badge)](https://huggingface.co/datasets/ethanjtang/GAMBIT-stockfish18-selfplay)

## Variants

### KinGPT-Woodpecker

KinGPT variant trained on 13,341,057 unique puzzle positions (FEN + best move pairs). Achieved `train loss 0.3590, val loss 0.3704` on the puzzle corpus after training for ~500B tokens.

### KinGPT-Beaver

KinGPT variant trained on 54,681 unique positions generated from 1,050 Stockfish 18 self-play games. Achieved `train loss 0.0974, val loss 1.7554` on the self-play corpus after training for ~25B tokens; the large gap between the two losses reflects overfitting due to the small dataset size.

### KinGPT-Chimera

KinGPT variant trained on the combined dataset of 13,395,738 positions (the Woodpecker and Beaver training positions together). Achieved `train loss 0.3594, val loss 0.3710` on the combined corpus after training for ~500B tokens.

## Citation

```bibtex
@misc{tang2026generalizationmemorizationbrittlenesstesting,
  title={Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models},
  author={Ethan Tang},
  year={2026},
  eprint={2605.17565},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.17565},
}
```
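
## Example: comparing the reported losses

As a rough illustration of the overfitting note in the Variants section, the gap between validation and training loss can be computed directly from the figures reported on this card. This is only a sketch: the `reported` dictionary and the `loss_gap` helper are hypothetical names introduced here for illustration and are not part of the KinGPT codebase. The last lines also check that the Chimera dataset size equals the sum of the Woodpecker and Beaver position counts.

```python
# Losses copied from the Variants section of this card.
reported = {
    "KinGPT-Woodpecker": {"train": 0.3590, "val": 0.3704},
    "KinGPT-Beaver": {"train": 0.0974, "val": 1.7554},
    "KinGPT-Chimera": {"train": 0.3594, "val": 0.3710},
}


def loss_gap(losses):
    """Return validation loss minus training loss (hypothetical helper)."""
    return losses["val"] - losses["train"]


# Round to four decimals to match the precision of the reported losses.
gaps = {name: round(loss_gap(losses), 4) for name, losses in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: val - train = {gap:.4f}")

# Position counts copied from the Variants section.
woodpecker_positions = 13_341_057
beaver_positions = 54_681
chimera_positions = 13_395_738

# True when the combined count is the sum of the two training sets.
sizes_consistent = woodpecker_positions + beaver_positions == chimera_positions
print("Chimera size matches Woodpecker + Beaver:", sizes_consistent)
```

A much larger gap for one variant than for the others is consistent with that variant fitting its training set far more closely than held-out data, which is what the card reports for KinGPT-Beaver.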