chess-sft-qwen3-0.6b
chess-sft-qwen3-0.6b is an intermediate research model for chess move legality prediction with reasoning capabilities. It is built on Qwen3-0.6B and was trained with full-parameter supervised fine-tuning (SFT) on a curated dataset of 776,026 chess move legality examples. The model serves as a foundation for training models that generate strong moves with chain-of-thought reasoning.
Model Specifications
- Base Model: Qwen3-0.6B
- Fine-tuning Method: Full parameter fine-tuning (SFT)
- Training Framework: TRL (Transformer Reinforcement Learning)
- Optimization: Unsloth for accelerated training
- Model Size: ~0.6B parameters
- Architecture: Qwen3 (Causal Language Model)
Dataset
Each example contains:
- Board Position: Full description of chess piece placement using a list format
- Query: A question about move legality (e.g., "Is it legal for the white bishop at c4 to move to (or capture) f7?")
- Answer: Binary response ("yes" or "no")
- FEN Notation: The Forsyth-Edwards Notation representation of the position
- Metadata:
  - Piece type, color, and squares involved
  - Move legality status
  - Move category (capture, check, checkmate, etc.)
  - Special move indicators
Data Format
Each training example follows this structure:
{
  "prompt": "Consider the position below and answer the query:\n[position]\n\nQuery: Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
  "output": "<think>\n[optional reasoning]\n</think>\n\nyes",
  "fen": "rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8",
  "metadata": {
    "piece_type": "bishop",
    "piece_color": "white",
    "from_square": "c4",
    "to_square": "f7",
    "is_legal": true,
    "category": "legal_capture",
    "is_check": false,
    "is_checkmate": false
  }
}
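The `is_legal` label of an example like the one above can be reproduced independently; a verification sketch (not part of the dataset pipeline), assuming the third-party python-chess package:

```python
import chess

# FEN and move taken from the example above
board = chess.Board("rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8")
move = chess.Move.from_uci("c4f7")  # white bishop on c4 takes on f7

print(board.is_legal(move))    # True: the diagonal c4-d5-e6 is clear and the king is not exposed
print(board.is_capture(move))  # True: f7 holds a black pawn
```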
Data Source
The dataset covers:
- Various chess positions from real games
- Different piece types and move types
- Both legal and illegal moves
- Special moves (castling, en passant, pawn promotion)
- Edge cases and complex positions
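Labels of this kind can be produced mechanically from game positions. A hedged sketch of such a labeling step (not the authors' actual pipeline), again assuming the python-chess package; the helper name `label_move` is hypothetical:

```python
import chess

def label_move(fen: str, from_sq: str, to_sq: str) -> dict:
    """Label a candidate move in the given position as legal or illegal."""
    board = chess.Board(fen)
    move = chess.Move(chess.parse_square(from_sq), chess.parse_square(to_sq))
    piece = board.piece_at(move.from_square)
    return {
        "piece_type": chess.piece_name(piece.piece_type) if piece else None,
        "piece_color": "white" if piece and piece.color == chess.WHITE else "black",
        "from_square": from_sq,
        "to_square": to_sq,
        "is_legal": board.is_legal(move),
    }

# Starting position: knight g1 -> f3 is legal, g1 -> g3 is not a knight move
print(label_move(chess.STARTING_FEN, "g1", "f3")["is_legal"])  # True
print(label_move(chess.STARTING_FEN, "g1", "g3")["is_legal"])  # False
```

Note that a full pipeline would also need to attach a promotion piece to pawn moves reaching the last rank, or `is_legal` rejects them.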
Training Procedure
Code
The full training code can be found at https://github.com/navgeet/chess-sft
Training Configuration
- Training Algorithm: Supervised Fine-Tuning (SFT)
- Number of Epochs: 3
- Batch Size: 64
- Gradient Accumulation Steps: 2
- Learning Rate: 2e-4
- Max Sequence Length: 1024 tokens
- Weight Decay: 0.001
- Learning Rate Scheduler: Linear
- Warmup Steps: 100
- Optimizer: paged_adamw_8bit
- Precision: bfloat16
- Gradient Clipping: max_norm=1.0
- Checkpoint Strategy: Save every 500 steps, keep 3 most recent
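The configuration above maps onto TRL's SFT arguments roughly as follows. This is a sketch, not the repository's actual training script; field names follow a recent TRL release, and `model` and `dataset` are assumed to be loaded elsewhere:

```python
from trl import SFTConfig, SFTTrainer

# Hyperparameters mirroring the Training Configuration list above
config = SFTConfig(
    output_dir="chess-sft-qwen3-0.6b",
    num_train_epochs=3,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_seq_length=1024,
    weight_decay=0.001,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    bf16=True,
    max_grad_norm=1.0,
    save_steps=500,
    save_total_limit=3,
)
trainer = SFTTrainer(model=model, train_dataset=dataset, args=config)
trainer.train()
```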
Chat Template
The model uses the standard ChatML-style chat template, which formats examples as:
<|im_start|>user
[prompt]
<|im_end|>
<|im_start|>assistant
[output]
<|im_end|>
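In plain-string form, the template above amounts to the following (a sketch independent of the tokenizer; the helper name `format_example` is hypothetical):

```python
def format_example(prompt: str, output: str) -> str:
    """Render one training example in the ChatML-style template above."""
    return (
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )

text = format_example(
    "Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
    "<think>\n[optional reasoning]\n</think>\n\nyes",
)
print(text.count("<|im_start|>"))  # 2: one user turn, one assistant turn
```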
Training Notes
- Training was performed on an RX 9070 XT with the Unsloth optimization framework for efficient full-parameter fine-tuning
- Total training involved 3 epochs over 776,026 examples
Citation
If you use this model, please cite:
@misc{agarwal2025chesssft,
  title={Chess Move Legality Model based on Qwen3-0.6B},
  author={Navgeet Agarwal},
  year={2025}
}
Also cite the base model and frameworks used:
@misc{qwen2025qwen3,
  title={Qwen3 Language Model},
  author={Alibaba Research},
  year={2025}
}
@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
  year={2020},
  journal={GitHub repository},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}
License
MIT