chess-sft-qwen3-0.6b
chess-sft-qwen3-0.6b is an intermediate research model for chess move legality prediction with reasoning capabilities. It is built on Qwen3-0.6B and was trained with full-parameter supervised fine-tuning (SFT) on a curated dataset of 776,026 chess move legality examples. The model serves as a foundation for training models that generate strong moves with chain-of-thought reasoning.
Model Specifications
- Base Model: Qwen3-0.6B
- Fine-tuning Method: Full parameter fine-tuning (SFT)
- Training Framework: TRL (Transformer Reinforcement Learning)
- Optimization: Unsloth for accelerated training
- Model Size: ~0.6B parameters
- Architecture: Qwen3 (Causal Language Model)
Dataset
Each example contains:
- Board Position: Full description of chess piece placement using a list format
- Query: A question about move legality (e.g., "Is it legal for the white bishop at c4 to move to (or capture) f7?")
- Answer: Binary response ("yes" or "no")
- FEN Notation: The Forsyth-Edwards Notation representation of the position
- Metadata:
  - Piece type, color, and squares involved
  - Move legality status
  - Move category (capture, check, checkmate, etc.)
  - Special move indicators
Data Format
Each training example follows this structure:
{
  "prompt": "Consider the position below and answer the query:\n[position]\n\nQuery: Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
  "output": "<think>\n[optional reasoning]\n</think>\n\nyes",
  "fen": "rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8",
  "metadata": {
    "piece_type": "bishop",
    "piece_color": "white",
    "from_square": "c4",
    "to_square": "f7",
    "is_legal": true,
    "category": "legal_capture",
    "is_check": false,
    "is_checkmate": false
  }
}
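The `is_legal` label of an example like the one above can be reproduced independently; a verification sketch (not part of the dataset pipeline), assuming the third-party python-chess package:

```python
import chess

# FEN and move taken from the example above
board = chess.Board("rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8")
move = chess.Move.from_uci("c4f7")  # white bishop on c4 takes on f7

print(board.is_legal(move))    # True: the diagonal c4-d5-e6 is clear and the king is not exposed
print(board.is_capture(move))  # True: f7 holds a black pawn
```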
Data Source
The dataset covers:
- Various chess positions from real games
- Different piece types and move types
- Both legal and illegal moves
- Special moves (castling, en passant, pawn promotion)
- Edge cases and complex positions
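Labels of this kind can be produced mechanically from game positions. A hedged sketch of such a labeling step (not the authors' actual pipeline), again assuming the python-chess package; the helper name `label_move` is hypothetical:

```python
import chess

def label_move(fen: str, from_sq: str, to_sq: str) -> dict:
    """Label a candidate move in the given position as legal or illegal."""
    board = chess.Board(fen)
    move = chess.Move(chess.parse_square(from_sq), chess.parse_square(to_sq))
    piece = board.piece_at(move.from_square)
    return {
        "piece_type": chess.piece_name(piece.piece_type) if piece else None,
        "piece_color": "white" if piece and piece.color == chess.WHITE else "black",
        "from_square": from_sq,
        "to_square": to_sq,
        "is_legal": board.is_legal(move),
    }

# Starting position: knight g1 -> f3 is legal, g1 -> g3 is not a knight move
print(label_move(chess.STARTING_FEN, "g1", "f3")["is_legal"])  # True
print(label_move(chess.STARTING_FEN, "g1", "g3")["is_legal"])  # False
```

Note that a full pipeline would also need to attach a promotion piece to pawn moves reaching the last rank, or `is_legal` rejects them.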
Training Procedure
Code
The full training code can be found at https://github.com/navgeet/chess-sft
Training Configuration
- Training Algorithm: Supervised Fine-Tuning (SFT)
- Number of Epochs: 3
- Batch Size: 64
- Gradient Accumulation Steps: 2
- Learning Rate: 2e-4
- Max Sequence Length: 1024 tokens
- Weight Decay: 0.001
- Learning Rate Scheduler: Linear
- Warmup Steps: 100
- Optimizer: paged_adamw_8bit
- Precision: bfloat16
- Gradient Clipping: max_norm=1.0
- Checkpoint Strategy: Save every 500 steps, keep 3 most recent
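The configuration above maps onto TRL's SFT arguments roughly as follows. This is a sketch, not the repository's actual training script; field names follow a recent TRL release, and `model` and `dataset` are assumed to be loaded elsewhere:

```python
from trl import SFTConfig, SFTTrainer

# Hyperparameters mirroring the Training Configuration list above
config = SFTConfig(
    output_dir="chess-sft-qwen3-0.6b",
    num_train_epochs=3,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_seq_length=1024,
    weight_decay=0.001,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    bf16=True,
    max_grad_norm=1.0,
    save_steps=500,
    save_total_limit=3,
)
trainer = SFTTrainer(model=model, train_dataset=dataset, args=config)
trainer.train()
```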
Chat Template
The model uses the standard ChatML-style chat template, which formats examples as:
<|im_start|>user
[prompt]
<|im_end|>
<|im_start|>assistant
[output]
<|im_end|>
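In plain-string form, the template above amounts to the following (a sketch independent of the tokenizer; the helper name `format_example` is hypothetical):

```python
def format_example(prompt: str, output: str) -> str:
    """Render one training example in the ChatML-style template above."""
    return (
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )

text = format_example(
    "Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
    "<think>\n[optional reasoning]\n</think>\n\nyes",
)
print(text.count("<|im_start|>"))  # 2: one user turn, one assistant turn
```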
Training Notes
- Training was performed on an RX 9070 XT with the Unsloth optimization framework for efficient full-parameter fine-tuning
- Total training involved 3 epochs over 776,026 examples
Citation
If you use this model, please cite:
@misc{agarwal2025chesssft,
  title={Chess Move Legality Model based on Qwen3-0.6B},
  author={Navgeet Agarwal},
  year={2025}
}
Also cite the base model and frameworks used:
@misc{qwen2025qwen3,
  title={Qwen3 Language Model},
  author={Alibaba Research},
  year={2025}
}
@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
  year={2020},
  journal={GitHub repository},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}
License
MIT