chess-sft-qwen3-0.6b

chess-sft-qwen3-0.6b is an intermediate research model for chess move generation with reasoning capabilities. It is built on Qwen3-0.6B and fully fine-tuned via supervised fine-tuning (SFT) on a curated dataset of 776,026 chess move legality examples. The model serves as a foundation for training models that generate strong moves with chain-of-thought reasoning.

Model Specifications

  • Base Model: Qwen3-0.6B
  • Fine-tuning Method: Full parameter fine-tuning (SFT)
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Optimization: Unsloth for accelerated training
  • Model Size: ~0.6B parameters
  • Architecture: Qwen3 (Causal Language Model)

Dataset

Each example contains:

  • Board Position: Full description of chess piece placement using a list format
  • Query: A question about move legality (e.g., "Is it legal for the white bishop at c4 to move to (or capture) f7?")
  • Answer: Binary response ("yes" or "no")
  • FEN Notation: The Forsyth-Edwards Notation representation of the position
  • Metadata:
    • Piece type, color, and squares involved
    • Move legality status
    • Move category (capture, check, checkmate, etc.)
    • Special move indicators

Data Format

Each training example follows this structure:

{
  "prompt": "Consider the position below and answer the query:\n[position]\n\nQuery: Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
  "output": "<think>\n[optional reasoning]\n</think>\n\nyes",
  "fen": "rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8",
  "metadata": {
    "piece_type": "bishop",
    "piece_color": "white",
    "from_square": "c4",
    "to_square": "f7",
    "is_legal": true,
    "category": "legal_capture",
    "is_check": false,
    "is_checkmate": false
  }
}
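The fen field encodes the same board position compactly. As a minimal, dependency-free sketch, the board portion of a FEN string can be expanded into a square-to-piece map (uppercase letters denote white pieces, lowercase black; digits are runs of empty squares):

```python
def fen_to_board(fen: str) -> dict:
    """Expand the piece-placement field of a FEN string into {square: piece}."""
    placement = fen.split()[0]          # first FEN field, ranks 8 down to 1
    board = {}
    for rank_idx, row in enumerate(placement.split("/")):
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)     # skip that many empty squares
            else:
                square = "abcdefgh"[file_idx] + str(8 - rank_idx)
                board[square] = ch
                file_idx += 1
    return board

board = fen_to_board("rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8")
print(board["c4"])  # 'B' -- the white bishop from the example query
print(board["f7"])  # 'p' -- the black pawn it may capture
```

This is the position from the example above: the white bishop on c4 and the black pawn on f7 referenced by the query.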

Data Source

The dataset covers:

  • Various chess positions from real games
  • Different piece types and move types
  • Both legal and illegal moves
  • Special moves (castling, en passant, pawn promotion)
  • Edge cases and complex positions
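A sketch of how one such example could be labeled, assuming the python-chess library (the actual generation code in the linked repo may use different tooling):

```python
import chess  # python-chess package (assumed tooling, not confirmed by the repo)

# Position and candidate move from the data-format example above.
board = chess.Board("rnbq1rk1/ppp1ppbp/5np1/6B1/2BP4/2N2N2/PPP2PPP/R2QK2R w KQ - 4 8")
move = chess.Move.from_uci("c4f7")

is_legal = move in board.legal_moves       # the binary legality label
is_capture = board.is_capture(move)        # feeds the move category
is_castling = board.is_castling(move)      # special move indicators
is_en_passant = board.is_en_passant(move)

print(is_legal, is_capture, is_castling, is_en_passant)  # True True False False
```

The same calls cover the special-move cases listed above: castling and en passant are flagged directly, and pawn promotion shows up as a non-empty `move.promotion`.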

Training Procedure

Code

The full training code can be found at https://github.com/navgeet/chess-sft

Training Configuration

  • Training Algorithm: Supervised Fine-Tuning (SFT)
  • Number of Epochs: 3
  • Batch Size: 64
  • Gradient Accumulation Steps: 2
  • Learning Rate: 2e-4
  • Max Sequence Length: 1024 tokens
  • Weight Decay: 0.001
  • Learning Rate Scheduler: Linear
  • Warmup Steps: 100
  • Optimization: paged_adamw_8bit
  • Precision: bfloat16
  • Gradient Clipping: max_norm=1.0
  • Checkpoint Strategy: Save every 500 steps, keep 3 most recent
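The configuration above maps roughly onto keyword arguments for TRL's SFTConfig (a sketch only; exact parameter names vary across TRL versions, e.g. max_seq_length vs. max_length):

```python
# Hyperparameters from the table above, as keyword arguments one might pass
# to TRL's SFTConfig. Names follow the transformers TrainingArguments API.
sft_kwargs = dict(
    num_train_epochs=3,
    per_device_train_batch_size=64,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_seq_length=1024,
    weight_decay=0.001,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    bf16=True,
    max_grad_norm=1.0,
    save_steps=500,
    save_total_limit=3,
)

# Effective batch size per optimizer step:
effective_batch = (sft_kwargs["per_device_train_batch_size"]
                   * sft_kwargs["gradient_accumulation_steps"])
print(effective_batch)  # 128
```

Note the effective batch size of 128 sequences per optimizer step, since gradients are accumulated over two micro-batches of 64.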

Chat Template

The model uses a standard chat template that formats examples as:

<|im_start|>user
[prompt]
<|im_end|>
<|im_start|>assistant
[output]
<|im_end|>
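A minimal sketch of that formatting as a plain string template (in practice the tokenizer's apply_chat_template method handles this; the helper name here is illustrative):

```python
def format_example(prompt: str, output: str) -> str:
    """Render one training example in the ChatML-style template above."""
    return (
        f"<|im_start|>user\n{prompt}\n<|im_end|>\n"
        f"<|im_start|>assistant\n{output}\n<|im_end|>"
    )

text = format_example(
    "Is it legal for the white bishop at c4 to move to (or capture) f7? Answer only yes or no",
    "<think>\n[optional reasoning]\n</think>\n\nyes",
)
print(text.startswith("<|im_start|>user"))  # True
```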

Training Notes

  • Training was performed on a single AMD Radeon RX 9070 XT GPU with the Unsloth optimization framework for efficient full-parameter fine-tuning
  • Total training involved 3 epochs over 776,026 examples

Citation

If you use this model, please cite:

@misc{agarwal2025chesssft,
  title={Chess Move Legality Model based on Qwen3-0.6B},
  author={Navgeet Agarwal},
  year={2025}
}

Also cite the base model and frameworks used:

@misc{qwen2025qwen3,
  title={Qwen3 Language Model},
  author={{Qwen Team}},
  year={2025}
}

@misc{vonwerra2022trl,
  title={{TRL: Transformer Reinforcement Learning}},
  author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
  year={2020},
  journal={GitHub repository},
  publisher={GitHub},
  howpublished={\url{https://github.com/huggingface/trl}}
}

License

MIT
