Using the trained model (Hugging Face)

This guide shows how to run the trained chess model from Hugging Face.

The model is trained to output:

  • <rationale>...</rationale>: a short explanation (or short PV line depending on the prompt)
  • <uci_move>...</uci_move>: the move to play in UCI format

1) Install (uv)

This repository uses uv with pyproject.toml.

uv sync

If you prefer regular pip:

pip install -U torch transformers accelerate python-chess

2) Quick inference with Transformers

This runs a single generation from a FEN.

uv run python - <<'PY'
import chess
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from src.prompts import system_msg, user_msg
from src.tokenizer_utils import ensure_chat_template

MODEL_ID = "alexneakameni/Qwen2.5-Coder-0.5B-Instruct-chess-grpo"
FEN = "r2q1rk1/ppp2pbp/2np1np1/4P3/4PB2/2N2B2/PPPQ1PPP/2KR3R b - - 0 3"

# Tokenizer
# (Some models need fix_mistral_regex=True; it is safe to keep it here.)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, fix_mistral_regex=True)
tokenizer = ensure_chat_template(tokenizer)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map={"": 0} if torch.cuda.is_available() else None,
)

board = chess.Board(FEN)
side_to_move = "White" if board.turn == chess.WHITE else "Black"
legal_moves_uci = " ".join(m.uci() for m in board.legal_moves)

prompt = user_msg.format(
    FEN=board.fen(),
    side_to_move=side_to_move,
    legal_moves_uci=legal_moves_uci,
)

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": prompt},
]

chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(completion)
PY

The output should contain a <rationale>...</rationale> explanation followed by <uci_move>...</uci_move> with the chosen move.

3) Run with vLLM + official local evaluation

The official starter kit evaluates an OpenAI-compatible endpoint.

Terminal 1: start vLLM server

cd global-chess-challenge-2025-starter-kit/player_agents
MODEL_NAME_OR_PATH="alexneakameni/Qwen2.5-Coder-0.5B-Instruct-chess-grpo" bash run_vllm.sh

This starts an OpenAI-compatible server at http://localhost:5000/v1 with the served model name aicrowd-chess-model.

Terminal 2: run the evaluation harness

cd global-chess-challenge-2025-starter-kit
uv run python local_evaluation.py --endpoint http://localhost:5000/v1 \
  --template-file player_agents/qwen_prompt.jinja \
  --games-per-opponent 10

Notes:

  • The evaluation script parses the move from <uci_move>...</uci_move>.
  • To use a locally trained checkpoint instead of the Hugging Face model, launch vLLM with MODEL_NAME_OR_PATH="../../models/chess-grpo-sequences".

4) Troubleshooting

  • Illegal moves: ensure your prompt requires selecting from the provided legal moves and that the model output includes the tags.
  • Out of memory (OOM): reduce --gpu-memory-utilization in global-chess-challenge-2025-starter-kit/player_agents/run_vllm.sh.
  • No GPU: the Transformers example can run on CPU, but will be slow.