--- license: apache-2.0 language: - en - zh base_model: Qwen/Qwen3-VL-8B-Thinking tags: - physics - reasoning - multimodal - rl - grpo - arxiv:2605.14040 pipeline_tag: image-text-to-text library_name: transformers --- # Physics-R1 — Seed 23 (HF safetensors) [**Project Page**](https://shanyang.me/physics-r1-page/) | [**Paper**](https://huggingface.co/papers/2605.14040) | [**Code**](https://github.com/shanyang-me/physics-r1-neurips2026) | [**Training corpus**](https://huggingface.co/datasets/shanyangmie/physr1corp) The Physics-R1 paper checkpoint for the **seed-23 row of Table 2** (canonical step 60). Fine-tune of `Qwen3-VL-8B-Thinking` on the audited [`PhysR1Corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) (2,268 closed-form physics problems) via full-parameter FSDP1 GRPO with binary correctness reward. This seed has the **highest individual PhysOlym-A score (28.2)** among the paper's three seeds. **This is the easy-to-use HF safetensors release.** For the original verl FSDP-sharded archive, see [`physics-r1-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp). ## Quickstart ```python from transformers import AutoModelForImageTextToText, AutoProcessor import torch model = AutoModelForImageTextToText.from_pretrained( "shanyangmie/physics-r1-seed23", dtype=torch.bfloat16, device_map="auto", trust_remote_code=True, ) processor = AutoProcessor.from_pretrained( "shanyangmie/physics-r1-seed23", trust_remote_code=True, ) ``` For evaluation against the paper's benchmark, see [PhysOlym-A](https://huggingface.co/datasets/shanyangmie/physolym-a) and the [code release](https://github.com/shanyang-me/physics-r1-neurips2026). ## Performance (paper Table 2, seed-23 row) | Eval | Physics-R1 (this checkpoint) | Base Qwen3-VL-8B-Thinking | Δ | |---|---|---|---| | PhyX-mini | 77.9 | 73.7 | +4.2 | | PhyX-3k | 76.6 | 74.4 | +2.2 | | PhysReason | 43.4 | 23.9 | +19.5 | | PUB-OE | 30.9 | 35.3 | -4.4 | | OlympiadBench-Physics | 48.0 | 39.3 | +8.7 | | **PhysOlym-A** | **28.2** | 8.0 | **+20.2** | Scoring: problem-level liberal Sonnet-as-judge (every subpart of a multi-part problem must be correct). The 3-seed mean across {42, 17, 23} is the paper's headline (+18.9 pp on PhysOlym-A). ## Other seeds (HF safetensors mirrors) | Seed | HF safetensors mirror | FSDP archive | |---|---|---| | 42 | [`shanyangmie/physics-r1-seed42-v4-step60`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60) | [`...-seed42-v4-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60-fsdp) | | 17 | [`shanyangmie/physics-r1-seed17`](https://huggingface.co/shanyangmie/physics-r1-seed17) | [`...-seed17-canonical-step63-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed17-canonical-step63-fsdp) | | 23 | **this card** | [`...-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp) | ## Training recipe - **Base model**: [`Qwen/Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking) - **Algorithm**: GRPO (verl 0.6.1, full-parameter FSDP1 — `actor.strategy=fsdp`, *not* `fsdp2`) - **Reward**: binary correctness, per-subpart Sonnet judge with problem-level AND aggregation (see paper §3.2) - **Data**: [`shanyangmie/physr1corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) — 2,268 audited closed-form problems - **Seed / step**: 23 / 60 - **Hardware**: 4×H200 (FSDP1 4-way sharded) Full hyperparameters in paper Appendix. ## License Apache 2.0, inheriting from the base model [`Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). Training data (`physr1corp`) is CC BY-NC 4.0, so this derivative checkpoint is intended for **non-commercial research use**. ## Citation ```bibtex @misc{yang2026physicsr1, title = {Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning}, author = {Yang, Shan}, year = {2026}, url = {https://huggingface.co/papers/2605.14040} } ```