---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
- physics
- reasoning
- multimodal
- rl
- grpo
- arxiv:2605.14040
pipeline_tag: image-text-to-text
library_name: transformers
---

# Physics-R1 — Seed 23 (HF safetensors)

[**Project Page**](https://shanyang.me/physics-r1-page/) | [**Paper**](https://huggingface.co/papers/2605.14040) | [**Code**](https://github.com/shanyang-me/physics-r1-neurips2026) | [**Training corpus**](https://huggingface.co/datasets/shanyangmie/physr1corp)

The Physics-R1 paper checkpoint for the **seed-23 row of Table 2** (canonical step 60). Fine-tune of `Qwen3-VL-8B-Thinking` on the audited [`PhysR1Corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) (2,268 closed-form physics problems) via full-parameter FSDP1 GRPO with binary correctness reward.

This seed has the **highest individual PhysOlym-A score (28.2)** among the paper's three seeds.

**This is the easy-to-use HF safetensors release.** For the original verl FSDP-sharded archive, see [`physics-r1-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp).

## Quickstart

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

model = AutoModelForImageTextToText.from_pretrained(
    "shanyangmie/physics-r1-seed23",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    "shanyangmie/physics-r1-seed23",
    trust_remote_code=True,
)
```

For evaluation against the paper's benchmark, see [PhysOlym-A](https://huggingface.co/datasets/shanyangmie/physolym-a) and the [code release](https://github.com/shanyang-me/physics-r1-neurips2026).

## Performance (paper Table 2, seed-23 row)

| Eval | Physics-R1 (this checkpoint) | Base Qwen3-VL-8B-Thinking | Δ |
|---|---|---|---|
| PhyX-mini | 77.9 | 73.7 | +4.2 |
| PhyX-3k | 76.6 | 74.4 | +2.2 |
| PhysReason | 43.4 | 23.9 | +19.5 |
| PUB-OE | 30.9 | 35.3 | -4.4 |
| OlympiadBench-Physics | 48.0 | 39.3 | +8.7 |
| **PhysOlym-A** | **28.2** | 8.0 | **+20.2** |

Scoring: problem-level liberal Sonnet-as-judge (every subpart of a multi-part problem must be correct). The 3-seed mean across {42, 17, 23} is the paper's headline (+18.9 pp on PhysOlym-A).

## Other seeds (HF safetensors mirrors)

| Seed | HF safetensors mirror | FSDP archive |
|---|---|---|
| 42 | [`shanyangmie/physics-r1-seed42-v4-step60`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60) | [`...-seed42-v4-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60-fsdp) |
| 17 | [`shanyangmie/physics-r1-seed17`](https://huggingface.co/shanyangmie/physics-r1-seed17) | [`...-seed17-canonical-step63-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed17-canonical-step63-fsdp) |
| 23 | **this card** | [`...-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp) |

## Training recipe

- **Base model**: [`Qwen/Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking)
- **Algorithm**: GRPO (verl 0.6.1, full-parameter FSDP1 — `actor.strategy=fsdp`, *not* `fsdp2`)
- **Reward**: binary correctness, per-subpart Sonnet judge with problem-level AND aggregation (see paper §3.2)
- **Data**: [`shanyangmie/physr1corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) — 2,268 audited closed-form problems
- **Seed / step**: 23 / 60
- **Hardware**: 4×H200 (FSDP1 4-way sharded)

Full hyperparameters in paper Appendix.

## License

Apache 2.0, inheriting from the base model [`Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). Training data (`physr1corp`) is CC BY-NC 4.0, so this derivative checkpoint is intended for **non-commercial research use**.

## Citation

```bibtex
@misc{yang2026physicsr1,
  title  = {Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning},
  author = {Yang, Shan},
  year   = {2026},
  url    = {https://huggingface.co/papers/2605.14040}
}
```