shanyangmie commited on
Commit
5a0423b
·
verified ·
1 Parent(s): 8e8692e

Add model card (HF safetensors mirror of Physics-R1 seed-23 paper checkpoint)

Browse files
Files changed (1) hide show
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - zh
6
+ base_model: Qwen/Qwen3-VL-8B-Thinking
7
+ tags:
8
+ - physics
9
+ - reasoning
10
+ - multimodal
11
+ - rl
12
+ - grpo
13
+ - arxiv:2605.14040
14
+ pipeline_tag: image-text-to-text
15
+ library_name: transformers
16
+ ---
17
+
18
+ # Physics-R1 — Seed 23 (HF safetensors)
19
+
20
+ [**Project Page**](https://shanyang.me/physics-r1-page/) | [**Paper**](https://huggingface.co/papers/2605.14040) | [**Code**](https://github.com/shanyang-me/physics-r1-neurips2026) | [**Training corpus**](https://huggingface.co/datasets/shanyangmie/physr1corp)
21
+
22
+ The Physics-R1 paper checkpoint for the **seed-23 row of Table 2** (canonical step 60). Fine-tune of `Qwen3-VL-8B-Thinking` on the audited [`PhysR1Corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) (2,268 closed-form physics problems) via full-parameter FSDP1 GRPO with binary correctness reward.
23
+
24
+ This seed has the **highest individual PhysOlym-A score (28.2)** among the paper's three seeds.
25
+
26
+ **This is the easy-to-use HF safetensors release.** For the original verl FSDP-sharded archive, see [`physics-r1-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp).
27
+
28
+ ## Quickstart
29
+
30
+ ```python
31
+ from transformers import AutoModelForImageTextToText, AutoProcessor
32
+ import torch
33
+
34
+ model = AutoModelForImageTextToText.from_pretrained(
35
+ "shanyangmie/physics-r1-seed23",
36
+ dtype=torch.bfloat16,
37
+ device_map="auto",
38
+ trust_remote_code=True,
39
+ )
40
+ processor = AutoProcessor.from_pretrained(
41
+ "shanyangmie/physics-r1-seed23",
42
+ trust_remote_code=True,
43
+ )
44
+ ```
45
+
46
+ For evaluation against the paper's benchmark, see [PhysOlym-A](https://huggingface.co/datasets/shanyangmie/physolym-a) and the [code release](https://github.com/shanyang-me/physics-r1-neurips2026).
47
+
48
+ ## Performance (paper Table 2, seed-23 row)
49
+
50
+ | Eval | Physics-R1 (this checkpoint) | Base Qwen3-VL-8B-Thinking | Δ |
51
+ |---|---|---|---|
52
+ | PhyX-mini | 77.9 | 73.7 | +4.2 |
53
+ | PhyX-3k | 76.6 | 74.4 | +2.2 |
54
+ | PhysReason | 43.4 | 23.9 | +19.5 |
55
+ | PUB-OE | 30.9 | 35.3 | -4.4 |
56
+ | OlympiadBench-Physics | 48.0 | 39.3 | +8.7 |
57
+ | **PhysOlym-A** | **28.2** | 8.0 | **+20.2** |
58
+
59
+ Scoring: problem-level liberal Sonnet-as-judge (every subpart of a multi-part problem must be correct). The 3-seed mean across {42, 17, 23} is the paper's headline (+18.9 pp on PhysOlym-A).
60
+
61
+ ## Other seeds (HF safetensors mirrors)
62
+
63
+ | Seed | HF safetensors mirror | FSDP archive |
64
+ |---|---|---|
65
+ | 42 | [`shanyangmie/physics-r1-seed42-v4-step60`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60) | [`...-seed42-v4-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60-fsdp) |
66
+ | 17 | [`shanyangmie/physics-r1-seed17`](https://huggingface.co/shanyangmie/physics-r1-seed17) | [`...-seed17-canonical-step63-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed17-canonical-step63-fsdp) |
67
+ | 23 | **this card** | [`...-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp) |
68
+
69
+ ## Training recipe
70
+
71
+ - **Base model**: [`Qwen/Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking)
72
+ - **Algorithm**: GRPO (verl 0.6.1, full-parameter FSDP1 — `actor.strategy=fsdp`, *not* `fsdp2`)
73
+ - **Reward**: binary correctness, per-subpart Sonnet judge with problem-level AND aggregation (see paper §3.2)
74
+ - **Data**: [`shanyangmie/physr1corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) — 2,268 audited closed-form problems
75
+ - **Seed / step**: 23 / 60
76
+ - **Hardware**: 4×H200 (FSDP1 4-way sharded)
77
+
78
+ Full hyperparameters in paper Appendix.
79
+
80
+ ## License
81
+
82
+ Apache 2.0, inheriting from the base model [`Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). Training data (`physr1corp`) is CC BY-NC 4.0, so this derivative checkpoint is intended for **non-commercial research use**.
83
+
84
+ ## Citation
85
+
86
+ ```bibtex
87
+ @misc{yang2026physicsr1,
88
+ title = {Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning},
89
+ author = {Yang, Shan},
90
+ year = {2026},
91
+ url = {https://huggingface.co/papers/2605.14040}
92
+ }
93
+ ```