Committed by shanyangmie · Commit b79272d (verified) · 1 parent: 196b85f

Add model card (HF safetensors mirror of Physics-R1 seed-17 paper checkpoint)

Files changed (1): `README.md` (added, +91 -0)
 
---
license: apache-2.0
language:
- en
- zh
base_model: Qwen/Qwen3-VL-8B-Thinking
tags:
- physics
- reasoning
- multimodal
- rl
- grpo
- arxiv:2605.14040
pipeline_tag: image-text-to-text
library_name: transformers
---

# Physics-R1: Seed 17 (HF safetensors)

[**Project Page**](https://shanyang.me/physics-r1-page/) | [**Paper**](https://huggingface.co/papers/2605.14040) | [**Code**](https://github.com/shanyang-me/physics-r1-neurips2026) | [**Training corpus**](https://huggingface.co/datasets/shanyangmie/physr1corp)

This is the Physics-R1 paper checkpoint reported in the **seed-17 row of Table 2** (canonical step 63). It is a full-parameter fine-tune of `Qwen3-VL-8B-Thinking` on the audited [`PhysR1Corp`](https://huggingface.co/datasets/shanyangmie/physr1corp) corpus (2,268 closed-form physics problems), trained with GRPO under FSDP1 using a binary correctness reward.

**This is the easy-to-use HF safetensors release.** For the original verl FSDP-sharded archive, see [`physics-r1-seed17-canonical-step63-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed17-canonical-step63-fsdp).

## Quickstart

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "shanyangmie/physics-r1-seed17"

# Load the weights in bfloat16; device_map="auto" (requires `accelerate`) places layers on available GPUs.
# `dtype` is the argument name in recent transformers releases (older releases call it `torch_dtype`).
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# The processor bundles the tokenizer (with its chat template) and the image preprocessing.
processor = AutoProcessor.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
)
```
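The Quickstart stops after loading the weights. Because the base model is a *Thinking* variant, a decoded completion normally holds a reasoning trace followed by the final answer. The helper below is a minimal sketch for separating the two, assuming the reasoning is terminated by a `</think>` tag as in other Qwen3 thinking models and that the tag survives decoding (if your `skip_special_tokens` setting strips it, split on token IDs instead). `split_thinking` is a hypothetical name, not part of transformers or the released code.

```python
def split_thinking(completion: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    Assumes the reasoning trace ends with `end_tag`. If the tag is missing
    (e.g. generation hit max_new_tokens), everything is treated as reasoning.
    """
    # rpartition splits on the LAST occurrence of the closing tag.
    head, sep, tail = completion.rpartition(end_tag)
    if not sep:
        # No closing tag: the model most likely ran out of tokens mid-reasoning.
        return completion.strip(), ""
    # Drop a leading <think> tag if the decoded text includes it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Example on a hand-written string (not real model output).
reasoning, answer = split_thinking("<think>F = ma, so a = 10 / 2.</think> a = 5 m/s^2")
print(answer)  # a = 5 m/s^2
```

Splitting on the last tag keeps the helper robust if the reasoning itself happens to quote the tag.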

To reproduce the paper's evaluation, use the [PhysOlym-A](https://huggingface.co/datasets/shanyangmie/physolym-a) benchmark together with the [code release](https://github.com/shanyang-me/physics-r1-neurips2026).

## Performance (paper Table 2, seed-17 row)

| Benchmark | Physics-R1 (this checkpoint) | Base Qwen3-VL-8B-Thinking | Δ (pp) |
|---|---|---|---|
| PhyX-mini | 77.4 | 73.7 | +3.7 |
| PhyX-3k | 77.2 | 74.4 | +2.8 |
| PhysReason | 43.1 | 23.9 | +19.2 |
| PUB-OE | 36.4 | 35.3 | +1.1 |
| OlympiadBench-Physics | 45.3 | 39.3 | +6.0 |
| **PhysOlym-A** | **25.0** | 8.0 | **+17.0** |

Scoring: problem-level accuracy (%) graded by a liberal Sonnet-as-judge; a multi-part problem counts as correct only if every subpart is judged correct. The paper's headline result is the 3-seed mean over seeds {42, 17, 23} (+18.9 pp on PhysOlym-A), not this single seed.
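The problem-level rule can be made concrete with a short sketch. It assumes the judge returns one boolean verdict per subpart; the input layout and the name `problem_level_accuracy` are illustrative, not taken from the released evaluation code. A problem scores 1 only when all of its subparts are correct, and accuracy is the mean over problems:

```python
from statistics import mean


def problem_level_accuracy(verdicts: list[list[bool]]) -> float:
    """Percentage of problems whose subparts are ALL judged correct.

    `verdicts[i]` holds the per-subpart judge decisions for problem i
    (hypothetical format; a single-part problem has one entry).
    """
    # all() is the AND over subparts: any wrong subpart makes the problem score 0.
    return 100.0 * mean(1.0 if all(parts) else 0.0 for parts in verdicts)


# Toy example: 3 problems, one of which has a wrong subpart.
toy = [[True], [True, True, True], [True, False]]
print(round(problem_level_accuracy(toy), 1))  # 66.7
```

On the same toy data, averaging over individual subparts would instead give 5/6 ≈ 83.3, so the two aggregation levels are not interchangeable.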

## Other seeds (HF safetensors mirrors)

| Seed | HF safetensors mirror | FSDP archive |
|---|---|---|
| 42 | [`shanyangmie/physics-r1-seed42-v4-step60`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60) | [`...-seed42-v4-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed42-v4-step60-fsdp) |
| 17 | **this card** | [`...-seed17-canonical-step63-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed17-canonical-step63-fsdp) |
| 23 | [`shanyangmie/physics-r1-seed23`](https://huggingface.co/shanyangmie/physics-r1-seed23) | [`...-seed23-canonical-step60-fsdp`](https://huggingface.co/shanyangmie/physics-r1-seed23-canonical-step60-fsdp) |
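For multi-seed evaluation, the table above can be kept as a small lookup. This sketch only restates the repository IDs listed in the table (the seed-17 mirror ID is the one used in the Quickstart); the dictionary and the helper `repo_for` are illustrative names.

```python
# Repository IDs copied from the "Other seeds" table: (HF safetensors mirror, FSDP archive).
PHYSICS_R1_REPOS: dict[int, tuple[str, str]] = {
    42: ("shanyangmie/physics-r1-seed42-v4-step60",
         "shanyangmie/physics-r1-seed42-v4-step60-fsdp"),
    17: ("shanyangmie/physics-r1-seed17",
         "shanyangmie/physics-r1-seed17-canonical-step63-fsdp"),
    23: ("shanyangmie/physics-r1-seed23",
         "shanyangmie/physics-r1-seed23-canonical-step60-fsdp"),
}


def repo_for(seed: int, fsdp: bool = False) -> str:
    """Return the Hub repo ID for a seed; raises KeyError for seeds not in the table."""
    safetensors_repo, fsdp_repo = PHYSICS_R1_REPOS[seed]
    return fsdp_repo if fsdp else safetensors_repo


# Iterate over the three paper seeds, e.g. to load each one with the Quickstart code.
for seed in (42, 17, 23):
    print(seed, repo_for(seed))
```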

## Training recipe

- **Base model**: [`Qwen/Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking)
- **Algorithm**: GRPO (verl 0.6.1) with full-parameter FSDP1 training (`actor.strategy=fsdp`, *not* `fsdp2`)
- **Reward**: binary correctness from a per-subpart Sonnet judge, aggregated with a problem-level AND (see paper §3.2)
- **Data**: [`shanyangmie/physr1corp`](https://huggingface.co/datasets/shanyangmie/physr1corp), 2,268 audited closed-form problems
- **Seed / step**: 17 / 63
- **Hardware**: 4×H200 (FSDP1, 4-way sharding)
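To illustrate how a binary reward drives GRPO, here is a minimal sketch of the group-relative advantage in its common textbook form: each problem is sampled several times, and each completion's reward is standardized against the mean and standard deviation of its own group. This is the generic formulation, not verl's exact code; the group size, the `eps` value, the toy verdicts, and the function names are assumptions.

```python
from statistics import mean, pstdev


def binary_reward(subpart_correct: list[bool]) -> float:
    """1.0 only if every subpart is judged correct (problem-level AND), else 0.0."""
    return 1.0 if all(subpart_correct) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against its group (all samples for one problem)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled solutions to one problem, each with per-subpart judge verdicts (toy data).
group = [[True, True], [True, False], [False, False], [True, True]]
rewards = [binary_reward(g) for g in group]  # [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

With a binary reward, a group in which every sample is correct (or every sample is wrong) has zero spread, so all of its advantages are 0 and, under this formulation, it contributes no policy-gradient signal.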

Full hyperparameters are listed in the paper's appendix.

## License

Apache 2.0, inherited from the base model [`Qwen3-VL-8B-Thinking`](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking). The training data (`physr1corp`) is licensed CC BY-NC 4.0, so this derivative checkpoint is intended for **non-commercial research use**.

## Citation

```bibtex
@misc{yang2026physicsr1,
  title  = {Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning},
  author = {Yang, Shan},
  year   = {2026},
  url    = {https://huggingface.co/papers/2605.14040}
}
```