Physics-R1 β€” Seed 17, v4 step-60 (FSDP-sharded)

Project Page | Paper | Code | Training corpus

Physics-R1 fine-tune of Qwen3-VL-8B-Thinking on the audited PhysR1Corp (2,268 closed-form physics problems) via full-parameter FSDP1 GRPO with binary correctness reward. This is the seed-17 v4 (audited-data) re-validation checkpoint at step 60.

Released alongside Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning.

Which checkpoint should you use?

Checkpoint Use for Notes
physics-r1-seed17-canonical-step63-fsdp Exact reproduction of paper Table 2 seed-17 row Canonical paper checkpoint (step 63)
physics-r1-seed17-v4-step60-fsdp Re-validation on audited corpus This card β€” v4 audited re-run, tracks canonical closely
physics-r1-seed17-v4-step50-fsdp Step ablation Same run, earlier step
physics-r1-seed17-v4-step40-fsdp Step ablation Same run, earlier step
physics-r1-seed42-v4-step60-fsdp Paper Table 2 seed-42 row Step-60 binary, seed 42
physics-r1-seed23-canonical-step60-fsdp Paper Table 2 seed-23 row Canonical step-60, seed 23

On the relationship to Table 2: the paper's seed-17 row (PhysReason 43.1, PhysOlym-A 25.0, PhyX-3k 77.2, ...) is from the canonical step-63 checkpoint. This v4 step-60 checkpoint is a re-validation on the audited 2,268-record PhysR1Corp; its step-60 mean tracks the canonical mean within statistical noise. For exact paper-reproduction numbers, use the canonical checkpoint.

Training recipe

  • Base model: Qwen/Qwen3-VL-8B-Thinking
  • Algorithm: GRPO (verl 0.6.1, full-parameter FSDP1 β€” actor.strategy=fsdp, not fsdp2; FSDP2 fails on Qwen3-VL visual encoder device placement)
  • Reward: binary correctness, per-subpart Sonnet judge with problem-level AND aggregation (see paper Β§3.2)
  • Data: shanyangmie/physr1corp β€” 2,268 audited closed-form problems
  • Seed / step: 17 / 60
  • Hardware: 4Γ—H200 (FSDP1 4-way sharded)

Full hyperparameters are in the paper appendix.

Format: verl FSDP-sharded checkpoint (conversion required)

This checkpoint is saved in verl's FSDP-sharded format, not safetensors. It is not directly loadable via AutoModelForImageTextToText.from_pretrained without a merge step.

File layout

actor/
β”œβ”€β”€ huggingface/                              # HF-style config + tokenizer
β”‚   β”œβ”€β”€ config.json
β”‚   β”œβ”€β”€ tokenizer.json, merges.txt, vocab.json
β”‚   β”œβ”€β”€ preprocessor_config.json
β”‚   └── ...
β”œβ”€β”€ model_world_size_4_rank_{0,1,2,3}.pt      # 4-way FSDP weight shards (~8.7 GB each, ~35 GB total)
β”œβ”€β”€ optim_world_size_4_rank_{0,1,2,3}.pt      # optimizer state (~17.5 GB each, not needed for inference)
β”œβ”€β”€ extra_state_world_size_4_rank_{0..3}.pt
└── fsdp_config.json
data.pt                                       # verl bookkeeping (not needed for inference)

Convert to HF safetensors

Use verl's model_merger.py:

git clone https://github.com/volcengine/verl
cd verl

# Download only the inference-required files (skips ~70 GB of optimizer state)
huggingface-cli download shanyangmie/physics-r1-seed17-v4-step60-fsdp \
    --include "actor/model_world_size_4_rank_*.pt" \
    --include "actor/huggingface/*" \
    --include "actor/fsdp_config.json" \
    --include "actor/extra_state_world_size_4_rank_*.pt" \
    --local-dir ./ckpt

# Merge FSDP shards into HF safetensors
python scripts/model_merger.py merge \
    --backend fsdp \
    --hf_model_path Qwen/Qwen3-VL-8B-Thinking \
    --local_dir ./ckpt/actor \
    --target_dir ./physics-r1-seed17-v4-step60-hf

Then load with standard HF:

from transformers import AutoModelForImageTextToText, AutoProcessor

model = AutoModelForImageTextToText.from_pretrained(
    "./physics-r1-seed17-v4-step60-hf",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("./physics-r1-seed17-v4-step60-hf")

License

Apache 2.0, inheriting from the base model Qwen3-VL-8B-Thinking. Training data (physr1corp) is CC BY-NC 4.0, so this derivative checkpoint is intended for non-commercial research use.

Citation

@misc{yang2026physicsr1,
  title  = {Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning},
  author = {Yang, Shan},
  year   = {2026},
  url    = {https://huggingface.co/papers/2605.14040}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for shanyangmie/physics-r1-seed17-v4-step60-fsdp

Finetuned
(60)
this model

Paper for shanyangmie/physics-r1-seed17-v4-step60-fsdp