How to use from
vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "m-a-p/OProver-32B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "m-a-p/OProver-32B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
Use Docker
docker model run hf.co/m-a-p/OProver-32B-Base
Quick Links

OProver

A unified framework for agentic formal theorem proving in Lean 4.

OProver treats theorem proving as a multi-round refinement loop. Given a target theorem, the prover retrieves top-k compiler-verified proofs from a memory of prior proofs, generates a proof attempt, runs the Lean 4 compiler, and—on failure—revises the attempt using the compiler feedback in the next round. The same retrieval and feedback signals are baked into training, so the training-time interface matches the proving-time interaction.

Highlights

  • State-of-the-art among open-weight whole-proof provers. OProver-32B attains the best Pass@32 on MiniF2F (93.3), ProverBench (58.2), and PutnamBench (11.3), and the second-best on MathOlympiad (22.8) and ProofNet (33.2). Even OProver-8B beats Goedel-Prover-V2-32B on all five benchmarks despite having 4× fewer parameters.
  • Agentic proving is in the policy, not bolted on. Retrieval, multi-turn compiler feedback, and iterative repair are all part of the trained policy. Ablations show feedback is the dominant driver: removing it costs OProver-32B 4.9–7.4 Pass@32 points across benchmarks; removing retrieval on top adds 0.5–1.7 more.
  • Co-evolving prover and corpus. Each post-training iteration runs agentic rollouts; verified proofs are added to OProofs and re-indexed for retrieval, repair trajectories become SFT data, and unresolved hard cases provide RL signal. MiniF2F-Test Pass@32 grows monotonically: OProver-8B 79.5 → 86.2 → 87.0 → 91.8 across rounds; OProver-32B 84.7 → 88.1 → 93.3.

What's in this release

The OProver collection (m-a-p/OProver) bundles the paper, the corpus, and seven model checkpoints covering both training stages and both model sizes:

Repo Stage Size Note
OProver-8B-Base Continued pretraining only 8B Domain-adapted base; before SFT/RL
OProver-32B-Base Continued pretraining only 32B Domain-adapted base; before SFT/RL
OProver-8B-Round1 Post-training Round 1 8B After first SFT+RL iteration
OProver-8B-Round2 Post-training Round 2 8B After second SFT+RL iteration
OProver-32B-Round1 Post-training Round 1 32B After first SFT+RL iteration
OProver-8B Final 8B The 8B prover reported in the paper (Round 3)
OProver-32B Final 32B The 32B prover reported in the paper (Round 2)
OProofs Dataset 6.86M proofs Lean 4 corpus used for CPT, SFT, and the retrieval memory

Use OProver-8B or OProver-32B for proving. The Base / Round-N checkpoints are released for reproducibility and ablation studies.

Dataset: OProofs

OProofs is a large-scale Lean 4 corpus that doubles as the retrieval memory at proving time. It is built from three sources: public Lean resources (NuminaMath-LEAN, Lean-Workbook, Leanabell-FormalStmt, Goedel-Pset, …), large-scale autoformalization + agentic proof synthesis from informal math mined on Common Crawl and GitHub, and traces from OProver's own agentic proving runs.

OProofs Lean 4 corpus statistics

Training

Two stages, both anchored on OProofs.

1. Continued pretraining (one-time). A 65B-token mixture of formal Lean (≈30%, from OProofs), code (≈20%, OpenCoder), mathematics (≈40%, Nemotron-Math-4-Plus), and long-CoT (≈10%, ProLong-64K). AdamW, peak LR 5e-5, cosine with 3% warmup, batch 512, sequence length 8192. Output: OProver-{8B,32B}-Base.

2. Iterative post-training. Each round runs:

  1. Agentic proving with the current prover on a theorem pool, producing multi-round rollouts conditioned on retrieved proofs and Lean feedback.
  2. SFT on round-level repair examples (s, R, p_{t-1}, f_{t-1}) → p_t, with cross-entropy loss only on the new attempt.
  3. GSPO RL on hard cases (groups with non-trivial pass-rate). Per-round reward r = 0.8 + 0.2·1[format ok] if Lean-verified, else 0; advantages are pooled across the n×R rounds for the same theorem.
  4. Newly verified proofs and repair trajectories are folded back into OProofs and re-indexed into the retrieval memory for the next round.

Results

Pass@32 (n=64) across five Lean 4 benchmarks. Bold is best, underlined is second-best.

Main results: Pass@32 across five Lean 4 benchmarks

OProver-32B reaches three best and two second-best across five benchmarks — the most top placements of any model in the comparison, despite being a 32B dense model versus a 560B MoE or a 671B dense competitor.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick any of: OProver-8B, OProver-32B, *-Base, *-Round1, *-Round2
name = "m-a-p/OProver-8B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="bfloat16", device_map="auto")
from datasets import load_dataset
ds = load_dataset("m-a-p/OProofs", split="train")

OProver is trained against a multi-round agentic interface: at each round the input includes the target Lean statement, top-k retrieved verified proofs, the prior proof attempt, and the Lean compiler feedback. See the paper §2.1 for the prompt template, and Appendix B for serialization details.

Citation

@article{ma2026oprover,
  title   = {OProver: A Unified Framework for Agentic Formal Theorem Proving},
  author  = {David Ma and Kaijing Ma and Shawn Guo and Yunfeng Shi and Enduo Zhao and Jiajun Shi and Zhaoxiang Zhang and Gavin Cheung and Jiaheng Liu and Zili Wang},
  journal = {arXiv preprint arXiv:2605.17283},
  year    = {2026}
}
Downloads last month
22
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for m-a-p/OProver-32B-Base

Quantizations
1 model

Dataset used to train m-a-p/OProver-32B-Base

Collection including m-a-p/OProver-32B-Base

Paper for m-a-p/OProver-32B-Base