Instructions to use m-a-p/OProver-32B-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use m-a-p/OProver-32B-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="m-a-p/OProver-32B-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("m-a-p/OProver-32B-Base")
model = AutoModelForCausalLM.from_pretrained("m-a-p/OProver-32B-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use m-a-p/OProver-32B-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "m-a-p/OProver-32B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "m-a-p/OProver-32B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/m-a-p/OProver-32B-Base

SGLang

How to use m-a-p/OProver-32B-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "m-a-p/OProver-32B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "m-a-p/OProver-32B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "m-a-p/OProver-32B-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "m-a-p/OProver-32B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use m-a-p/OProver-32B-Base with Docker Model Runner:
```
docker model run hf.co/m-a-p/OProver-32B-Base
```

dma2077 commited on 1 day ago

Commit

0e86031

verified ·

1 Parent(s): fdcc029

Upload README.md with huggingface_hub

Browse files

Files changed (1) hide show

README.md +141 -0

README.md CHANGED Viewed

@@ -1,3 +1,144 @@
 ---
 license: apache-2.0
 ---

 ---
 license: apache-2.0
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+  - lean4
+  - theorem-proving
+  - formal-mathematics
+  - agentic
+  - retrieval-augmented
+datasets:
+  - m-a-p/OProofs
 ---
+# OProver
+> A unified framework for **agentic formal theorem proving** in Lean 4.
+- 📄 Paper: [OProver: A Unified Framework for Agentic Formal Theorem Proving](https://huggingface.co/papers/2605.17283) ([arXiv](https://arxiv.org/abs/2605.17283))
+- 📚 Collection: [m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)
+- 📦 Dataset: [m-a-p/OProofs](https://huggingface.co/datasets/m-a-p/OProofs)
+OProver treats theorem proving as a multi-round refinement loop. Given a target
+theorem, the prover retrieves top-k compiler-verified proofs from a memory of
+prior proofs, generates a proof attempt, runs the Lean 4 compiler, and—on
+failure—revises the attempt using the compiler feedback in the next round.
+The same retrieval and feedback signals are baked into training, so the
+training-time interface matches the proving-time interaction.
+## Highlights
+- **State-of-the-art among open-weight whole-proof provers.** OProver-32B
+  attains the best Pass@32 on MiniF2F (93.3), ProverBench (58.2), and
+  PutnamBench (11.3), and the second-best on MathOlympiad (22.8) and
+  ProofNet (33.2). Even OProver-8B beats Goedel-Prover-V2-32B on all five
+  benchmarks despite having 4× fewer parameters.
+- **Agentic proving is in the policy, not bolted on.** Retrieval, multi-turn
+  compiler feedback, and iterative repair are all part of the trained policy.
+  Ablations show feedback is the dominant driver: removing it costs OProver-32B
+  4.9–7.4 Pass@32 points across benchmarks; removing retrieval on top adds
+  0.5–1.7 more.
+- **Co-evolving prover and corpus.** Each post-training iteration runs agentic
+  rollouts; verified proofs are added to OProofs and re-indexed for retrieval,
+  repair trajectories become SFT data, and unresolved hard cases provide RL
+  signal. MiniF2F-Test Pass@32 grows monotonically: OProver-8B 79.5 → 86.2 →
+  87.0 → 91.8 across rounds; OProver-32B 84.7 → 88.1 → 93.3.
+## What's in this release
+The OProver collection ([m-a-p/OProver](https://huggingface.co/collections/m-a-p/oprover)) bundles the paper,
+the corpus, and seven model checkpoints covering both training stages and
+both model sizes:
+| Repo | Stage | Size | Note |
+|---|---|---|---|
+| [OProver-8B-Base](https://huggingface.co/m-a-p/OProver-8B-Base) | Continued pretraining only | 8B | Domain-adapted base; before SFT/RL |
+| [OProver-32B-Base](https://huggingface.co/m-a-p/OProver-32B-Base) | Continued pretraining only | 32B | Domain-adapted base; before SFT/RL |
+| [OProver-8B-Round1](https://huggingface.co/m-a-p/OProver-8B-Round1) | Post-training Round 1 | 8B | After first SFT+RL iteration |
+| [OProver-8B-Round2](https://huggingface.co/m-a-p/OProver-8B-Round2) | Post-training Round 2 | 8B | After second SFT+RL iteration |
+| [OProver-32B-Round1](https://huggingface.co/m-a-p/OProver-32B-Round1) | Post-training Round 1 | 32B | After first SFT+RL iteration |
+| **[OProver-8B](https://huggingface.co/m-a-p/OProver-8B)** | Final | 8B | The 8B prover reported in the paper (Round 3) |
+| **[OProver-32B](https://huggingface.co/m-a-p/OProver-32B)** | Final | 32B | The 32B prover reported in the paper (Round 2) |
+| [OProofs](https://huggingface.co/datasets/m-a-p/OProofs) | Dataset | 6.86M proofs | Lean 4 corpus used for CPT, SFT, and the retrieval memory |
+Use **OProver-8B** or **OProver-32B** for proving. The Base / Round-N
+checkpoints are released for reproducibility and ablation studies.
+## Dataset: OProofs
+OProofs is a large-scale Lean 4 corpus that doubles as the retrieval memory at
+proving time. It is built from three sources: public Lean resources
+(NuminaMath-LEAN, Lean-Workbook, Leanabell-FormalStmt, Goedel-Pset, …),
+large-scale autoformalization + agentic proof synthesis from informal math
+mined on Common Crawl and GitHub, and traces from OProver's own agentic
+proving runs.
+![OProofs Lean 4 corpus statistics](./assets/oproofs.png)
+## Training
+Two stages, both anchored on OProofs.
+**1. Continued pretraining (one-time).** A 65B-token mixture of formal
+Lean (≈30%, from OProofs), code (≈20%, OpenCoder), mathematics (≈40%,
+Nemotron-Math-4-Plus), and long-CoT (≈10%, ProLong-64K). AdamW, peak LR
+5e-5, cosine with 3% warmup, batch 512, sequence length 8192. Output:
+`OProver-{8B,32B}-Base`.
+**2. Iterative post-training.** Each round runs:
+1. Agentic proving with the current prover on a theorem pool, producing
+   multi-round rollouts conditioned on retrieved proofs and Lean feedback.
+2. **SFT** on round-level repair examples `(s, R, p_{t-1}, f_{t-1}) → p_t`,
+   with cross-entropy loss only on the new attempt.
+3. **GSPO RL** on hard cases (groups with non-trivial pass-rate). Per-round
+   reward `r = 0.8 + 0.2·1[format ok]` if Lean-verified, else 0; advantages
+   are pooled across the n×R rounds for the same theorem.
+4. Newly verified proofs and repair trajectories are folded back into OProofs
+   and re-indexed into the retrieval memory for the next round.
+## Results
+Pass@32 (n=64) across five Lean 4 benchmarks. **Bold** is best, _underlined_
+is second-best.
+![Main results: Pass@32 across five Lean 4 benchmarks](./assets/main_results.png)
+OProver-32B reaches three best and two second-best across five benchmarks —
+the most top placements of any model in the comparison, despite being a
+32B dense model versus a 560B MoE or a 671B dense competitor.
+## Loading
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Pick any of: OProver-8B, OProver-32B, *-Base, *-Round1, *-Round2
+name = "m-a-p/OProver-8B"
+tok = AutoTokenizer.from_pretrained(name)
+model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="bfloat16", device_map="auto")
+```
+```python
+from datasets import load_dataset
+ds = load_dataset("m-a-p/OProofs", split="train")
+```
+OProver is trained against a multi-round agentic interface: at each round the
+input includes the target Lean statement, top-k retrieved verified proofs, the
+prior proof attempt, and the Lean compiler feedback. See the paper §2.1 for
+the prompt template, and Appendix B for serialization details.
+## Citation
+```bibtex
+@article{ma2026oprover,
+  title   = {OProver: A Unified Framework for Agentic Formal Theorem Proving},
+  author  = {David Ma and Kaijing Ma and Shawn Guo and Yunfeng Shi and Enduo Zhao and Jiajun Shi and Zhaoxiang Zhang and Gavin Cheung and Jiaheng Liu and Zili Wang},
+  journal = {arXiv preprint arXiv:2605.17283},
+  year    = {2026}
+}
+```