---
base_model: Qwen/Qwen3.5-2B
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen3.5-2B
- lora
- transformers
- lean4
- theorem-proving
- tactic-prediction
- formal-verification
- mathlib
- peft
license: mit
datasets:
- markm39/openproof-tactic-pairs
---

# OpenProof Tactic 2B

A Lean 4 tactic prediction model fine-tuned from [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) on 190K+ Mathlib tactic pairs. Given a Lean 4 proof goal state, the model predicts the next tactic to apply. Designed for use in best-first search (BFS) proof engines like [OpenProof](https://github.com/mxthematic/openproof).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3.5-2B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="cuda")
model = PeftModel.from_pretrained(model, "markm39/openproof-tactic-2b")

# Input: goal state followed by :::
prompt = "n m : Nat\n⊢ n + m = m + n:::"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.6, top_k=10)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Output: induction m with | zero => simp | succ m ih => simp [Nat.add_assoc, ih, Nat.add_comm]
```

## Training

- **Base model**: Qwen/Qwen3.5-2B
- **Method**: SFT with LoRA (rank 64, alpha 128)
- **Trainable params**: 43.6M / 1.93B (2.3%)
- **Data**: [markm39/openproof-tactic-pairs](https://huggingface.co/datasets/markm39/openproof-tactic-pairs) -- 190K+ Lean 4 goal-tactic pairs extracted from Mathlib
- **Input format**: `{goal_state}:::`
- **Output format**: tactic string
- **Epochs**: ~1.8 (12,500 steps)
- **Batch size**: effective 128 (2 per device x 32 gradient accumulation)
- **Learning rate**: 2e-5, cosine schedule
- **Best eval loss**: 0.5492 (checkpoint-12500)
- **Hardware**: RunPod GPU instance
- **Framework**: Hugging Face TRL + PEFT

## Intended Use

Tactic prediction for automated theorem proving in Lean 4. The model generates candidate tactics given a proof goal state, and the candidates are then verified by the Lean type checker. Designed to be used with best-first search -- sample multiple candidates per goal and verify each.

Not intended for general-purpose text generation.

## Limitations

- Trained on Mathlib tactics only -- may not generalize to custom Lean libraries
- LoRA adapter only -- must be loaded on top of the Qwen3.5-2B base model
- SFT only (no RL/GRPO refinement yet)
- May hallucinate lemma names that don't exist in Mathlib

## Links

- [OpenProof](https://github.com/mxthematic/openproof) -- the theorem prover that uses this model
- [Training code](https://github.com/mxthematic/openproof-ml)
- [Training data](https://huggingface.co/datasets/markm39/openproof-tactic-pairs)

## Acknowledgments

Training data derived from [LeanDojo](https://leandojo.org/) (Yang et al.), [Lean Workbook](https://huggingface.co/datasets/internlm/Lean-Workbook) (InternLM), and [Goedel-Pset](https://huggingface.co/datasets/Goedel-LM/Goedel-Pset) (Goedel-LM). Tactic extraction from whole proofs uses kernel-level invocations via [Pantograph](https://github.com/leanprover/Pantograph).
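The sample-and-verify workflow described under Intended Use can be sketched without loading the model. The sketch below shows the model's `{goal_state}:::` input format and deduplication of sampled candidates before verification; `format_prompt` and `dedupe_candidates` are hypothetical helper names, not part of the released code, and the Lean verification step is omitted.

```python
def format_prompt(goal_state: str) -> str:
    # The model was trained on inputs of the form "{goal_state}:::",
    # so the goal state must be followed by the ::: separator.
    return f"{goal_state}:::"

def dedupe_candidates(candidates: list[str]) -> list[str]:
    # Sampling several tactics at temperature 0.6 often yields duplicates;
    # verify each distinct tactic only once, preserving sample order.
    seen: set[str] = set()
    unique: list[str] = []
    for tac in candidates:
        tac = tac.strip()
        if tac and tac not in seen:
            seen.add(tac)
            unique.append(tac)
    return unique

# Example: prepare a prompt and collapse a batch of sampled tactics.
prompt = format_prompt("n m : Nat\n⊢ n + m = m + n")
samples = ["simp", "omega", "simp", "  ring  ", "omega"]
print(dedupe_candidates(samples))  # → ['simp', 'omega', 'ring']
```

Each surviving candidate would then be applied to the goal via the Lean type checker, with successful tactics expanding the search frontier.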