DeepSeek-Prover-V2-7B - GPTQ 4-bit

Community GPTQ int4 quantization of deepseek-ai/DeepSeek-Prover-V2-7B for Lean-4 theorem proving on consumer GPUs (tested on RTX 3080, 10 GB VRAM).

Weights: 4.6 GB (vs ~14 GB in FP16) - inference fits on 8-10 GB cards with room left for generation context.

Quantization config

  • Bits: 4
  • Group size: 128
  • Symmetry: asymmetric (sym=False)
  • Order: true-sequential
  • Damp: 0.01
  • Calibration: pileval, 128 samples x 512 tokens
  • Tool: auto-gptq 0.7.1
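
For reproducibility, a sketch of a quantization run with the settings above. This is an approximation rather than the exact script behind the upload: the pile-val mirror, the sample-selection loop, and the output directory are illustrative assumptions.

import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_id = "deepseek-ai/DeepSeek-Prover-V2-7B"
tokenizer = AutoTokenizer.from_pretrained(base_id)

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    sym=False,             # asymmetric quantization
    true_sequential=True,
    damp_percent=0.01,
)

# 128 calibration samples of 512 tokens each from pileval
# (this particular pile-val mirror is an assumption; any copy works)
calib = load_dataset("mit-han-lab/pile-val-backup", split="validation")
examples = []
for row in calib:
    ids = tokenizer(row["text"], return_tensors="pt",
                    truncation=True, max_length=512).input_ids
    if ids.shape[1] < 512:
        continue  # keep only full-length samples
    examples.append({"input_ids": ids, "attention_mask": torch.ones_like(ids)})
    if len(examples) == 128:
        break

model = AutoGPTQForCausalLM.from_pretrained(base_id, quantize_config)
model.quantize(examples)
model.save_quantized("DeepSeek-Prover-V2-7B-GPTQ-4bit", use_safetensors=True)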

Why GPTQ instead of AWQ on this model

AutoAWQ (0.2.9) crashed at layer 17 of 30 on this card with NotImplementedError: Cannot copy out of meta tensor - its per-layer scale search collides with accelerate's CPU-offload hooks when the full FP16 model does not fit in VRAM. auto-gptq avoids that path: it lifts one layer to the GPU, quantizes it, and drops it back before moving on. Details in the recipe blog post (link TBD once published).

Load example

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "athanor-ai/DeepSeek-Prover-V2-7B-GPTQ-4bit"  # or the fallback user namespace
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id, device="cuda:0", use_safetensors=True, trust_remote_code=False,
)

Observed greedy-decoding throughput on the RTX 3080 (Triton kernels, no compiled CUDA extension): ~3.9 tok/s over 32 new tokens.
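
For reference, a minimal greedy-decoding call with a simple wall-clock tokens-per-second measurement. The Lean prompt is illustrative (follow the upstream DeepSeek-Prover-V2 prompting guidance for real proof attempts), and this is not necessarily the exact harness behind the figure above.

import time
import torch

prompt = (
    "Complete the following Lean 4 proof:\n\n"
    "theorem add_comm' (a b : Nat) : a + b = b + a := by\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True))
print(f"{(out.shape[1] - prompt_len) / elapsed:.1f} tok/s")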

Known caveats

  • auto-gptq 0.7.1 requires peft<0.11 - newer peft releases removed PEFT_TYPE_TO_MODEL_MAPPING, which breaks the auto_gptq import.
  • rope_scaling warnings on load are cosmetic (transformers 4.51.3 nits) and safe to ignore.
  • Capability-preservation vs FP16 is workload-dependent. Benchmark your own suite before deploying. Calibration was pileval (generic); a Lean-specific calibration corpus may improve close-rate for theorem proving.

Dependencies used to produce this upload

  • torch==2.6.0+cu124
  • transformers==4.51.3
  • auto-gptq==0.7.1
  • peft<0.11
  • accelerate==1.13.0
  • safetensors==0.7.0

License + credit

This quantization inherits the upstream DeepSeek-Prover-V2 model license (DeepSeek License Agreement v1.0) - see license_link in the card metadata. Original model and training: DeepSeek. Quantization: athanor-ai.

Companion benchmark

Evaluated against athanor-ai/formal-anytime-valid-stats-48 — a 48-theorem Lean-4 benchmark of anytime-valid confidence-sequence lemmas. See the dataset README for per-theorem close rates of this model alongside Claude Opus 4.6, Sonnet 4.6, Kimi K2.5, and Gemini 3 Pro.
