---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
base_model: answerdotai/ModernBERT-base
tags:
- rag
- governance
- hallucination-detection
- epistemic-honesty
- classification
- fitz-gov
- pyrrho
datasets:
- yafitzdev/fitz-gov
metrics:
- accuracy
- f1
- false-trustworthy-rate
---

# pyrrho-modernbert-base-v1

> Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain it — **without an LLM call**.

This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, it predicts one of:

| Verdict | Meaning |
|---|---|
| `ABSTAIN` | The sources do not contain enough information to answer. |
| `DISPUTED` | The sources contradict each other on the answer. |
| `TRUSTWORTHY` | The sources consistently and sufficiently support an answer. |

A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sage](https://github.com/yafitzdev/fitz-sage). Single forward pass, ~30 ms on CPU after INT8 ONNX quantization, no external LLM dependency.

---

## Results

Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, a stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].

| Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
|---|---|---|---|
| Overall accuracy (calibrated) | **86.13 ± 0.86** | 78.7 | **+7.43** |
| False-trustworthy rate (safety) | **5.27 ± 0.21** | 5.7 | **-0.43** (safer) |
| Trustworthy recall | **79.38 ± 1.64** | 70.0 | **+9.38** |
| Disputed recall | **94.81 ± 1.28** | 86.1 | **+8.71** |
| Abstain recall | **92.94 ± 1.11** | 86.5 | **+6.44** |
| Macro F1 | 86.10 ± 0.80 | n/a | — |

---

## Known limitations
1. **Multi-source-convergence cases can be misclassified as DISPUTED.** When multiple authoritative sources state the same fact with slight numerical variation that falls within measurement tolerance (e.g., 4 climate agencies citing 1.09–1.20 °C of warming, or NIST and IUPAC both giving the speed of light), the model occasionally classifies the case as DISPUTED with high confidence. On the relevant fitz-gov subcategory (`multi_source_convergence`, n=7) the error rate is ~57%. A v2 release with augmented training data targeting this pattern is planned.
2. **Short, direct factual contexts can trigger over-abstention.** Smoke-test example: the query *"When was the iPhone released?"* plus a single-sentence context confirming June 29, 2007 → predicted `ABSTAIN` with P(ABSTAIN)=0.92. The model was trained on 62.7% hard tier1 cases (rich methodological contexts), so it underweights the short-clean-answer pattern. Production RAG chunks (typically 200–500 chars) are tier1-like and largely unaffected.

---

## Usage

### Direct (transformers)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = AutoModelForSequenceClassification.from_pretrained(
    "yafitzdev/pyrrho-modernbert-base-v1"
).eval()

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

# Build the input the same way the training data was formatted
text = f"Question: {query}\n\nSources:\n" + "\n".join(
    f"[{i}] {c}" for i, c in enumerate(contexts, start=1)
)

enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0]
probs = torch.softmax(logits, dim=-1).numpy()

labels = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
print(f"Predicted: {labels[int(probs.argmax())]}")
print(f"Probs    : A={probs[0]:.3f} D={probs[1]:.3f} T={probs[2]:.3f}")
```

### CPU-optimized (ONNX + INT8)

For production CPU inference at ~30 ms / case, load the INT8 ONNX variant via `optimum`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = ORTModelForSequenceClassification.from_pretrained(
    "yafitzdev/pyrrho-modernbert-base-v1",
    file_name="model_quantized.onnx",
)
# Same input format as above...
```

### Calibrated decision rule

The headline numbers above use **threshold calibration** on the TRUSTWORTHY softmax probability. To match the published numbers, fall back from `TRUSTWORTHY` to the runner-up class when `P(TRUSTWORTHY) < tau`. The per-seed selected `tau` varied across runs (0.34–0.62); the safest default is `tau = 0.50`.
```python
TAU = 0.50
pred = int(probs.argmax())
if pred == 2 and probs[2] < TAU:  # TRUSTWORTHY id is 2
    pred = int(probs[:2].argmax())  # fall back to the runner-up between ABSTAIN/DISPUTED
```

---

## Training

| Hyperparameter | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | ModernBERT (sequence classification head) |
| Labels (3-class) | ABSTAIN (0), DISPUTED (1), TRUSTWORTHY (2) |
| Max sequence length | 4096 tokens |
| Epochs | 5 (with early stopping, patience 2) |
| Per-device batch size | 16 |
| Effective batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | cosine, 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.15 |
| Class weights | [2.3, 2.3, 1.0] (counters TRUSTWORTHY over-prediction from 53% class imbalance) |
| Loss | Weighted cross-entropy + label smoothing |
| Selection metric | `ft_penalized_accuracy = accuracy - 3 * max(0, FT - 0.057)` |
| Optimizer | adamw_torch_fused (bf16) |
| Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
| Training time | ~80–500 s per run, depending on GPU contention |

Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 train/eval split by `(label, difficulty)`. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.

---

## Dataset

This model is trained and evaluated on [**fitz-gov V5.1**](https://github.com/yafitzdev/fitz-gov), a 2,980-case benchmark for RAG governance (epistemic honesty). The eval split (584 cases) is a stratified 20% hold-out from `tier1_core` (2,920 cases, 62.7% hard difficulty, 17 domains, 113+ subcategories).

fitz-gov commit at training time: `3e1d22e22fdff726330a0d70503b07f73dacf817`

---

## Limitations & intended use

**Intended use:** as a CPU-friendly governance head inside a RAG pipeline that needs to decide when to answer, abstain, or flag a dispute. Drop-in replacement for the constraint+sklearn cascade in [fitz-sage](https://github.com/yafitzdev/fitz-sage).
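In practice the verdict feeds a routing decision. A minimal sketch of that pattern, assuming the classifier's softmax probabilities are already computed as in the Usage section — the function names `govern` and `route`, and the action strings, are illustrative and not part of the fitz-sage API:

```python
# Hypothetical routing helper: maps calibrated classifier probabilities to a
# pipeline action. Label ids follow the model card: ABSTAIN=0, DISPUTED=1,
# TRUSTWORTHY=2.
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
TAU = 0.50  # calibration threshold on P(TRUSTWORTHY)

def govern(probs, tau=TAU):
    """Return the calibrated verdict for one (query, contexts) case."""
    pred = max(range(3), key=lambda i: probs[i])
    if pred == 2 and probs[2] < tau:
        # Not confident enough to trust: fall back to the runner-up
        # between ABSTAIN and DISPUTED.
        pred = 0 if probs[0] >= probs[1] else 1
    return LABELS[pred]

def route(verdict):
    """Map a verdict to a pipeline action (illustrative policy)."""
    return {
        "TRUSTWORTHY": "generate_answer",
        "DISPUTED": "surface_conflict",
        "ABSTAIN": "decline_to_answer",
    }[verdict]

print(govern([0.10, 0.15, 0.75]))         # confident -> TRUSTWORTHY
print(govern([0.30, 0.25, 0.45]))         # below tau -> falls back to ABSTAIN
print(route(govern([0.20, 0.70, 0.10])))  # DISPUTED -> surface_conflict
```

Keeping the routing policy outside the model means the same checkpoint can serve stricter or looser products by changing only `tau` and the action map.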
**Not intended for:**

- Generating answers (this is a classification model, not a generator).
- Token-level hallucination localization (see [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) for that — complementary use).
- Languages other than English. fitz-gov is English-only; multilingual variants are a v3+ consideration.

**Safety axis:** the false-trustworthy rate is the production safety metric (a case wrongly classified as `TRUSTWORTHY` is the dangerous error — the system would confidently surface a hallucinated or unsupported answer). Threshold calibration is tuned to keep this rate at or below the fitz-sage baseline (5.7%).

---

## Citation

```bibtex
@misc{pyrrho_v1_2026,
  title  = {pyrrho-modernbert-base-v1},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-modernbert-base-v1},
}
```

## License

Apache 2.0 — see [LICENSE](https://github.com/yafitzdev/pyrrho/blob/main/LICENSE).

## Related projects

- [**fitz-sage**](https://github.com/yafitzdev/fitz-sage) — production RAG library that uses this model.
- [**fitz-gov**](https://github.com/yafitzdev/fitz-gov) — the benchmark dataset.
- [**pyrrho**](https://github.com/yafitzdev/pyrrho) — training code and roadmap for the full model family.
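As a closing worked example, the false-trustworthy safety metric discussed under "Limitations & intended use" can be computed from gold labels and predictions. This is a sketch under one common definition (cases whose gold verdict is not `TRUSTWORTHY` but were predicted `TRUSTWORTHY`, divided by all eval cases); the function name is hypothetical and the exact fitz-gov denominator may differ:

```python
def false_trustworthy_rate(gold, pred):
    """gold/pred: parallel lists of verdict strings for the eval split."""
    bad = sum(
        1 for g, p in zip(gold, pred)
        if p == "TRUSTWORTHY" and g != "TRUSTWORTHY"
    )
    return bad / len(gold)

# Toy eval of 4 cases: one ABSTAIN case wrongly trusted -> rate 1/4
gold = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY", "ABSTAIN"]
pred = ["TRUSTWORTHY", "DISPUTED", "TRUSTWORTHY", "ABSTAIN"]
print(false_trustworthy_rate(gold, pred))  # 0.25
```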