---

license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
  - en
base_model: answerdotai/ModernBERT-base
tags:
  - rag
  - governance
  - hallucination-detection
  - epistemic-honesty
  - classification
  - fitz-gov
  - pyrrho
datasets:
  - yafitzdev/fitz-gov
metrics:
  - accuracy
  - f1
  - false-trustworthy-rate
---


# pyrrho-modernbert-base-v1

> Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain the answer — **without an LLM call**.

This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:

| Verdict | Meaning |
|---|---|
| `ABSTAIN` | The sources do not contain enough information to answer. |
| `DISPUTED` | The sources contradict each other on the answer. |
| `TRUSTWORTHY` | The sources consistently and sufficiently support an answer. |

A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sage](https://github.com/yafitzdev/fitz-sage). Single forward pass, ~30 ms on CPU after INT8 ONNX quantization, no external LLM dependency.

---

## Results

Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **mean ± std across three seeds** (42, 1337, 7).

| Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
|---|---|---|---|
| Overall accuracy (calibrated) | **86.13 ± 0.86** | 78.7 | **+7.43** |
| False-trustworthy rate (safety) | **5.27 ± 0.21** | 5.7 | **-0.43** (safer) |
| Trustworthy recall | **79.38 ± 1.64** | 70.0 | **+9.38** |
| Disputed recall | **94.81 ± 1.28** | 86.1 | **+8.71** |
| Abstain recall | **92.94 ± 1.11** | 86.5 | **+6.44** |
| Macro F1 | 86.10 ± 0.80 | n/a | — |

---

## Known limitations

1. **Multi-source-convergence cases can be misclassified as DISPUTED.** When multiple authoritative sources state the same fact with slight numerical variation that falls within measurement tolerance (e.g., 4 climate agencies citing 1.09–1.20 °C of warming, or NIST and IUPAC both giving the speed of light), the model occasionally classifies the case as DISPUTED with high confidence. On the relevant fitz-gov subcategory (`multi_source_convergence`, n=7) the error rate is ~57%. A v2 release with augmented training data targeting this pattern is planned.

2. **Short, direct factual contexts can trigger over-abstention.** Smoke-test example: query *"When was the iPhone released?"* + a single-sentence context confirming June 29, 2007 → predicted `ABSTAIN` with P(ABSTAIN)=0.92. The model was trained on 62.7% hard tier1 cases (rich methodological contexts), so it underweights the short-clean-answer pattern. Production RAG chunks (typically 200–500 chars) are tier1-like and largely unaffected.

---

## Usage

### Direct (transformers)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = AutoModelForSequenceClassification.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1").eval()

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

# Build the input the same way the training data was formatted
text = f"Question: {query}\n\nSources:\n" + "\n".join(
    f"[{i}] {c}" for i, c in enumerate(contexts, start=1)
)

enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0]
probs = torch.softmax(logits, dim=-1).numpy()
labels = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
print(f"Predicted: {labels[int(probs.argmax())]}")
print(f"Probs    : A={probs[0]:.3f} D={probs[1]:.3f} T={probs[2]:.3f}")
```

### CPU-optimized (ONNX + INT8)

For production CPU inference at ~30 ms / case, load the INT8 ONNX variant via `optimum`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = ORTModelForSequenceClassification.from_pretrained(
    "yafitzdev/pyrrho-modernbert-base-v1",
    file_name="model_quantized.onnx",
)
# Same input format as above...
```

### Calibrated decision rule

The headline numbers above use **threshold calibration** on the TRUSTWORTHY softmax probability. To match the published numbers, fall back from `TRUSTWORTHY` to the runner-up class when `P(TRUSTWORTHY) < tau`. The per-seed selected `tau` varied across runs (0.34–0.62); the safest default is `tau = 0.50`.

```python
TAU = 0.50
pred = int(probs.argmax())
if pred == 2 and probs[2] < TAU:  # TRUSTWORTHY id is 2
    pred = int(probs[:2].argmax())  # fall back to runner-up between ABSTAIN/DISPUTED
```

---

## Training

| Hyperparameter | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | ModernBERT (sequence classification head) |
| Labels (3-class) | ABSTAIN (0), DISPUTED (1), TRUSTWORTHY (2) |
| Max sequence length | 4096 tokens |
| Epochs | 5 (with early stopping, patience 2) |
| Per-device batch size | 16 |
| Effective batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | cosine, 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.15 |
| Class weights | [2.3, 2.3, 1.0] (counters TRUSTWORTHY-over-prediction from 53% class imbalance) |
| Loss | Weighted cross-entropy + label smoothing |
| Selection metric | `ft_penalized_accuracy = accuracy - 3 * max(0, FT - 0.057)` |
| Optimizer | adamw_torch_fused (bf16) |
| Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
| Training time | ~80–500 s per run depending on GPU contention |

Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
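The selection metric from the table above can be sketched as a small helper. This is a minimal sketch, not the training code; it assumes accuracy and the false-trustworthy (FT) rate are expressed as fractions, which matches the `FT - 0.057` term (5.7% baseline) in the formula:

```python
def ft_penalized_accuracy(accuracy: float, ft_rate: float) -> float:
    """Selection metric: accuracy minus a 3x penalty on any
    false-trustworthy rate exceeding the 5.7% baseline."""
    return accuracy - 3 * max(0.0, ft_rate - 0.057)

# A run at 86% accuracy with FT at the baseline takes no penalty;
# pushing FT to 6.7% costs 3 * 0.01 = 0.03 of selection score.
print(ft_penalized_accuracy(0.86, 0.057))  # 0.86
print(ft_penalized_accuracy(0.86, 0.067))  # ≈ 0.83
```

The 3x multiplier makes checkpoint selection prefer a slightly less accurate model over one that exceeds the safety baseline.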



---

## Dataset

This model is trained and evaluated on [**fitz-gov V5.1**](https://github.com/yafitzdev/fitz-gov), a 2,980-case benchmark for RAG governance (epistemic honesty). The eval split (584 cases) is a stratified 20% hold-out from `tier1_core` (2,920 cases, 62.7% hard difficulty, 17 domains, 113+ subcategories).

fitz-gov commit at training time: `3e1d22e22fdff726330a0d70503b07f73dacf817`

---

## Limitations & intended use

**Intended use:** as a CPU-friendly governance head inside a RAG pipeline that needs to decide when to answer, abstain, or flag a dispute. Drop-in replacement for the constraint+sklearn cascade in [fitz-sage](https://github.com/yafitzdev/fitz-sage).

**Not intended for:**
- Generating answers (this is a classification model, not a generator).
- Token-level hallucination localization (see [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) for that — complementary use).
- Languages other than English. fitz-gov is English-only; multilingual variants are a v3+ consideration.

**Safety axis:** the false-trustworthy rate is the production safety metric (a case wrongly classified as `TRUSTWORTHY` is the dangerous error — the system would confidently surface a hallucinated or unsupported answer). Threshold calibration is tuned to keep this rate at or below the fitz-sage baseline (5.7%).
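As a rough sketch of how that safety metric can be computed — assuming (this is an assumption, the card does not state the denominator) that the rate is taken over *all* eval cases:

```python
def false_trustworthy_rate(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of all cases where a non-TRUSTWORTHY gold label
    (0=ABSTAIN, 1=DISPUTED) was predicted TRUSTWORTHY (2) --
    the dangerous error for a RAG governance head."""
    bad = sum(1 for t, p in zip(y_true, y_pred) if t != 2 and p == 2)
    return bad / len(y_true)

# One dangerous error out of four cases -> 0.25
rate = false_trustworthy_rate([0, 1, 2, 0], [2, 1, 2, 0])
```

Lowering `tau` in the calibrated decision rule trades trustworthy recall for a lower false-trustworthy rate.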

---

## Citation

```bibtex
@misc{pyrrho_v1_2026,
  title  = {pyrrho-modernbert-base-v1},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-modernbert-base-v1},
}
```

## License

Apache 2.0 — see [LICENSE](https://github.com/yafitzdev/pyrrho/blob/main/LICENSE).

## Related projects

- [**fitz-sage**](https://github.com/yafitzdev/fitz-sage) — production RAG library that uses this model.
- [**fitz-gov**](https://github.com/yafitzdev/fitz-gov) — the benchmark dataset.
- [**pyrrho**](https://github.com/yafitzdev/pyrrho) — training code and roadmap for the full model family.