---

license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
  - en
base_model: answerdotai/ModernBERT-base
tags:
  - rag
  - governance
  - hallucination-detection
  - epistemic-honesty
  - classification
  - fitz-gov
  - pyrrho
datasets:
  - yafitzdev/fitz-gov
metrics:
  - accuracy
  - f1
  - false-trustworthy-rate
---


# pyrrho-modernbert-base-v1

> Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain the answer — **without an LLM call**.

This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:

| Verdict | Meaning |
|---|---|
| `ABSTAIN` | The sources do not contain enough information to answer. |
| `DISPUTED` | The sources contradict each other on the answer. |
| `TRUSTWORTHY` | The sources consistently and sufficiently support an answer. |

A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sage](https://github.com/yafitzdev/fitz-sage). Single forward pass, ~30 ms on CPU after INT8 ONNX quantization, no external LLM dependency.

---

## Results

Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **mean ± std across three seeds** (42, 1337, 7).

| Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
|---|---|---|---|
| Overall accuracy (calibrated) | **86.13 ± 0.86** | 78.7 | **+7.43** |
| False-trustworthy rate (safety) | **5.27 ± 0.21** | 5.7 | **-0.43** (safer) |
| Trustworthy recall | **79.38 ± 1.64** | 70.0 | **+9.38** |
| Disputed recall | **94.81 ± 1.28** | 86.1 | **+8.71** |
| Abstain recall | **92.94 ± 1.11** | 86.5 | **+6.44** |
| Macro F1 | 86.10 ± 0.80 | n/a | — |

---

## Known limitations

1. **Multi-source-convergence cases can be misclassified as DISPUTED.** When multiple authoritative sources state the same fact with slight numerical variation that falls within measurement tolerance (e.g., 4 climate agencies citing 1.09–1.20 °C of warming, or NIST and IUPAC both giving the speed of light), the model occasionally classifies the case as DISPUTED with high confidence. On the relevant fitz-gov subcategory (`multi_source_convergence`, n=7) the error rate is ~57%. A v2 release with augmented training data targeting this pattern is planned.

2. **Short, direct factual contexts can trigger over-abstention.** Smoke-test example: query *"When was the iPhone released?"* + a single-sentence context confirming June 29, 2007 → predicted `ABSTAIN` with P(ABSTAIN)=0.92. The model was trained on 62.7% hard tier1 cases (rich methodological contexts), so it underweights the short-clean-answer pattern. Production RAG chunks (typically 200–500 chars) are tier1-like and largely unaffected.

---

## Usage

### Direct (transformers)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = AutoModelForSequenceClassification.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1").eval()

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

# Build the input the same way the training data was formatted
text = f"Question: {query}\n\nSources:\n" + "\n".join(
    f"[{i}] {c}" for i, c in enumerate(contexts, start=1)
)

enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0]
probs = torch.softmax(logits, dim=-1).numpy()
labels = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
print(f"Predicted: {labels[int(probs.argmax())]}")
print(f"Probs    : A={probs[0]:.3f} D={probs[1]:.3f} T={probs[2]:.3f}")
```

### CPU-optimized (ONNX + INT8)

For production CPU inference at ~30 ms / case, load the INT8 ONNX variant via `optimum`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = ORTModelForSequenceClassification.from_pretrained(
    "yafitzdev/pyrrho-modernbert-base-v1",
    file_name="model_quantized.onnx",
)
# Same input format as above...
```

### Calibrated decision rule

The headline numbers above use **threshold calibration** on the TRUSTWORTHY softmax probability. To match the published numbers, fall back from `TRUSTWORTHY` to the runner-up class when `P(TRUSTWORTHY) < tau`. The per-seed selected `tau` varied across runs (0.34–0.62); the safest default is `tau = 0.50`.

```python
TAU = 0.50
pred = int(probs.argmax())
if pred == 2 and probs[2] < TAU:  # TRUSTWORTHY id is 2
    pred = int(probs[:2].argmax())  # fall back to runner-up between ABSTAIN/DISPUTED
```

---

## Training

| Hyperparameter | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | ModernBERT (sequence classification head) |
| Labels (3-class) | ABSTAIN (0), DISPUTED (1), TRUSTWORTHY (2) |
| Max sequence length | 4096 tokens |
| Epochs | 5 (with early stopping, patience 2) |
| Per-device batch size | 16 |
| Effective batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | cosine, 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.15 |
| Class weights | [2.3, 2.3, 1.0] (counters TRUSTWORTHY-over-prediction from 53% class imbalance) |
| Loss | Weighted cross-entropy + label smoothing |
| Selection metric | `ft_penalized_accuracy = accuracy - 3 * max(0, FT - 0.057)` |
| Optimizer | adamw_torch_fused (bf16) |
| Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
| Training time | ~80–500 s per run depending on GPU contention |

Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
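The selection metric from the table above can be sketched as a small helper. This is a minimal sketch, not the training code; it assumes accuracy and the false-trustworthy (FT) rate are expressed as fractions, which matches the `FT - 0.057` term (5.7% baseline) in the formula:

```python
def ft_penalized_accuracy(accuracy: float, ft_rate: float) -> float:
    """Selection metric: accuracy minus a 3x penalty on any
    false-trustworthy rate exceeding the 5.7% baseline."""
    return accuracy - 3 * max(0.0, ft_rate - 0.057)

# A run at 86% accuracy with FT at the baseline takes no penalty;
# pushing FT to 6.7% costs 3 * 0.01 = 0.03 of selection score.
print(ft_penalized_accuracy(0.86, 0.057))  # 0.86
print(ft_penalized_accuracy(0.86, 0.067))  # ≈ 0.83
```

The 3x multiplier makes checkpoint selection prefer a slightly less accurate model over one that exceeds the safety baseline.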



---

## Dataset

This model is trained and evaluated on [**fitz-gov V5.1**](https://github.com/yafitzdev/fitz-gov), a 2,980-case benchmark for RAG governance (epistemic honesty). The eval split (584 cases) is a stratified 20% hold-out from `tier1_core` (2,920 cases, 62.7% hard difficulty, 17 domains, 113+ subcategories).

fitz-gov commit at training time: `3e1d22e22fdff726330a0d70503b07f73dacf817`

---

## Limitations & intended use

**Intended use:** as a CPU-friendly governance head inside a RAG pipeline that needs to decide when to answer, abstain, or flag a dispute. Drop-in replacement for the constraint+sklearn cascade in [fitz-sage](https://github.com/yafitzdev/fitz-sage).

**Not intended for:**
- Generating answers (this is a classification model, not a generator).
- Token-level hallucination localization (see [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) for that — complementary use).
- Languages other than English. fitz-gov is English-only; multilingual variants are a v3+ consideration.

**Safety axis:** the false-trustworthy rate is the production safety metric (a case wrongly classified as `TRUSTWORTHY` is the dangerous error — the system would confidently surface a hallucinated or unsupported answer). Threshold calibration is tuned to keep this rate at or below the fitz-sage baseline (5.7%).
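As a rough sketch of how that safety metric can be computed — assuming (this is an assumption, the card does not state the denominator) that the rate is taken over *all* eval cases:

```python
def false_trustworthy_rate(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of all cases where a non-TRUSTWORTHY gold label
    (0=ABSTAIN, 1=DISPUTED) was predicted TRUSTWORTHY (2) --
    the dangerous error for a RAG governance head."""
    bad = sum(1 for t, p in zip(y_true, y_pred) if t != 2 and p == 2)
    return bad / len(y_true)

# One dangerous error out of four cases -> 0.25
rate = false_trustworthy_rate([0, 1, 2, 0], [2, 1, 2, 0])
```

Lowering `tau` in the calibrated decision rule trades trustworthy recall for a lower false-trustworthy rate.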

---

## Citation

```bibtex
@misc{pyrrho_v1_2026,
  title  = {pyrrho-modernbert-base-v1},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-modernbert-base-v1},
}
```

## License

Apache 2.0 — see [LICENSE](https://github.com/yafitzdev/pyrrho/blob/main/LICENSE).

## Related projects

- [**fitz-sage**](https://github.com/yafitzdev/fitz-sage) — production RAG library that uses this model.
- [**fitz-gov**](https://github.com/yafitzdev/fitz-gov) — the benchmark dataset.
- [**pyrrho**](https://github.com/yafitzdev/pyrrho) — training code and roadmap for the full model family.