---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- en
base_model: answerdotai/ModernBERT-base
tags:
- rag
- governance
- hallucination-detection
- epistemic-honesty
- classification
- fitz-gov
- pyrrho
datasets:
- yafitzdev/fitz-gov
metrics:
- accuracy
- f1
- false-trustworthy-rate
---
# pyrrho-modernbert-base-v1
> Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain it — **without an LLM call**.
This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:
| Verdict | Meaning |
|---|---|
| `ABSTAIN` | The sources do not contain enough information to answer. |
| `DISPUTED` | The sources contradict each other on the answer. |
| `TRUSTWORTHY` | The sources consistently and sufficiently support an answer. |
A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sage](https://github.com/yafitzdev/fitz-sage). Single forward pass, ~30 ms on CPU after INT8 ONNX quantization, no external LLM dependency.
---
## Results
Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].
| Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
|---|---|---|---|
| Overall accuracy (calibrated) | **86.13 ± 0.86** | 78.7 | **+7.43** |
| False-trustworthy rate (safety) | **5.27 ± 0.21** | 5.7 | **-0.43** (safer) |
| Trustworthy recall | **79.38 ± 1.64** | 70.0 | **+9.38** |
| Disputed recall | **94.81 ± 1.28** | 86.1 | **+8.71** |
| Abstain recall | **92.94 ± 1.11** | 86.5 | **+6.44** |
| Macro F1 | 86.10 ± 0.80 | n/a | — |
---
## Known limitations
1. **Multi-source-convergence cases can be misclassified as DISPUTED.** When multiple authoritative sources state the same fact with slight numerical variation that falls within measurement tolerance (e.g., 4 climate agencies citing 1.09–1.20 °C of warming, or NIST and IUPAC both giving the speed of light), the model occasionally classifies the case as DISPUTED with high confidence. On the relevant fitz-gov subcategory (`multi_source_convergence`, n=7) the error rate is ~57%. A v2 release with augmented training data targeting this pattern is planned.
2. **Short, direct factual contexts can trigger over-abstention.** Smoke-test example: query *"When was the iPhone released?"* + a single-sentence context confirming June 29, 2007 → predicted `ABSTAIN` with P(ABSTAIN)=0.92. The model was trained on 62.7% hard tier1 cases (rich methodological contexts), so it underweights the short-clean-answer pattern. Production RAG chunks (typically 200–500 chars) are tier1-like and largely unaffected.
---
## Usage
### Direct (transformers)
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = AutoModelForSequenceClassification.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1").eval()

query = "Has the company achieved profitability?"
contexts = [
    "The company posted its first profitable quarter, with net income of $4 million.",
    "The company recorded a quarterly loss of $12 million, the third consecutive losing quarter.",
]

# Build the input the same way training data was formatted
text = f"Question: {query}\n\nSources:\n" + "\n".join(
    f"[{i}] {c}" for i, c in enumerate(contexts, start=1)
)

enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0]
probs = torch.softmax(logits, dim=-1).numpy()

labels = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]
print(f"Predicted: {labels[int(probs.argmax())]}")
print(f"Probs    : A={probs[0]:.3f} D={probs[1]:.3f} T={probs[2]:.3f}")
```
### CPU-optimized (ONNX + INT8)
For production CPU inference at ~30 ms / case, load the INT8 ONNX variant via `optimum`:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yafitzdev/pyrrho-modernbert-base-v1")
model = ORTModelForSequenceClassification.from_pretrained(
    "yafitzdev/pyrrho-modernbert-base-v1",
    file_name="model_quantized.onnx",
)
# Same input format as above...
```
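To keep client code in sync with the training-time format, the prompt construction from the transformers example can be factored into a small helper (`build_governance_input` is an illustrative name, not part of the released package):

```python
def build_governance_input(query: str, contexts: list[str]) -> str:
    """Format a (query, contexts) pair the same way the training data was:
    a 'Question:' line followed by numbered 'Sources:' entries."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return f"Question: {query}\n\nSources:\n{numbered}"
```

Tokenize the returned string with `truncation=True, max_length=4096` exactly as in the transformers example above.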
### Calibrated decision rule
The headline numbers above use **threshold calibration** on the TRUSTWORTHY softmax probability. To match the published numbers, fall back from `TRUSTWORTHY` to the runner-up class when `P(TRUSTWORTHY) < tau`. The per-seed selected `tau` varied across runs (0.34–0.62); the safest default is `tau = 0.50`.
```python
TAU = 0.50

pred = int(probs.argmax())
if pred == 2 and probs[2] < TAU:    # TRUSTWORTHY id is 2
    pred = int(probs[:2].argmax())  # fall back to runner-up between ABSTAIN/DISPUTED
```
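Wrapped as a reusable function over the 3-probability vector, the same rule looks like this (`calibrated_verdict` is an illustrative helper, not part of the model repo):

```python
LABELS = ["ABSTAIN", "DISPUTED", "TRUSTWORTHY"]

def calibrated_verdict(probs, tau=0.50):
    """Argmax verdict, but demote a weak TRUSTWORTHY (index 2) to the
    runner-up between ABSTAIN and DISPUTED when P(TRUSTWORTHY) < tau."""
    pred = max(range(3), key=lambda i: probs[i])
    if pred == 2 and probs[2] < tau:
        pred = 0 if probs[0] >= probs[1] else 1
    return LABELS[pred]
```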
---
## Training
| Hyperparameter | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | ModernBERT (sequence classification head) |
| Labels (3-class) | ABSTAIN (0), DISPUTED (1), TRUSTWORTHY (2) |
| Max sequence length | 4096 tokens |
| Epochs | 5 (with early stopping, patience 2) |
| Per-device batch size | 16 |
| Effective batch size | 16 |
| Learning rate | 5e-5 |
| LR scheduler | cosine, 10% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.15 |
| Class weights | [2.3, 2.3, 1.0] (counters TRUSTWORTHY-over-prediction from 53% class imbalance) |
| Loss | Weighted cross-entropy + label smoothing |
| Selection metric | `ft_penalized_accuracy = accuracy - 3 * max(0, FT - 0.057)` |
| Optimizer | adamw_torch_fused (bf16) |
| Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
| Training time | ~80–500 s per run depending on GPU contention |
Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
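The `ft_penalized_accuracy` selection metric in the table trades accuracy against the false-trustworthy (FT) rate once FT exceeds the 5.7% baseline; a minimal sketch, with both quantities as fractions rather than percentages:

```python
def ft_penalized_accuracy(accuracy: float, ft_rate: float, baseline: float = 0.057) -> float:
    """Accuracy minus a 3x penalty on any false-trustworthy rate above the baseline."""
    return accuracy - 3.0 * max(0.0, ft_rate - baseline)
```

For example, a run at 86% accuracy with a 6.7% FT rate scores 0.86 - 3 × 0.01 = 0.83, while any run at or below the 5.7% baseline is scored on accuracy alone.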
---
## Dataset
This model is trained and evaluated on [**fitz-gov V5.1**](https://github.com/yafitzdev/fitz-gov), a 2,980-case benchmark for RAG governance (epistemic honesty). The eval split (584 cases) is a stratified 20% hold-out from `tier1_core` (2,920 cases, 62.7% hard difficulty, 17 domains, 113+ subcategories).
fitz-gov commit at training time: `3e1d22e22fdff726330a0d70503b07f73dacf817`
---
## Limitations & intended use
**Intended use:** as a CPU-friendly governance head inside a RAG pipeline that needs to decide when to answer, abstain, or flag a dispute. Drop-in replacement for the constraint+sklearn cascade in [fitz-sage](https://github.com/yafitzdev/fitz-sage).
**Not intended for:**
- Generating answers (this is a classification model, not a generator).
- Token-level hallucination localization (see [LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect) for that — complementary use).
- Languages other than English. fitz-gov is English-only; multilingual variants are a v3+ consideration.
**Safety axis:** the false-trustworthy rate is the production safety metric (a case wrongly classified as `TRUSTWORTHY` is the dangerous error — the system would confidently surface a hallucinated or unsupported answer). Threshold calibration is tuned to keep this rate at or below the fitz-sage baseline (5.7%).
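For monitoring this axis in production, the false-trustworthy rate can be computed from gold and predicted label ids. One reading of the metric is assumed here: the share of all cases whose gold label is not `TRUSTWORTHY` but whose prediction is.

```python
def false_trustworthy_rate(gold: list[int], pred: list[int], trustworthy_id: int = 2) -> float:
    """Fraction of cases where a non-TRUSTWORTHY gold label was predicted TRUSTWORTHY."""
    ft = sum(1 for g, p in zip(gold, pred) if g != trustworthy_id and p == trustworthy_id)
    return ft / len(gold)
```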
---
## Citation
```bibtex
@misc{pyrrho_v1_2026,
  title  = {pyrrho-modernbert-base-v1},
  author = {Yan Fitzner},
  year   = {2026},
  url    = {https://huggingface.co/yafitzdev/pyrrho-modernbert-base-v1},
}
```
## License
Apache 2.0 — see [LICENSE](https://github.com/yafitzdev/pyrrho/blob/main/LICENSE).
## Related projects
- [**fitz-sage**](https://github.com/yafitzdev/fitz-sage) — production RAG library that uses this model.
- [**fitz-gov**](https://github.com/yafitzdev/fitz-gov) — the benchmark dataset.
- [**pyrrho**](https://github.com/yafitzdev/pyrrho) — training code and roadmap for the full model family.