yafitzdev/pyrrho-modernbert-base-v1

@@ -14,7 +14,7 @@ tags:
   - fitz-gov
   - pyrrho
 datasets:
-  - fitz-gov
+  - yafitzdev/fitz-gov
 metrics:
   - accuracy
   - f1
@@ -25,7 +25,7 @@ metrics:
 > Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain it — **without an LLM call**.
-This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) v5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:
+This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:
 | Verdict | Meaning |
 |---|---|
@@ -39,7 +39,7 @@ A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sa
 ## Results
-Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) v5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].
+Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].
 | Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
 |---|---|---|---|
@@ -142,7 +142,7 @@ if pred == 2 and probs[2] < TAU:  # TRUSTWORTHY id is 2
 | Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
 | Training time | ~80–500 s per run depending on GPU contention |
-Training data: fitz-gov v5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
+Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
 ---
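
Reviewer note: the training-data line in the last hunk describes a stratified 80/20 split by `(label, difficulty)`. The card does not show how that split is produced, so the following is only a minimal sketch of one way it could be done with the standard library. The function name `stratified_split`, the record fields `label` and `difficulty`, the difficulty values `easy`/`hard`, and the toy records are all assumptions made for illustration, not the fitz-gov schema or the author's script.

```python
import random
from collections import defaultdict


def stratified_split(cases, eval_fraction=0.2, seed=42):
    """Split cases into (train, eval), stratifying on (label, difficulty).

    Hypothetical helper: field names are assumed, not taken from fitz-gov.
    """
    rng = random.Random(seed)

    # Group cases by their joint (label, difficulty) stratum.
    strata = defaultdict(list)
    for case in cases:
        strata[(case["label"], case["difficulty"])].append(case)

    train, eval_ = [], []
    # Visit strata in sorted order so the result is reproducible for a seed.
    for key in sorted(strata):
        group = list(strata[key])
        rng.shuffle(group)
        n_eval = round(len(group) * eval_fraction)
        eval_.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, eval_


# Toy data only: 3 label ids x 2 assumed difficulty levels x 10 cases each.
toy_cases = [
    {"id": f"{label}-{difficulty}-{i}", "label": label, "difficulty": difficulty}
    for label in (0, 1, 2)
    for difficulty in ("easy", "hard")
    for i in range(10)
]

train_cases, eval_cases = stratified_split(toy_cases)
print(len(train_cases), len(eval_cases))  # 48 12
```

Because the eval count is rounded per stratum, the overall eval share on real data can differ slightly from exactly 20%. scikit-learn's `train_test_split` with its `stratify` argument is a common alternative; the card does not say which approach was actually used.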