yafitzdev committed on
Commit
54ab7af
·
verified ·
1 Parent(s): 74ebc6f

Cross-link to yafitzdev/fitz-gov dataset (now live on HF)

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -14,7 +14,7 @@ tags:
  - fitz-gov
  - pyrrho
  datasets:
- - fitz-gov
+ - yafitzdev/fitz-gov
  metrics:
  - accuracy
  - f1
@@ -25,7 +25,7 @@ metrics:

  > Decide whether your retrieved sources support a confident answer, contradict each other, or simply don't contain it — **without an LLM call**.

- This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) v5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:
+ This is a fine-tune of [`answerdotai/ModernBERT-base`](https://huggingface.co/answerdotai/ModernBERT-base) on [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 for **3-class RAG governance classification**: given a `(query, retrieved contexts)` pair, predicts one of:

  | Verdict | Meaning |
  |---|---|
@@ -39,7 +39,7 @@ A drop-in replacement for the constraint+sklearn governance pipeline in [fitz-sa

  ## Results

- Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) v5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].
+ Validated on the [fitz-gov](https://github.com/yafitzdev/fitz-gov) V5.1 eval split (584 cases, stratified 20% hold-out from `tier1_core`). All numbers are **3-seed mean ± std** across seeds [42, 1337, 7].

  | Metric | pyrrho v1 | fitz-sage v0.11 (sklearn baseline) | Δ |
  |---|---|---|---|
@@ -142,7 +142,7 @@ if pred == 2 and probs[2] < TAU: # TRUSTWORTHY id is 2
  | Hardware | NVIDIA RTX 5090 (Blackwell sm_120) |
  | Training time | ~80–500 s per run depending on GPU contention |

- Training data: fitz-gov v5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.
+ Training data: fitz-gov V5.1 `tier1_core`, stratified 80/20 split by `(label, difficulty)` for train/eval. The 60-case `tier0_sanity` set is held out separately as a noise-prone diagnostic.

  ---
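Reviewer note: the "Training data" line in the last hunk describes a stratified 80/20 split by `(label, difficulty)`. The sketch below shows one way such a split could be done with only the standard library. It is not taken from the repository: the function name `stratified_split`, the record fields `label` and `difficulty`, and the toy difficulty values `"easy"` and `"hard"` are all assumptions made for illustration, and the actual training script may use a different method (for example a library splitter).

```python
import random
from collections import defaultdict


def stratified_split(cases, eval_fraction=0.2, seed=42):
    """Split cases into (train, eval), stratifying on (label, difficulty).

    Hypothetical helper: the field names are assumed, not taken from fitz-gov.
    """
    # Group cases by their stratum key.
    groups = defaultdict(list)
    for case in cases:
        groups[(case["label"], case["difficulty"])].append(case)

    rng = random.Random(seed)
    train, eval_ = [], []
    # Iterate strata in sorted order so the result is reproducible for a seed.
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)
        n_eval = round(len(members) * eval_fraction)
        eval_.extend(members[:n_eval])
        train.extend(members[n_eval:])
    return train, eval_


# Toy data: 3 labels x 2 difficulties x 10 cases each (illustrative only).
cases = [
    {"id": i, "label": label, "difficulty": difficulty}
    for i, (label, difficulty) in enumerate(
        (l, d) for l in (0, 1, 2) for d in ("easy", "hard") for _ in range(10)
    )
]

train, eval_ = stratified_split(cases)
print(len(train), len(eval_))  # 48 12
```

Each `(label, difficulty)` stratum contributes the same fraction to the eval set, so the hold-out keeps the class and difficulty mix of the full set. Rounding per stratum means the overall ratio is only approximately 80/20 when stratum sizes are not multiples of five.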