phuayj committed
Commit a603f25 · verified · 1 Parent(s): e67936d

Fix performance disclosure: gap is from eval protocol (5x more support data), not seed variance; switch citation to TBD/arXiv

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -71,16 +71,20 @@ python evaluate_uci.py --checkpoint <downloaded_path> --data-dir data/uci --all
 
 ## Reported performance
 
-Mean accuracy across 14 UCI tabular benchmarks under 5-fold stratified CV with one-vs-rest aggregation for multi-class tasks (paper Section 5).
+NRI is evaluated zero-shot on 14 UCI tabular benchmarks. **Direct comparison between this checkpoint and the paper's Table 1 number is not apples-to-apples**, because the two use different evaluation protocols:
 
-| Setting | Mean acc. |
-| --- | --- |
-| **This checkpoint (seed 42, this release)** | **75.60 %** |
-| Paper Table 1 (different seed/version) | 69.7 % ± 12.0 |
+| Setting | Eval protocol | Seeds | Mean acc. |
+| --- | --- | --- | --- |
+| **This checkpoint (release reference)** | 5-fold CV; **1 fold (~20%) used as support, 4 folds (~80%) as query**; no subsampling | 1 (seed 42) | **75.60 %** |
+| **Paper Table 1** | 5-fold CV; **train portion subsampled to 5% before induction** (≈4% of total as support, 20% as query) | 10 | 69.7 % ± 12.0 |
+
+The released checkpoint has **roughly 5× more support data per fold** than the paper's protocol, which is the dominant reason its UCI accuracy is higher (+5.9 pp) than the paper's 69.7 %. The paper's protocol deliberately targets a low-data regime where zero-shot transfer is most valuable.
 
-Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
+To reproduce the paper's protocol exactly, you need the (private) full evaluation harness with `train_percentage=5.0` subsampling; the public `evaluate_uci.py` shipped with the GitHub repo uses the simpler 1-fold-as-support setup shown above. We plan to add `--train-percentage` support to the public script in a future release.
 
-| Dataset | This checkpoint | Paper Table 1 |
+Per-dataset accuracies for this checkpoint under the public protocol:
+
+| Dataset | This checkpoint (1 seed, 20% support) | Paper Table 1 (10 seeds, 5%-subsampled support) |
 | --- | --- | --- |
 | adult | 65.01 | 69.6 ± 4.4 |
 | breast-cancer-wisconsin | 91.42 | 88.3 ± 0.3 |
@@ -98,7 +102,7 @@ Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
 | vote | 92.59 | 88.3 ± 1.8 |
 | **Mean** | **75.60** | **69.7 ± 12.0** |
 
-**Why does this checkpoint exceed the paper number?** A 10-seed sweep on this codebase yields mean UCI accuracy in the range ≈65 % — 77.7 % (mean ≈73.5 %, std ≈4 %). The paper reports a single seed's number that happened to fall on the lower end of this distribution; the +5.9 pp gap is consistent with cross-seed variance and minor codebase evolution since paper submission. Reproducing the paper's exact number would require the paper's seed and code commit; the released checkpoint is the *cleaned reference* used to verify the public release end-to-end.
+(Per-dataset paper numbers are mean ± std across 10 seeds; the ±12.0 on the bottom row is std *across the 14 datasets*, not across seeds.)
 
 ## Files in this repo
 
@@ -117,12 +121,16 @@ Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
 
 ## Citation
 
+The paper has been accepted at IJCAI 2026; the proceedings reference is not yet finalized. Please use the arXiv entry for now and update once the IJCAI BibTeX is published.
+
 ```bibtex
-@inproceedings{Phua2026NRI,
-  title = {A Foundation Model for Zero-Shot Logical Rule Induction},
+@misc{Phua2026NRI,
+  title = {A Foundation Model for Zero-Shot Logical Rule Induction},
   author = {Phua, Yin Jun},
-  booktitle = {Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2026)},
-  year = {2026},
-  note = {arXiv preprint: \url{https://arxiv.org/abs/2605.04916}}
+  year = {2026},
+  eprint = {2605.04916},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.LG},
+  note = {To appear at IJCAI 2026; full proceedings citation TBD}
 }
 ```
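The split arithmetic behind the "roughly 5× more support data" claim in the updated README can be sketched as follows. This is a toy illustration on synthetic data, assuming scikit-learn's `StratifiedKFold`; it is not the project's actual evaluation harness, and all variable names are made up for the example:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))     # stand-in features for one UCI table
y = rng.integers(0, 2, size=1000)  # stand-in binary labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
train_idx, held_out_idx = next(skf.split(X, y))  # ~80% / ~20% for one fold

# Release protocol: the held-out fold (~20%) is the support set,
# and the remaining 4 folds (~80%) are the query set.
release_support = held_out_idx

# Paper protocol: subsample the ~80% train portion to 5% of it
# (≈4% of the total) as support; the held-out ~20% fold is the query set.
n_paper = max(1, int(round(0.05 * len(train_idx))))
paper_support = rng.choice(train_idx, size=n_paper, replace=False)

print(len(release_support))                       # 200 (~20% of 1000)
print(len(paper_support))                         # 40  (~4% of 1000)
print(len(release_support) / len(paper_support))  # 5.0
```

With 1,000 rows this gives 200 support points per fold under the release protocol versus 40 under the paper's, i.e. the 5× ratio the README cites.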
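The README's parenthetical about the ±12.0 distinguishes two aggregation axes, which is easy to get wrong when reading the table. A quick synthetic check of that distinction (the accuracy matrix here is random, purely for illustration, not the paper's results):

```python
import numpy as np

# Toy accuracy matrix: acc[i, j] = accuracy of seed i on dataset j
# (10 seeds x 14 datasets, random numbers purely for illustration).
rng = np.random.default_rng(0)
acc = rng.normal(loc=70.0, scale=5.0, size=(10, 14))

# Per-dataset cells like "69.6 ± 4.4": mean and std taken ACROSS SEEDS.
per_dataset_mean = acc.mean(axis=0)  # shape (14,)
per_dataset_std = acc.std(axis=0)    # shape (14,)

# Bottom row "69.7 ± 12.0": mean of the per-dataset means, with the std
# taken ACROSS THE 14 DATASETS, not across seeds.
bottom_mean = per_dataset_mean.mean()
bottom_std = per_dataset_mean.std()
```

The two stds answer different questions: `per_dataset_std` measures run-to-run stability on one benchmark, while `bottom_std` measures how much difficulty varies between benchmarks.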