Fix performance disclosure: gap is from eval protocol (5x more support data), not seed variance; switch citation to TBD/arXiv
README.md

## Reported performance

NRI is evaluated zero-shot on 14 UCI tabular benchmarks. **Direct comparison between this checkpoint and the paper's Table 1 number is not apples-to-apples**, because the two use different evaluation protocols:

| Setting | Eval protocol | Seeds | Mean acc. |
| --- | --- | --- | --- |
| **This checkpoint (release reference)** | 5-fold CV; **1 fold (~20%) used as support, 4 folds (~80%) as query**; no subsampling | 1 (seed 42) | **75.60 %** |
| **Paper Table 1** | 5-fold CV; **train portion subsampled to 5% before induction** (≈4% of total as support, 20% as query) | 10 | 69.7 % ± 12.0 |

The released checkpoint sees **roughly 5× more support data per fold** than under the paper's protocol, which is the dominant reason its UCI accuracy is 5.9 pp higher than the paper's 69.7 %. The paper's protocol deliberately targets a low-data regime where zero-shot transfer is most valuable.
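The support/query arithmetic behind that 5× figure can be sketched as follows (a minimal sketch; the fold count and `train_percentage` value come from the table above, while the function names are ours, not the repo's):

```python
def public_protocol(n_folds=5):
    """Release-reference eval: 1 fold as support, the remaining folds as query."""
    support = 1 / n_folds             # 0.20 of the dataset
    query = (n_folds - 1) / n_folds   # 0.80 of the dataset
    return support, query

def paper_protocol(n_folds=5, train_percentage=5.0):
    """Paper Table 1 eval: the 4-fold train portion is subsampled to 5% as support."""
    train = (n_folds - 1) / n_folds            # 0.80 "train" portion
    support = train * train_percentage / 100   # 0.04 of the dataset
    query = 1 / n_folds                        # 0.20 held-out fold as query
    return support, query

ratio = public_protocol()[0] / paper_protocol()[0]
print(f"support-data ratio: {ratio:.1f}x")  # -> support-data ratio: 5.0x
```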

To reproduce the paper's protocol exactly, you need the (private) full evaluation harness with `train_percentage=5.0` subsampling; the public `evaluate_uci.py` shipped with the GitHub repo uses the simpler one-fold-as-support setup shown above. We plan to add `--train-percentage` support to the public script in a future release.
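Until that flag lands, the subsampling step itself is easy to emulate; a minimal sketch (our own hypothetical helper, not code from the repo — the 5% and seed 42 mirror the protocol described above):

```python
import random

def subsample_support(train_indices, train_percentage=5.0, seed=42):
    # Keep train_percentage% of the training folds as the support set,
    # mirroring the paper's low-data protocol (hypothetical helper, not repo code).
    k = max(1, round(len(train_indices) * train_percentage / 100))
    return random.Random(seed).sample(train_indices, k)

train = list(range(800))            # e.g. the 4-fold train portion of a 1000-row dataset
support = subsample_support(train)  # 40 indices -> 4% of the full dataset
```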
|

Per-dataset accuracies for this checkpoint under the public protocol:

| Dataset | This checkpoint (1 seed, 20% support) | Paper Table 1 (10 seeds, 5%-subsampled support) |
| --- | --- | --- |
| adult | 65.01 | 69.6 ± 4.4 |
| breast-cancer-wisconsin | 91.42 | 88.3 ± 0.3 |
| … | … | … |
| vote | 92.59 | 88.3 ± 1.8 |
| **Mean** | **75.60** | **69.7 ± 12.0** |

(Per-dataset paper numbers are mean ± std across 10 seeds; the ±12.0 on the bottom row is the std *across the 14 datasets*, not across seeds.)
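The distinction between those two stds can be made concrete (illustrative numbers only — NOT the paper's full 14-dataset results):

```python
import statistics

# Hypothetical per-seed accuracies for two datasets (illustrative data only).
per_seed = {
    "adult": [69.1, 70.2, 68.9],
    "vote":  [88.0, 88.5, 88.4],
}

# Per-dataset rows report mean +/- std ACROSS SEEDS:
row_mean = {d: statistics.mean(a) for d, a in per_seed.items()}
row_std = {d: statistics.stdev(a) for d, a in per_seed.items()}

# The bottom "Mean" row averages the per-dataset means; its +/- is the
# std ACROSS DATASETS of those means, not a seed-level std.
overall_mean = statistics.mean(row_mean.values())
across_dataset_std = statistics.stdev(row_mean.values())
```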

## Files in this repo

## Citation

The paper has been accepted at IJCAI 2026; the proceedings reference is not yet finalized. Please use the arXiv entry for now and update it once the IJCAI BibTeX is published.

```bibtex
@misc{Phua2026NRI,
  title         = {A Foundation Model for Zero-Shot Logical Rule Induction},
  author        = {Phua, Yin Jun},
  year          = {2026},
  eprint        = {2605.04916},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  note          = {To appear at IJCAI 2026; full proceedings citation TBD}
}
```