phuayj committed
Commit a603f25 · verified · 1 Parent(s): e67936d

Fix performance disclosure: gap is from eval protocol (5x more support data), not seed variance; switch citation to TBD/arXiv

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -71,16 +71,20 @@ python evaluate_uci.py --checkpoint <downloaded_path> --data-dir data/uci --all
 
 ## Reported performance
 
-Mean accuracy across 14 UCI tabular benchmarks under 5-fold stratified CV with one-vs-rest aggregation for multi-class tasks (paper Section 5).
+NRI is evaluated zero-shot on 14 UCI tabular benchmarks. **Direct comparison between this checkpoint and the paper's Table 1 number is not apples-to-apples**, because the two use different evaluation protocols:
 
-| Setting | Mean acc. |
-| --- | --- |
-| **This checkpoint (seed 42, this release)** | **75.60 %** |
-| Paper Table 1 (different seed/version) | 69.7 % ± 12.0 |
+| Setting | Eval protocol | Seeds | Mean acc. |
+| --- | --- | --- | --- |
+| **This checkpoint (release reference)** | 5-fold CV; **1 fold (~20%) used as support, 4 folds (~80%) as query**; no subsampling | 1 (seed 42) | **75.60 %** |
+| **Paper Table 1** | 5-fold CV; **train portion subsampled to 5% before induction** (≈4% of total as support, 20% as query) | 10 | 69.7 % ± 12.0 |
+
+The released checkpoint has **roughly 5× more support data per fold** than the paper's protocol, which is the dominant reason its UCI accuracy is higher (+5.9 pp) than the paper's 69.7 %. The paper's protocol deliberately targets a low-data regime where zero-shot transfer is most valuable.
 
-Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
+To reproduce the paper's protocol exactly, you need the (private) full evaluation harness with `train_percentage=5.0` subsampling; the public `evaluate_uci.py` shipped with the GitHub repo uses the simpler 1-fold-as-support setup shown above. We plan to add `--train-percentage` support to the public script in a future release.
 
-| Dataset | This checkpoint | Paper Table 1 |
+Per-dataset accuracies for this checkpoint under the public protocol:
+
+| Dataset | This checkpoint (1 seed, 20% support) | Paper Table 1 (10 seeds, 5%-subsampled support) |
 | --- | --- | --- |
 | adult | 65.01 | 69.6 ± 4.4 |
 | breast-cancer-wisconsin | 91.42 | 88.3 ± 0.3 |
@@ -98,7 +102,7 @@ Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
 | vote | 92.59 | 88.3 ± 1.8 |
 | **Mean** | **75.60** | **69.7 ± 12.0** |
 
-**Why does this checkpoint exceed the paper number?** A 10-seed sweep on this codebase yields mean UCI accuracy in the range ≈65 % — 77.7 % (mean ≈73.5 %, std ≈4 %). The paper reports a single seed's number that happened to fall on the lower end of this distribution; the +5.9 pp gap is consistent with cross-seed variance and minor codebase evolution since paper submission. Reproducing the paper's exact number would require the paper's seed and code commit; the released checkpoint is the *cleaned reference* used to verify the public release end-to-end.
+(Per-dataset paper numbers are mean ± std across 10 seeds; the ±12.0 on the bottom row is std *across the 14 datasets*, not across seeds.)
 
 ## Files in this repo
 
@@ -117,12 +121,16 @@ Per-dataset breakdown for *this checkpoint* vs *paper Table 1*:
 
 ## Citation
 
+The paper has been accepted at IJCAI 2026; the proceedings reference is not yet finalized. Please use the arXiv entry for now and update once the IJCAI BibTeX is published.
+
 ```bibtex
-@inproceedings{Phua2026NRI,
-  title = {A Foundation Model for Zero-Shot Logical Rule Induction},
+@misc{Phua2026NRI,
+  title = {A Foundation Model for Zero-Shot Logical Rule Induction},
   author = {Phua, Yin Jun},
-  booktitle = {Proceedings of the Thirty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2026)},
-  year = {2026},
-  note = {arXiv preprint: \url{https://arxiv.org/abs/2605.04916}}
+  year = {2026},
+  eprint = {2605.04916},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.LG},
+  note = {To appear at IJCAI 2026; full proceedings citation TBD}
 }
 ```
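The split arithmetic behind the "roughly 5× more support data" claim in the updated README can be sketched as follows. This is a toy illustration on synthetic data, assuming scikit-learn's `StratifiedKFold`; it is not the project's actual evaluation harness, and all variable names are made up for the example:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))     # stand-in features for one UCI table
y = rng.integers(0, 2, size=1000)  # stand-in binary labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
train_idx, held_out_idx = next(skf.split(X, y))  # ~80% / ~20% for one fold

# Release protocol: the held-out fold (~20%) is the support set,
# and the remaining 4 folds (~80%) are the query set.
release_support = held_out_idx

# Paper protocol: subsample the ~80% train portion to 5% of it
# (≈4% of the total) as support; the held-out ~20% fold is the query set.
n_paper = max(1, int(round(0.05 * len(train_idx))))
paper_support = rng.choice(train_idx, size=n_paper, replace=False)

print(len(release_support))                       # 200 (~20% of 1000)
print(len(paper_support))                         # 40  (~4% of 1000)
print(len(release_support) / len(paper_support))  # 5.0
```

With 1,000 rows this gives 200 support points per fold under the release protocol versus 40 under the paper's, i.e. the 5× ratio the README cites.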
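The README's parenthetical about the ±12.0 distinguishes two aggregation axes, which is easy to get wrong when reading the table. A quick synthetic check of that distinction (the accuracy matrix here is random, purely for illustration, not the paper's results):

```python
import numpy as np

# Toy accuracy matrix: acc[i, j] = accuracy of seed i on dataset j
# (10 seeds x 14 datasets, random numbers purely for illustration).
rng = np.random.default_rng(0)
acc = rng.normal(loc=70.0, scale=5.0, size=(10, 14))

# Per-dataset cells like "69.6 ± 4.4": mean and std taken ACROSS SEEDS.
per_dataset_mean = acc.mean(axis=0)  # shape (14,)
per_dataset_std = acc.std(axis=0)    # shape (14,)

# Bottom row "69.7 ± 12.0": mean of the per-dataset means, with the std
# taken ACROSS THE 14 DATASETS, not across seeds.
bottom_mean = per_dataset_mean.mean()
bottom_std = per_dataset_mean.std()
```

The two stds answer different questions: `per_dataset_std` measures run-to-run stability on one benchmark, while `bottom_std` measures how much difficulty varies between benchmarks.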