---
library_name: rule-inducer
license: mit
pipeline_tag: tabular-classification
tags:
- rule-induction
- neuro-symbolic
- logic-programming
- ilp
- interpretability
- zero-shot
- pytorch
- model_hub_mixin
- pytorch_model_hub_mixin
arxiv: 2605.04916
papers:
- 2605.04916
---
# Neural Rule Inducer (NRI)
**Zero-shot induction of interpretable DNF rules from Boolean examples.**
NRI is a pretrained neural model that, given a small set of labelled Boolean examples for a *new* binary classification task, produces an interpretable disjunctive normal form (DNF) rule that explains the labels — *without any task-specific fine-tuning*. Instead of encoding literal identities, it represents literals using domain-agnostic statistical properties (class-conditional rates, entropy, co-occurrence), which generalize across variable identities and counts.
- 📄 Paper: [A Foundation Model for Zero-Shot Logical Rule Induction (IJCAI 2026)](https://arxiv.org/abs/2605.04916)
- 💻 Code: [github.com/phuayj/neural-rule-inducer](https://github.com/phuayj/neural-rule-inducer)
- 🏷️ License: MIT
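The domain-agnostic statistics mentioned above can be illustrated with a minimal NumPy sketch. This is not the model's actual encoder; the specific features below (class-conditional rates and marginal entropy) are illustrative examples of identity-free literal descriptors:

```python
import numpy as np

def literal_features(X, y):
    """Illustrative domain-agnostic statistics for each Boolean variable.

    X: [M, N] array in {0, 1}; y: [M] array in {0, 1}.
    Returns an [N, 3] array of per-variable statistics:
    P(x=1 | y=1), P(x=1 | y=0), and the marginal entropy of x.
    Such features do not depend on which variable is which, so a
    model trained on them can transfer across tasks with different
    variable identities and counts.
    """
    pos_rate = X[y == 1].mean(axis=0)   # class-conditional rate given y=1
    neg_rate = X[y == 0].mean(axis=0)   # class-conditional rate given y=0
    p = X.mean(axis=0)                  # marginal rate of x=1
    eps = 1e-12                         # avoid log(0)
    entropy = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    return np.stack([pos_rate, neg_rate, entropy], axis=1)

# Toy task: x0 perfectly predicts y, x1 is noise.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
feats = literal_features(X, y)
```

Here `feats[0]` shows x0 is fully class-separating (rate 1.0 under y=1, 0.0 under y=0), exactly the kind of signal a rule inducer can exploit without knowing the variable's name.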
## Model details
| Field | Value |
| --- | --- |
| Architecture | Statistical literal encoder + parallel slot-based set decoder + t-norm/t-conorm aggregator |
| Parameters | ≈8.92 M |
| Output | Interpretable DNF rule (T_max=8 clauses × K_max=4 literals each) |
| Training data | Synthetic Boolean DNF episodes (no real-world labels) |
| Training compute | 500 steps, batch size 8192, 1 × NVIDIA RTX 6000 Pro (96 GB), ≈2.5 minutes |
| Seed | 42 |
The model is **pretrained**, not fine-tuned. It performs rule induction zero-shot at inference time on previously unseen tasks.
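For reference, a DNF rule is a disjunction (OR) of conjunctive clauses (ANDs of literals); the table above caps the output at 8 clauses of up to 4 literals each. A minimal evaluator sketch, where the `(var_index, polarity)` clause representation is illustrative rather than the model's actual output format:

```python
import numpy as np

def eval_dnf(X, clauses):
    """Evaluate a DNF rule on Boolean rows of X.

    Each clause is a list of (var_index, polarity) literals; polarity
    True requires the variable to be 1, False requires it to be 0.
    A row is classified positive if at least one clause is fully satisfied.
    """
    out = np.zeros(X.shape[0], dtype=bool)
    for clause in clauses:
        sat = np.ones(X.shape[0], dtype=bool)
        for var, polarity in clause:
            sat &= (X[:, var] == 1) if polarity else (X[:, var] == 0)
        out |= sat  # disjunction over clauses
    return out

# Rule: (x0 AND NOT x1) OR x2
rule = [[(0, True), (1, False)], [(2, True)]]
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
pred = eval_dnf(X, rule)  # [True, False, True]
```

The hard AND/OR here corresponds to the limiting case of the t-norm/t-conorm aggregator listed in the architecture row.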
## Quickstart
```bash
pip install torch huggingface_hub
# Then install the rule_inducer package from the GitHub repo:
git clone https://github.com/phuayj/neural-rule-inducer.git
cd neural-rule-inducer
pip install -e .
```
```python
import torch
from rule_inducer import RuleInducer
model = RuleInducer.from_pretrained("phuayj/neural-rule-inducer")
model.eval()
# See evaluate_uci.py for the full episode-construction and inference loop.
```
For evaluating on tabular datasets in the UCI format (`X_bool.npy` of shape `[M, N]` in `{0, 1, NaN}` and `y.npy` of shape `[M]`):
```bash
# Download the legacy .pt checkpoint with full training state (optimizer, scheduler, RNG) from this repo:
python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('phuayj/neural-rule-inducer', 'checkpoint_best.pt'))"
# Use the original evaluation script:
python evaluate_uci.py --checkpoint <downloaded_path> --data-dir data/uci --all
```
## Reported performance
NRI is evaluated zero-shot on 14 UCI tabular benchmarks. **Direct comparison between this checkpoint and the paper's Table 1 number is not apples-to-apples**, because the two use different evaluation protocols:
| Setting | Eval protocol | Seeds | Mean acc. |
| --- | --- | --- | --- |
| **This checkpoint (release reference)** | 5-fold CV; **1 fold (≈20%) used as support, 4 folds (≈80%) as query**; no subsampling | 1 (seed 42) | **75.60 %** |
| **Paper Table 1** | 5-fold CV; **train portion subsampled to 5% before induction** (≈4% of total as support, 20% as query) | 10 | 69.7 % ± 12.0 |
The released checkpoint has **roughly 5× more support data per fold** than the paper's protocol, which is the dominant reason its UCI accuracy is higher (+5.9 pp) than the paper's 69.7 %. The paper's protocol deliberately targets a low-data regime where zero-shot transfer is most valuable.
To reproduce the paper's protocol exactly, you need the (private) full evaluation harness with `train_percentage=5.0` subsampling; the public `evaluate_uci.py` shipped with the GitHub repo uses the simpler 1-fold-as-support setup shown above. We plan to add `--train-percentage` support to the public script in a future release.
Per-dataset accuracies for this checkpoint under the public protocol:
| Dataset | This checkpoint (1 seed, 20% support) | Paper Table 1 (10 seeds, 5%-subsampled support) |
| --- | --- | --- |
| adult | 65.01 | 69.6 ± 4.4 |
| breast-cancer-wisconsin | 91.42 | 88.3 ± 0.3 |
| car | 71.07 | 51.2 ± 5.4 |
| credit-approval | 85.11 | 71.5 ± 7.3 |
| diabetes | 69.50 | 68.0 ± 2.4 |
| german-credit | 69.58 | 59.8 ± 3.8 |
| hepatitis | 66.78 | 55.9 ± 3.7 |
| ionosphere | 74.93 | 62.8 ± 3.9 |
| kr-vs-kp | 67.77 | 72.3 ± 5.3 |
| mushroom | 89.40 | 87.8 ± 3.9 |
| nursery | 74.56 | 71.3 ± 4.3 |
| spambase | 70.71 | 71.9 ± 0.7 |
| tic-tac-toe | 69.94 | 56.6 ± 2.5 |
| vote | 92.59 | 88.3 ± 1.8 |
| **Mean** | **75.60** | **69.7 ± 12.0** |
(Per-dataset paper numbers are mean ± std across 10 seeds; the ±12.0 on the bottom row is std *across the 14 datasets*, not across seeds.)
## Files in this repo
| File | Purpose |
| --- | --- |
| `pytorch_model.bin` | Inference weights (loaded by `RuleInducer.from_pretrained`) |
| `config.json` | Model architecture config |
| `checkpoint_best.pt` | Full training-state checkpoint (optimizer, scheduler, RNG, metrics) for resumption or audit |
## Limitations
- Inputs are assumed to be **Boolean** (`{0, 1}` plus optional `NaN` for missing values). Continuous features must be binarized first; the paper uses median thresholding.
- Designed for **binary classification**. Multi-class is handled by one-vs-rest at evaluation time, not at the rule level.
- The model induces **DNF** rules; tasks that require non-DNF representations (e.g. recursive predicates, arithmetic) are out of scope.
- Performance on a single dataset is sensitive to the support/query split and the binarization choice.
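Since continuous features must be binarized first, here is a sketch of the per-column median thresholding the paper uses. The tie-handling (values equal to the median map to 0) and missing-value propagation are assumptions of this sketch, not guaranteed to match the paper's preprocessing:

```python
import numpy as np

def median_binarize(X_cont):
    """Binarize continuous features by per-column median thresholding:
    1 if value > column median, else 0. NaNs are ignored when computing
    the median and preserved in the output as missing-value markers.
    """
    med = np.nanmedian(X_cont, axis=0)
    X_bool = (X_cont > med).astype(float)
    X_bool[np.isnan(X_cont)] = np.nan   # keep missing values missing
    return X_bool

X_cont = np.array([[0.1, 5.0],
                   [0.9, 1.0],
                   [0.5, np.nan]])
Xb = median_binarize(X_cont)
```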
## Citation
The paper has been accepted at IJCAI 2026; the proceedings reference is not yet finalized. Please use the arXiv entry for now and update once the IJCAI BibTeX is published.
```bibtex
@misc{Phua2026NRI,
title = {A Foundation Model for Zero-Shot Logical Rule Induction},
author = {Phua, Yin Jun},
year = {2026},
eprint = {2605.04916},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
note = {To appear at IJCAI 2026; full proceedings citation TBD}
}
```