---
library_name: rule-inducer
license: mit
pipeline_tag: tabular-classification
tags:
- rule-induction
- neuro-symbolic
- logic-programming
- ilp
- interpretability
- zero-shot
- pytorch
- model_hub_mixin
- pytorch_model_hub_mixin
arxiv: 2605.04916
papers:
- 2605.04916
---

# Neural Rule Inducer (NRI)

**Zero-shot induction of interpretable DNF rules from Boolean examples.**

NRI is a pretrained neural model that, given a small set of labelled Boolean examples for a *new* binary classification task, produces an interpretable disjunctive normal form (DNF) rule that explains the labels — *without any task-specific fine-tuning*. Instead of encoding literal identities, it represents literals using domain-agnostic statistical properties (class-conditional rates, entropy, co-occurrence), which generalize across variable identities and counts.
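For intuition, statistics of the kind named above can be computed for a single literal as follows. This is an illustrative sketch of the feature family (class-conditional rates and marginal entropy), not the model's exact encoder input:

```python
import numpy as np

def literal_stats(x, y):
    """Sketch: identity-free statistics for one Boolean literal.

    x: {0, 1} values of one variable across M examples
    y: {0, 1} labels for the same M examples
    """
    rate_pos = x[y == 1].mean()  # class-conditional rate P(x=1 | y=1)
    rate_neg = x[y == 0].mean()  # class-conditional rate P(x=1 | y=0)
    p = x.mean()                 # marginal rate P(x=1)
    # Binary entropy of the literal's marginal distribution
    entropy = 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return rate_pos, rate_neg, entropy

x = np.array([1, 1, 0, 0, 1, 0])
y = np.array([1, 1, 1, 0, 0, 0])
rate_pos, rate_neg, entropy = literal_stats(x, y)
```

Because these features never reference which variable a literal is, the same encoder applies to tasks with any number of variables.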

- 📄 Paper: [A Foundation Model for Zero-Shot Logical Rule Induction (IJCAI 2026)](https://arxiv.org/abs/2605.04916)
- 💻 Code: [github.com/phuayj/neural-rule-inducer](https://github.com/phuayj/neural-rule-inducer)
- 🏷️ License: MIT

## Model details

| Field | Value |
| --- | --- |
| Architecture | Statistical literal encoder + parallel slot-based set decoder + t-norm/t-conorm aggregator |
| Parameters | ≈8.92 M |
| Output | Interpretable DNF rule (T_max=8 clauses × K_max=4 literals each) |
| Training data | Synthetic Boolean DNF episodes (no real-world labels) |
| Training compute | 500 steps, batch size 8192, 1 × NVIDIA RTX 6000 Pro (96 GB), ≈2.5 minutes |
| Seed | 42 |

The model is **pretrained**, not fine-tuned. It performs rule induction zero-shot at inference time on previously unseen tasks.
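To make the output format concrete, a DNF rule of this shape can be represented as a list of clauses, each a list of signed literals, and evaluated on Boolean inputs. This is a minimal sketch, not the package's internal representation:

```python
import numpy as np

# A clause is a list of (variable_index, polarity) pairs; a rule is a list
# of clauses (here at most T_max=8 clauses of at most K_max=4 literals).
# Example rule: (x0 AND NOT x2) OR (x1)
rule = [[(0, True), (2, False)], [(1, True)]]

def eval_dnf(rule, X):
    """Evaluate a DNF rule on a Boolean matrix X of shape [M, N]."""
    out = np.zeros(X.shape[0], dtype=bool)
    for clause in rule:
        sat = np.ones(X.shape[0], dtype=bool)
        for var, pol in clause:
            # Conjunction within a clause: every literal must hold
            sat &= (X[:, var] == 1) if pol else (X[:, var] == 0)
        out |= sat  # disjunction across clauses
    return out

X = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]])
pred = eval_dnf(rule, X)  # row 0: first clause fires; row 2: second clause fires
```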

## Quickstart

```bash
pip install torch huggingface_hub
# Then install the rule_inducer package from the GitHub repo:
git clone https://github.com/phuayj/neural-rule-inducer.git
cd neural-rule-inducer
pip install -e .
```

```python
import torch
from rule_inducer import RuleInducer

model = RuleInducer.from_pretrained("phuayj/neural-rule-inducer")
model.eval()
# See evaluate_uci.py for the full episode-construction and inference loop.
```

For evaluating on tabular datasets in the UCI format (`X_bool.npy` of shape `[M, N]` in `{0, 1, NaN}` and `y.npy` of shape `[M]`):

```bash
# Download the legacy .pt checkpoint with full training state (optimizer, scheduler, RNG) from this repo:
python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('phuayj/neural-rule-inducer', 'checkpoint_best.pt'))"

# Use the original evaluation script:
python evaluate_uci.py --checkpoint <downloaded_path> --data-dir data/uci --all
```
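If you need to produce these arrays from a raw tabular dataset, a minimal sketch using median thresholding for continuous columns (the binarization the paper uses; see Limitations) while keeping missing values as `NaN` might look like:

```python
import numpy as np

def binarize_median(X_cont):
    """Binarize continuous features to {0, 1, NaN} via per-column median thresholding."""
    med = np.nanmedian(X_cont, axis=0)        # per-column median, ignoring missing values
    with np.errstate(invalid="ignore"):       # NaN comparisons are expected here
        X_bool = (X_cont > med).astype(float)
    X_bool[np.isnan(X_cont)] = np.nan         # preserve missing entries as NaN
    return X_bool

X_cont = np.array([[0.2, 5.0],
                   [0.8, np.nan],
                   [0.5, 3.0]])
X_bool = binarize_median(X_cont)
np.save("X_bool.npy", X_bool)                 # shape [M, N], values in {0, 1, NaN}
np.save("y.npy", np.array([1, 0, 1]))         # shape [M]
```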

## Reported performance

NRI is evaluated zero-shot on 14 UCI tabular benchmarks. **Direct comparison between this checkpoint and the paper's Table 1 number is not apples-to-apples**, because the two use different evaluation protocols:

| Setting | Eval protocol | Seeds | Mean acc. |
| --- | --- | --- | --- |
| **This checkpoint (release reference)** | 5-fold CV; **1 fold (≈20%) used as support, 4 folds (≈80%) as query**; no subsampling | 1 (seed 42) | **75.60 %** |
| **Paper Table 1** | 5-fold CV; **train portion subsampled to 5% before induction** (≈4% of total as support, 20% as query) | 10 | 69.7 % ± 12.0 |

The released checkpoint has **roughly 5× more support data per fold** than the paper's protocol, which is the dominant reason its UCI accuracy is higher (+5.9 pp) than the paper's 69.7 %. The paper's protocol deliberately targets a low-data regime where zero-shot transfer is most valuable.
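The arithmetic behind that factor is straightforward:

```python
# Support fraction of the total dataset under each protocol (5-fold CV)
release_support = 1 / 5            # one fold (~20%) used as support
paper_support = (4 / 5) * 0.05     # 80% train portion subsampled to 5% -> ~4%
ratio = release_support / paper_support  # ~5x more support data per fold
```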

To reproduce the paper's protocol exactly, you need the (private) full evaluation harness with `train_percentage=5.0` subsampling; the public `evaluate_uci.py` shipped with the GitHub repo uses the simpler 1-fold-as-support setup shown above. We plan to add `--train-percentage` support to the public script in a future release.

Per-dataset accuracies for this checkpoint under the public protocol:

| Dataset | This checkpoint (1 seed, 20% support) | Paper Table 1 (10 seeds, 5%-subsampled support) |
| --- | --- | --- |
| adult | 65.01 | 69.6 ± 4.4 |
| breast-cancer-wisconsin | 91.42 | 88.3 ± 0.3 |
| car | 71.07 | 51.2 ± 5.4 |
| credit-approval | 85.11 | 71.5 ± 7.3 |
| diabetes | 69.50 | 68.0 ± 2.4 |
| german-credit | 69.58 | 59.8 ± 3.8 |
| hepatitis | 66.78 | 55.9 ± 3.7 |
| ionosphere | 74.93 | 62.8 ± 3.9 |
| kr-vs-kp | 67.77 | 72.3 ± 5.3 |
| mushroom | 89.40 | 87.8 ± 3.9 |
| nursery | 74.56 | 71.3 ± 4.3 |
| spambase | 70.71 | 71.9 ± 0.7 |
| tic-tac-toe | 69.94 | 56.6 ± 2.5 |
| vote | 92.59 | 88.3 ± 1.8 |
| **Mean** | **75.60** | **69.7 ± 12.0** |

(Per-dataset paper numbers are mean ± std across 10 seeds; the ±12.0 on the bottom row is std *across the 14 datasets*, not across seeds.)

## Files in this repo

| File | Purpose |
| --- | --- |
| `pytorch_model.bin` | Inference weights (loaded by `RuleInducer.from_pretrained`) |
| `config.json` | Model architecture config |
| `checkpoint_best.pt` | Full training-state checkpoint (optimizer, scheduler, RNG, metrics) for resumption or audit |

## Limitations

- Inputs are assumed to be **Boolean** (`{0, 1}` plus optional `NaN` for missing values). Continuous features must be binarized first; the paper uses median thresholding.
- Designed for **binary classification**. Multi-class is handled by one-vs-rest at evaluation time, not at the rule level.
- The model induces **DNF** rules; tasks that require non-DNF representations (e.g. recursive predicates, arithmetic) are out of scope.
- Performance on a single dataset is sensitive to the support/query split and the binarization choice.
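The one-vs-rest handling mentioned above can be sketched as follows (illustrative only; the evaluation script's exact implementation may differ):

```python
import numpy as np

def one_vs_rest_labels(y, classes):
    """Turn a multi-class label vector into one binary task per class."""
    return {c: (y == c).astype(int) for c in classes}

def predict_ovr(scores):
    """Pick, per example, the class whose binary rule/model scored highest."""
    classes = list(scores)
    S = np.stack([scores[c] for c in classes])  # shape [C, M]
    return np.array(classes)[S.argmax(axis=0)]

y = np.array([0, 2, 1, 2])
tasks = one_vs_rest_labels(y, classes=[0, 1, 2])  # one binary target per class
```

A rule is induced for each binary task independently; ties between firing rules must be broken by some score, which is why multi-class behaviour is a property of the evaluation harness rather than of the induced DNF rules themselves.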

## Citation

The paper has been accepted at IJCAI 2026; the proceedings reference is not yet finalized. Please use the arXiv entry for now and update once the IJCAI BibTeX is published.

```bibtex
@misc{Phua2026NRI,
  title  = {A Foundation Model for Zero-Shot Logical Rule Induction},
  author = {Phua, Yin Jun},
  year   = {2026},
  eprint = {2605.04916},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  note   = {To appear at IJCAI 2026; full proceedings citation TBD}
}
```