# VynFi JE Fraud GNN: GraphSAGE edge classifier + GAE node anomaly scorer

Trained on the v5.9.0 Method-A accounting network from `VynFi/vynfi-journal-entries-1m`.
Two complementary models in one bundle:
| Task | Model | Test AUC-ROC | Test AUC-PR | Notes |
|---|---|---|---|---|
| Edge fraud classification (supervised) | GraphSAGE + edge head | 0.9136 | 0.7949 | Beats the LR baseline by +0.13 AUC pts (LR is already strong because weekend and round-dollar features are highly discriminative). |
| Edge anomaly scoring (unsupervised) | Attribute-reconstruction GAE | 0.6540 | 0.1434 | Pure unsupervised: no `is_fraud`/`is_anomaly` labels seen at train time. Surfaces edges whose attributes are unusual given their structural neighborhood. |
## Per-business-process breakdown (edge fraud classifier, test split)
| Process | AUC-ROC | AUC-PR | F1 | n |
|---|---|---|---|---|
| P2P | 0.9289 | 0.8146 | 0.8041 | 2,835 |
| O2C | 0.8965 | 0.7660 | 0.7423 | 3,155 |
| R2R | 0.9301 | 0.8113 | 0.8000 | 1,895 |
| H2R | 0.8859 | 0.7517 | 0.7523 | 914 |
| A2R | 0.9512 | 0.9273 | 0.9565 | 450 |
## Architecture

Fraud classifier (`EdgeFraudGNN`):

- 2-layer GraphSAGE encoder (mean aggregator) → 64-dim node embeddings.
- Edge head: MLP on `concat(emb_src, emb_dst, edge_attr)` → sigmoid.
- BCE loss with positive-class weight ≈ 16.3 (5.79 % fraud rate).
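The encoder-plus-edge-head design can be sketched in plain NumPy. This is a minimal illustration of the mean aggregator and the concat-based edge head with random stand-in weights, not the shipped `EdgeFraudGNN` implementation (the real edge head is a trained MLP; here a single linear layer stands in). Note the positive-class weight follows directly from the fraud rate: (1 − 0.0579) / 0.0579 ≈ 16.3.

```python
import numpy as np

rng = np.random.default_rng(0)

def sage_mean_layer(h, edges, w_self, w_neigh):
    """One GraphSAGE layer with the mean aggregator:
    h_v' = ReLU(h_v @ W_self + mean(in-neighbour features) @ W_neigh)."""
    agg = np.zeros_like(h)
    deg = np.zeros(h.shape[0])
    for s, d in edges:                 # aggregate source features at each target
        agg[d] += h[s]
        deg[d] += 1
    agg /= np.maximum(deg, 1)[:, None]
    return np.maximum(h @ w_self + agg @ w_neigh, 0.0)   # ReLU

# Toy graph: 4 accounts, 3 journal-entry edges, the card's dims (17 node / 22 edge)
h = rng.normal(size=(4, 17))
edges = np.array([[0, 1], [2, 1], [1, 3]])
z = sage_mean_layer(h, edges,
                    0.1 * rng.normal(size=(17, 64)),
                    0.1 * rng.normal(size=(17, 64)))     # 64-dim embeddings

# Edge head: sigmoid over concat(emb_src, emb_dst, edge_attr); linear stand-in here
edge_attr = rng.normal(size=(3, 22))
x = np.concatenate([z[edges[:, 0]], z[edges[:, 1]], edge_attr], axis=1)  # (3, 150)
probs = 1.0 / (1.0 + np.exp(-(x @ (0.01 * rng.normal(size=(150, 1))))))
print(z.shape, probs.shape)   # (4, 64) (3, 1)
```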
Anomaly scorer (`AttrGAE`):

- Same 2-layer GraphSAGE encoder.
- MLP decoder predicts `edge_attr` from `concat(z_src, z_dst)`.
- MSE loss; per-edge reconstruction error ranks anomalous edges (high error = unusual attributes given structural context).
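The reconstruction-based scoring can be sketched the same way. Decoder weights here are random stand-ins (the real `AttrGAE` decoder is a trained MLP); the point is only the shape of the computation: decode from the endpoint embeddings, then rank edges by per-edge MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(4, 64))                 # node embeddings from the shared encoder
edges = np.array([[0, 1], [2, 1], [1, 3]])
edge_attr = rng.normal(size=(3, 22))

# Decoder predicts edge_attr from concat(z_src, z_dst); one linear map as a stand-in
w_dec = 0.05 * rng.normal(size=(128, 22))
pred = np.concatenate([z[edges[:, 0]], z[edges[:, 1]]], axis=1) @ w_dec

per_edge_mse = ((pred - edge_attr) ** 2).mean(axis=1)   # high error = anomalous
ranking = np.argsort(-per_edge_mse)                      # most anomalous edges first
print(per_edge_mse.shape, ranking[0])
```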
Both models share the same node feature space (17 dims): account-type one-hot · structural flags · hierarchy level · log-aggregated in/out flows.
Edge features (22 dims): log-amount · is-round-dollar · per-level round flags · confidence · business-process one-hot · day-of-year sin/cos · week-of-year sin/cos · day-of-week sin/cos · is-weekend.
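The cyclical date encodings in that edge-feature list are the standard sin/cos trick; a sketch follows, where the exact periods and column order used by the actual pipeline are assumptions on my part.

```python
import numpy as np
import pandas as pd

def date_features(dates):
    """sin/cos encodings for day-of-year, week-of-year, day-of-week, plus is-weekend."""
    d = pd.to_datetime(dates)
    doy = d.dayofyear.to_numpy().astype(float)
    woy = d.isocalendar().week.to_numpy().astype(float)
    dow = d.dayofweek.to_numpy().astype(float)   # Monday = 0
    two_pi = 2 * np.pi
    return np.column_stack([
        np.sin(two_pi * doy / 365.25), np.cos(two_pi * doy / 365.25),
        np.sin(two_pi * woy / 52.0),   np.cos(two_pi * woy / 52.0),
        np.sin(two_pi * dow / 7.0),    np.cos(two_pi * dow / 7.0),
        (dow >= 5).astype(float),      # Saturday = 5, Sunday = 6
    ])

feats = date_features(["2024-03-15", "2024-08-10"])   # a Friday and a Saturday
print(feats.shape, feats[:, -1])                      # (2, 7) [0. 1.]
```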
## Quick start

```python
from huggingface_hub import snapshot_download
from scripts.ml.inference import load_bundle

local_dir = snapshot_download(repo_id="VynFi/je-fraud-gnn")
bundle = load_bundle(local_dir)

# Predict fraud probability for one or more edges
probs = bundle.predict_fraud(
    from_account=["1000", "5000"],
    to_account=["2000", "4000"],
    amount=[7432.89, 25000.00],  # second is a "round" amount
    business_process=["P2P", "O2C"],
    posting_date=["2024-03-15", "2024-08-10"],
)
print(probs)  # array([0.13, 0.99]); round amount → strong fraud signal

# Per-edge anomaly score (high MSE = unusual attribute combination)
mse = bundle.anomaly_score_edges(
    from_account=["1000", "5000"],
    to_account=["2000", "4000"],
    amount=[7432.89, 25000.00],
    business_process=["P2P", "O2C"],
    posting_date=["2024-03-15", "2024-08-10"],
)
print(mse)
```
The `scripts/ml/inference.py` module is shipped in the engine repo.
## Training data

Sourced from `VynFi/vynfi-journal-entries-1m` v5.9.0:

- 499 GL accounts (after dedupe of 4 conflicting `account_number` rows in the COA)
- 61,656 Method-A edges (one edge per 2-line journal entry)
- 5.79 % fraud rate (3,571 / 61,656)
- 6.52 % anomaly rate
- Stratified 70/15/15 train/val/test split on `is_fraud` (seed = 20260509)
- Generated under the v5.9.0 release tag (ChaCha8 PRNG, platform-stable)
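The split above can be reproduced in spirit with scikit-learn. This is a sketch with stand-in labels; the real split lives in `build_je_pyg_dataset`, and the exact call pattern below is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(20260509)
y = (rng.random(61_656) < 0.0579).astype(int)    # stand-in is_fraud labels
idx = np.arange(len(y))

# 70/15/15, stratified on is_fraud so each split keeps the ~5.79 % fraud rate
train, rest = train_test_split(idx, test_size=0.30, stratify=y,
                               random_state=20260509)
val, test = train_test_split(rest, test_size=0.50, stratify=y[rest],
                             random_state=20260509)
print(len(train), len(val), len(test))
```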
## Why does the GraphSAGE encoder add only marginal lift over LR?

Honest answer: the synthetic fraud bias in DataSynth v5.x writes strong, local signals into edge attributes:

- `fraud_bias.weekend_bias = 0.30` → 41 % of fraud edges land on Sat/Sun vs 0.5 % of non-fraud (77× lift)
- `fraud_bias.round_dollar_bias = 0.40` → 55 % of fraud edges hit a $1K/$5K/$10K/$25K/$50K/$100K canonical level vs 0.14 % (378× lift)

A LogisticRegression with day-of-week and round-dollar features already reaches AUC 0.912, so there is little headroom left for the graph encoder on the supervised task. The GraphSAGE encoder adds +0.13 AUC pts and +0.84 AUC-PR pts; the per-process breakdown is where it shines (A2R reaches 0.95 AUC).
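To see why those two signals carry so much weight, here is a toy LR baseline on synthetic stand-in features drawn with the lift rates quoted above. This is illustrative only, not the card's actual baseline data or code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000
is_fraud = rng.random(n) < 0.0579
# Stand-ins for the two dominant signals, using the rates quoted above
is_weekend = np.where(is_fraud, rng.random(n) < 0.41, rng.random(n) < 0.005)
is_round   = np.where(is_fraud, rng.random(n) < 0.55, rng.random(n) < 0.0014)
X = np.column_stack([is_weekend, is_round]).astype(float)

clf = LogisticRegression(class_weight="balanced").fit(X, is_fraud)
auc = roc_auc_score(is_fraud, clf.predict_proba(X)[:, 1])
print(round(auc, 3))   # two binary features alone already yield a high AUC
```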
Where the graph contribution does show up:

- Unsupervised anomaly detection. The attribute-reconstruction GAE reaches AUC-ROC 0.654 on edge-level anomaly with no labels at train time; the structural prior is doing the work.
- Top-K anomalous accounts. The GAE's per-node aggregated MSE (mean across incident test edges) ranks accounts by structural weirdness; precision@10 = 0.60 against the median anomaly-fraction threshold.
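The per-node aggregation in the second bullet is just a mean of incident-edge errors. A sketch with stand-in scores (the graph, scores, and ground-truth labels below are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes = 50
edges = rng.integers(0, n_nodes, size=(500, 2))       # (src, dst) account pairs
per_edge_mse = rng.gamma(2.0, 1.0, size=500)          # stand-in reconstruction errors

# Mean reconstruction error over each account's incident edges
node_sum = np.zeros(n_nodes)
node_deg = np.zeros(n_nodes)
for (s, d), e in zip(edges, per_edge_mse):
    for v in (s, d):
        node_sum[v] += e
        node_deg[v] += 1
node_mse = node_sum / np.maximum(node_deg, 1)

top10 = np.argsort(-node_mse)[:10]                    # top-K "weird" accounts
is_anomalous = rng.random(n_nodes) < 0.2              # stand-in ground truth
prec_at_10 = is_anomalous[top10].mean()
print(round(prec_at_10, 2))                           # precision@10 on toy data
```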
For deployment scenarios where you have crisp labels and fraud patterns are local to single transactions, an LR baseline may be competitive. For label-free settings or graph-contextual fraud (multi-hop laundering, ring transactions), the GNN signal is the differentiator.
## Limitations

- Trained on a single 1M-JE generator run. Generalisation to other v5.9.0 datasets has not been evaluated.
- `is_fraud` labels come from DataSynth's fraud-bias mechanism; they reflect known bias signatures (weekend / round-dollar / off-hours / post-close), not the full universe of real-world fraud patterns.
- Account vocabulary is fixed at the 499 nodes in the published COA. Inference on unseen `account_number` values raises `ValueError`.
- Per-node anomaly AUC is close to random (0.48); the per-edge signal is the load-bearing one. For ranking accounts, use precision@K instead of AUC.
## Reproducibility

```bash
git clone https://github.com/mivertowski/SyntheticData.git
cd SyntheticData
pip install -r requirements-ml.txt
python -m scripts.ml.build_je_pyg_dataset --output data/ml/je_pyg_v1.pt --seed 20260509
python -m scripts.ml.train_je_fraud_gnn --epochs 60
python -m scripts.ml.train_je_anomaly_gae --epochs 80
python -m scripts.ml.package_for_hf
```
## Citation

```bibtex
@misc{ivertowski2026datasynth,
  author = {Ivertowski, Michael},
  title = {{DataSynth}: Reference Knowledge Graphs for Enterprise
           Audit Analytics through Synthetic Data Generation
           with Provable Statistical Properties},
  year = {2026},
  month = {April},
  howpublished = {SSRN Working Paper},
  url = {https://ssrn.com/abstract=6538639}
}
```
## License

Apache-2.0.
## Related

- `VynFi/vynfi-journal-entries-1m`: training dataset
- `VynFi/accounting-network-explorer`: interactive class-level network viewer
- `VynFi/fraud-gnn-demo`: Gradio inference Space (companion)
- Engine repo · SSRN paper