VynFi JE Fraud GNN β€” GraphSAGE edge classifier + GAE node anomaly scorer

Trained on the v5.9.0 Method-A accounting network from VynFi/vynfi-journal-entries-1m. Two complementary models in one bundle:

| Task | Model | Test AUC-ROC | Test AUC-PR | Notes |
|---|---|---|---|---|
| Edge fraud classification (supervised) | GraphSAGE → edge head | 0.9136 | 0.7949 | Beats the LR baseline by +0.13 AUC pts (LR is already strong because weekend and round-dollar features are highly discriminative). |
| Edge anomaly scoring (unsupervised) | Attribute-reconstruction GAE | 0.6540 | 0.1434 | Purely unsupervised: no is_fraud/is_anomaly labels seen at train time. Surfaces edges whose attributes are unusual given their structural neighborhood. |

Per-business-process breakdown (edge fraud classifier, test split)

| Process | AUC-ROC | AUC-PR | F1 | n |
|---|---|---|---|---|
| P2P | 0.9289 | 0.8146 | 0.8041 | 2,835 |
| O2C | 0.8965 | 0.7660 | 0.7423 | 3,155 |
| R2R | 0.9301 | 0.8113 | 0.8000 | 1,895 |
| H2R | 0.8859 | 0.7517 | 0.7523 | 914 |
| A2R | 0.9512 | 0.9273 | 0.9565 | 450 |

Architecture

Fraud classifier β€” EdgeFraudGNN:

  • 2-layer GraphSAGE encoder (mean aggregator) β†’ 64-dim node embeddings.
  • Edge head: MLP on concat(emb_src, emb_dst, edge_attr) β†’ sigmoid.
  • BCE loss with positive-class weight β‰ˆ 16.3 (5.79 % fraud rate).
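The positive-class weight follows directly from the class balance: negatives over positives at a 5.79 % fraud rate. A minimal numpy sketch of that computation together with a class-weighted BCE, purely illustrative and not the training code itself:

```python
import numpy as np

# Positive-class weight from the 5.79 % fraud rate: negatives / positives.
fraud_rate = 0.0579
pos_weight = (1 - fraud_rate) / fraud_rate  # ~16.3

def weighted_bce(p, y, w_pos):
    """Binary cross-entropy with an extra weight on the positive class."""
    eps = 1e-12
    return -np.mean(w_pos * y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy predictions/labels, just to exercise the loss.
p = np.array([0.9, 0.2, 0.05])
y = np.array([1.0, 0.0, 0.0])
loss = weighted_bce(p, y, pos_weight)
```

Up-weighting positives by the negative-to-positive ratio is the standard way to keep a ~6 % minority class from being ignored by the loss.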

Anomaly scorer β€” AttrGAE:

  • Same 2-layer GraphSAGE encoder.
  • MLP decoder predicts edge_attr from concat(z_src, z_dst).
  • MSE loss; per-edge reconstruction error ranks anomalous edges (high error = unusual attributes given structural context).
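The ranking step can be sketched in a few lines of numpy; the arrays below are toy stand-ins for the decoder's output, not the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: true 22-dim edge attributes and their reconstructions.
edge_attr = rng.normal(size=(5, 22))
recon = edge_attr + rng.normal(scale=0.1, size=(5, 22))
recon[3] += 2.0  # make edge 3 poorly reconstructed on purpose

# Per-edge MSE; high error = unusual attributes given structural context.
mse = ((recon - edge_attr) ** 2).mean(axis=1)
ranking = np.argsort(-mse)  # most anomalous edges first
```

The deliberately corrupted edge 3 tops the ranking, which is exactly the behavior the GAE relies on for anomaly scoring.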

Both models share the same node feature space (17 dims): account-type one-hot Β· structural flags Β· hierarchy level Β· log-aggregated in/out flows.

Edge features (22 dims): log-amount Β· is-round-dollar Β· per-level round flags Β· confidence Β· business-process one-hot Β· day-of-year sin/cos Β· week-of-year sin/cos Β· day-of-week sin/cos Β· is-weekend.
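The cyclical date features can be derived as in this minimal sketch; the exact periods used here (366 for day-of-year, 52 for week-of-year) are assumptions, not confirmed from the training code:

```python
import numpy as np
from datetime import date

def cyclical(value, period):
    """Encode a cyclic quantity as (sin, cos) so period boundaries are adjacent."""
    angle = 2 * np.pi * value / period
    return np.sin(angle), np.cos(angle)

d = date(2024, 3, 15)
doy_sin, doy_cos = cyclical(d.timetuple().tm_yday, 366)  # day-of-year
woy_sin, woy_cos = cyclical(d.isocalendar()[1], 52)      # week-of-year
dow_sin, dow_cos = cyclical(d.weekday(), 7)              # day-of-week
is_weekend = float(d.weekday() >= 5)                     # Sat/Sun flag
```

The sin/cos pair keeps Dec 31 and Jan 1 close in feature space, which a raw day-of-year integer would not.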

Quick start

```python
from huggingface_hub import snapshot_download
from scripts.ml.inference import load_bundle

local_dir = snapshot_download(repo_id="VynFi/je-fraud-gnn")
bundle = load_bundle(local_dir)

# Predict fraud probability for one or more edges
probs = bundle.predict_fraud(
    from_account=["1000", "5000"],
    to_account=["2000", "4000"],
    amount=[7432.89, 25000.00],            # second is a "round" amount
    business_process=["P2P", "O2C"],
    posting_date=["2024-03-15", "2024-08-10"],
)
print(probs)  # array([0.13, 0.99]) — round amount → strong fraud signal

# Per-edge anomaly score (high MSE = unusual attribute combination)
mse = bundle.anomaly_score_edges(
    from_account=["1000", "5000"],
    to_account=["2000", "4000"],
    amount=[7432.89, 25000.00],
    business_process=["P2P", "O2C"],
    posting_date=["2024-03-15", "2024-08-10"],
)
print(mse)
```

The scripts/ml/inference.py module is shipped in the engine repo.

Training data

Sourced from VynFi/vynfi-journal-entries-1m v5.9.0:

  • 499 GL accounts (after dedupe of 4 conflicting account_number rows in the COA)
  • 61,656 Method-A edges (one edge per 2-line journal entry)
  • 5.79 % fraud rate (3,571 / 61,656)
  • 6.52 % anomaly rate
  • Stratified 70/15/15 train/val/test split on is_fraud (seed = 20260509)
  • Generated under the v5.9.0 release tag (ChaCha8 PRNG, platform-stable)
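A stratified 70/15/15 split like the one described can be sketched in plain numpy; `stratified_split` is a hypothetical helper for illustration, not the project's actual splitting code:

```python
import numpy as np

def stratified_split(labels, fracs=(0.70, 0.15, 0.15), seed=20260509):
    """Shuffle indices within each class, then slice into train/val/test."""
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(fracs[0] * len(idx))
        n_val = int(fracs[1] * len(idx))
        splits[0].append(idx[:n_train])
        splits[1].append(idx[n_train:n_train + n_val])
        splits[2].append(idx[n_train + n_val:])
    return tuple(np.concatenate(s) for s in splits)

# Toy labels mimicking a ~5.8 % fraud rate.
is_fraud = np.zeros(1000, dtype=int)
is_fraud[:58] = 1
train, val, test = stratified_split(is_fraud)
```

Stratifying on is_fraud keeps the minority-class rate nearly identical across the three splits, which matters at a ~6 % base rate.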

Why does the GraphSAGE encoder add only marginal lift over LR?

Honest answer: the synthetic fraud bias in DataSynth v5.x writes strong, local signals into edge attributes:

  • fraud_bias.weekend_bias=0.30 → 41 % of fraud edges land on Sat/Sun vs 0.5 % of non-fraud (77× lift)
  • fraud_bias.round_dollar_bias=0.40 → 55 % of fraud edges hit a $1K/$5K/$10K/$25K/$50K/$100K canonical level vs 0.14 % (378× lift)

A LogisticRegression baseline with day-of-week and round-dollar features already reaches AUC 0.912, leaving little room for the graph encoder to add value on the supervised task. The GraphSAGE encoder adds +0.13 AUC pts and +0.84 AUC-PR pts; the per-process breakdown is where it shines (A2R reaches 0.95 AUC).

Where the graph contribution does show up:

  • Unsupervised anomaly detection. The attribute-reconstruction GAE reaches AUC-ROC 0.654 on edge-level anomaly with no labels at train time β€” the structural prior is doing the work.
  • Top-K anomalous accounts. The GAE's per-node aggregated MSE (mean across incident test edges) ranks accounts by structural weirdness; precision@10 = 0.60 against the median anomaly-fraction threshold.
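precision@K itself is simple to compute. In this toy sketch, 6 of 20 synthetic accounts are labelled anomalous and all of them fall in the top 10 by score, giving 0.6; the scores and labels here are fabricated for illustration only:

```python
import numpy as np

def precision_at_k(scores, is_anomalous, k=10):
    """Fraction of the top-k scored accounts that are truly anomalous."""
    top_k = np.argsort(-scores)[:k]
    return is_anomalous[top_k].mean()

# 20 toy accounts with evenly spaced scores; the 6 highest are anomalous.
scores = np.linspace(0.0, 0.95, 20)
labels = (scores >= 0.7).astype(float)
p_at_10 = precision_at_k(scores, labels, k=10)  # 6 of the top 10 -> 0.6
```

precision@K is the right metric here precisely because the per-node AUC is weak: it only asks whether the top of the ranking is useful, not whether the whole ordering is.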

For deployment scenarios where you have crisp labels and fraud patterns are local to single transactions, an LR baseline may be competitive. For label-free settings or graph-context fraud (multi-hop laundering, ring transactions), the GNN signal is the differentiator.

Limitations

  • Trained on a single 1M-JE generator run. Generalisation to other v5.9.0 datasets has not been evaluated.
  • is_fraud labels come from DataSynth's fraud-bias mechanism β€” they reflect known bias signatures (weekend / round-dollar / off-hours / post-close), not the full universe of real-world fraud patterns.
  • Account vocabulary is fixed at the 499 nodes in the published COA. Inference on unseen account_number values raises ValueError.
  • Per-node anomaly AUC is close to random (0.48) β€” the per-edge signal is the load-bearing one. For ranking accounts, use precision@K instead of AUC.

Reproducibility

```shell
git clone https://github.com/mivertowski/SyntheticData.git
cd SyntheticData
pip install -r requirements-ml.txt
python -m scripts.ml.build_je_pyg_dataset --output data/ml/je_pyg_v1.pt --seed 20260509
python -m scripts.ml.train_je_fraud_gnn --epochs 60
python -m scripts.ml.train_je_anomaly_gae --epochs 80
python -m scripts.ml.package_for_hf
```

Citation

```bibtex
@misc{ivertowski2026datasynth,
  author       = {Ivertowski, Michael},
  title        = {{DataSynth}: Reference Knowledge Graphs for Enterprise
                  Audit Analytics through Synthetic Data Generation
                  with Provable Statistical Properties},
  year         = {2026},
  month        = {April},
  howpublished = {SSRN Working Paper},
  url          = {https://ssrn.com/abstract=6538639}
}
```

License

Apache-2.0.
