---
license: apache-2.0
tags:
- temporal-graph-learning
- fraud-detection
- synthetic-data
- benchmark
- upi
- causal-evaluation
- matched-controls
- neurips
---

# Temporal Twins: A Matched-Control Benchmark for Temporal Fraud Detection

Synthetic UPI-style temporal transaction benchmark where fraud and benign trajectories are matched on static and prefix-level summaries but differ in delayed event-order structure.

## Links

- Dataset repository: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)
- Code repository: [https://huggingface.co/temporal-twins-benchmark/temporal-twins-code](https://huggingface.co/temporal-twins-benchmark/temporal-twins-code)
- Croissant metadata URL: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json)
- Paper or preprint: Not available during double-blind review; to be added after publication.

## Installation

Recommended Python: `3.11+`

```bash
pip install -r requirements.txt
```

If you prefer Conda:

```bash
conda env create -f environment.yml
conda activate temporal-twins
```

## Repository Structure

- `src/`: synthetic user, transaction, risk, fraud, graph, and temporal benchmark generation code
- `models/`: SeqGRU, static baselines, audit/probe models, and temporal GNN wrappers
- `experiments/`: deterministic benchmark runner and matched-prefix evaluation utilities
- `config/`: base YAML configs used by the experiment runner
- `configs/`: release-facing config snapshots for calibration and paper-suite reproduction
- `docs/`: determinism and supporting documentation
- `metadata/`: MLCommons Croissant metadata and validation notes
- `results/`: lightweight frozen paper-suite summaries and interpretation notes

## Quick Smoke Test

```bash
PYTHONPATH=. python3 experiments/run_all.py \
  --fast \
  --seed 0 \
  --benchmark-mode temporal_twins_oracle_calib \
  --experiments audit \
  --device cpu
```

## Exact Paper-Scale Reproduction

The checked-in CLI exposes `--benchmark-mode`, `--seed`, `--seeds`, `--fast`, `--device`, and `--experiments`, but not separate `--difficulty`, `--num-users`, or `--simulation-days` flags. For the exact grouped paper-scale runs, use the helper below from the repository root.

Define this shell helper once:

```bash
run_group() {
  local group="$1"
  local seed="$2"
  local out_json="$3"

  PYTHONPATH=. python3 - "$group" "$seed" "$out_json" <<'PY'
import json
import math
import sys
import time
from pathlib import Path

from src.core.config_loader import load_config
from experiments.run_all import (
    build_gate_pool_from_frames,
    gate_volume_is_sufficient,
    generate_single_difficulty,
    offset_gate_namespace,
    prepare_gate_subset,
    run_motif_validity_check,
    set_global_determinism,
)


def normalize(value):
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if hasattr(value, "item"):
        try:
            value = value.item()
        except Exception:
            pass
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


group = sys.argv[1]
seed = int(sys.argv[2])
out_json = Path(sys.argv[3])

if group == "oracle_calib":
    benchmark_mode = "temporal_twins_oracle_calib"
    difficulty = "easy"
    hard_abort = True
else:
    benchmark_mode = "temporal_twins"
    difficulty = group
    hard_abort = False

cfg = load_config("config/default.yaml")
cfg = cfg.model_copy(
    update={
        "num_users": 350,
        "simulation_days": 45,
        "benchmark_mode": benchmark_mode,
        "random_seed": seed,
    }
)

set_global_determinism(seed)
pool = generate_single_difficulty(
    cfg,
    difficulty=difficulty,
    seed=seed,
    benchmark_mode=benchmark_mode,
)
gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
pack_count = 1
while (not gate_volume_is_sufficient(gate["volume"], False)) and pack_count <= 6:
    extra_seed = seed + pack_count * 10007
    extra_pack = generate_single_difficulty(
        cfg,
        difficulty=difficulty,
        seed=extra_seed,
        benchmark_mode=benchmark_mode,
    )
    extra_pack = offset_gate_namespace(extra_pack, pack_count)
    pool = build_gate_pool_from_frames([pool, extra_pack])
    gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
    pack_count += 1

gate["source_pool_events"] = int(len(pool))
gate["source_pool_pairs"] = int(pool.loc[pool["twin_pair_id"] >= 0, "twin_pair_id"].nunique()) if "twin_pair_id" in pool.columns else 0
gate["source_pool_packs"] = int(pack_count)

start = time.time()
gate_pass, report = run_motif_validity_check(
    df=pool,
    config=cfg,
    seed=seed,
    device="cpu",
    num_epochs=3,
    node_epochs=150,
    n_checkpoints=8,
    hard_abort=hard_abort,
    benchmark_mode=benchmark_mode,
    fast_mode=False,
    force_temporal_models=True,
    prebuilt_gate=gate,
)
elapsed = time.time() - start

result = {
    "benchmark_group": group,
    "benchmark_mode": benchmark_mode,
    "seed": seed,
    "primary_metric_label": report["audit_metric_label"],
    "secondary_metric_label": report["raw_metric_label"],
    "gate_pass": bool(gate_pass),
    "run_wall_time_sec": float(elapsed),
    **report,
}
out_json.parent.mkdir(parents=True, exist_ok=True)
out_json.write_text(json.dumps(normalize(result), indent=2) + "\n")
print(f"Wrote {out_json}")
PY
}
```

### Reproduce `oracle_calib`

```bash
run_group oracle_calib 0 results/paper_suite_repro/jobs/oracle_calib_0.json
```

### Reproduce `easy`

```bash
run_group easy 0 results/paper_suite_repro/jobs/easy_0.json
```

### Reproduce `medium`

```bash
run_group medium 0 results/paper_suite_repro/jobs/medium_0.json
```

### Reproduce `hard`

```bash
run_group hard 0 results/paper_suite_repro/jobs/hard_0.json
```

## Reproduce the Full Paper Suite

```bash
mkdir -p results/paper_suite_repro/jobs

for group in oracle_calib easy medium hard; do
  for seed in 0 1 2 3 4; do
    run_group "$group" "$seed" "results/paper_suite_repro/jobs/${group}_${seed}.json"
  done
done
```

The frozen
reference outputs for the final deterministic suite are already included in `results/`:

- `paper_suite_summary.csv`
- `paper_suite_summary.md`
- `paper_suite_runtime.csv`
- `paper_suite_meta.json`
- `paper_suite_runs.csv`
- `PAPER_GATE_INTERPRETATION.md`

## Expected Headline Results

| Benchmark | XGBoost ROC-AUC | StaticGNN ROC-AUC | SeqGRU ROC-AUC | SeqGRU Shuffle Delta |
| --- | ---: | ---: | ---: | ---: |
| `oracle_calib` | `0.5000` | `0.5222` | `1.0000` | `-0.5032` |
| `easy` | `0.5000` | `0.4946` | `1.0000` | `-0.5003` |
| `medium` | `0.5000` | `0.4922` | `0.8391` | `-0.3337` |
| `hard` | `0.5000` | `0.5026` | `0.6876` | `-0.1883` |

## Determinism

CPU deterministic runtime is enabled. The same seed should reproduce identical matched-prefix data and metrics. Deterministic torch settings can slow runtime, especially for the non-fast paper-scale suite.

## Data Note

This code repository contains source code, metadata, documentation, and lightweight result summaries only. The generated synthetic dataset and full release artifacts are hosted separately at the dataset repository:

- [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)

## Privacy Note

- Synthetic data only
- No real UPI transactions
- No real users
- No real bank accounts
- No personal financial records

## License

- Code: `Apache-2.0`
- Dataset and generated benchmark artifacts: `CC-BY-4.0`

## Citation

Anonymous NeurIPS 2026 submission; final citation to be added after review.
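
## Checking Determinism (Sketch)

The Determinism section states that the same seed should reproduce identical data and metrics. One way to check this locally is to run `run_group` twice with the same group and seed, write the two JSON files to different paths, and compare them. The sketch below is not part of the repository: `diff_results`, `compare_files`, and `VOLATILE_KEYS` are hypothetical names, and it assumes that `run_wall_time_sec` (written by the helper) is the only field expected to differ between runs.

```python
import json
from pathlib import Path

# Assumption: wall-clock time is the only field that legitimately varies
# between two runs with the same seed. Extend this set if the report
# contains other timing fields.
VOLATILE_KEYS = {"run_wall_time_sec"}


def diff_results(a, b, path=""):
    """Return the key paths at which two parsed result objects differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            if key in VOLATILE_KEYS:
                continue
            child = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                # Key present in only one of the two runs.
                diffs.append(child)
            else:
                diffs.extend(diff_results(a[key], b[key], child))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [path]
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_results(x, y, f"{path}[{i}]"))
        return diffs
    # Leaves are compared exactly, since the claim is bitwise-identical metrics.
    return [] if a == b else [path]


def compare_files(first, second):
    """Load two result JSON files and list the key paths that differ."""
    a = json.loads(Path(first).read_text())
    b = json.loads(Path(second).read_text())
    return diff_results(a, b)
```

An empty list from `compare_files("run_a.json", "run_b.json")` means the two runs agree on every field except the ignored timing key; any returned path points at the first place to look if determinism appears to be broken. The file names here are placeholders.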