---
license: apache-2.0
tags:
- temporal-graph-learning
- fraud-detection
- synthetic-data
- benchmark
- upi
- causal-evaluation
- matched-controls
- neurips
---

# Temporal Twins: A Matched-Control Benchmark for Temporal Fraud Detection

A synthetic, UPI-style temporal transaction benchmark in which fraud and benign trajectories are matched on static and prefix-level summaries but differ in delayed event-order structure.
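
To make the matching idea concrete, here is a toy sketch (not the benchmark generator). Two made-up event sequences contain exactly the same events, so any order-invariant summary is identical, yet only one of them contains a delayed ordered pattern. The event labels and the `has_delayed_motif` rule are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical event labels; the real benchmark's event schema may differ.
benign = ["pay", "collect", "refund", "pay", "pay"]
fraud = ["pay", "collect", "pay", "pay", "refund"]


def static_summary(events):
    """Order-invariant summary: event counts only."""
    return Counter(events)


def has_delayed_motif(events, first="collect", second="refund", min_gap=2):
    """Illustrative order-sensitive rule: `second` occurs at least
    `min_gap` positions after some occurrence of `first`."""
    for i, event in enumerate(events):
        if event == first and second in events[i + min_gap:]:
            return True
    return False


same_static = static_summary(benign) == static_summary(fraud)
differs_in_order = has_delayed_motif(benign) != has_delayed_motif(fraud)
print(same_static, differs_in_order)  # True True
```

A model that only sees the counts cannot separate the two sequences, while a model that reads the order can.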

## Links

- Dataset repository: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)
- Code repository: [https://huggingface.co/temporal-twins-benchmark/temporal-twins-code](https://huggingface.co/temporal-twins-benchmark/temporal-twins-code)
- Croissant metadata URL: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json)
- Paper or preprint: Not available during double-blind review; to be added after publication.

## Installation

Recommended Python: `3.11+`

```bash
pip install -r requirements.txt
```

If you prefer Conda:

```bash
conda env create -f environment.yml
conda activate temporal-twins
```

## Repository Structure

- `src/`: synthetic user, transaction, risk, fraud, graph, and temporal benchmark generation code
- `models/`: SeqGRU, static baselines, audit/probe models, and temporal GNN wrappers
- `experiments/`: deterministic benchmark runner and matched-prefix evaluation utilities
- `config/`: base YAML configs used by the experiment runner
- `configs/`: release-facing config snapshots for calibration and paper-suite reproduction
- `docs/`: determinism and supporting documentation
- `metadata/`: MLCommons Croissant metadata and validation notes
- `results/`: lightweight frozen paper-suite summaries and interpretation notes

## Quick Smoke Test

```bash
PYTHONPATH=. python3 experiments/run_all.py \
  --fast \
  --seed 0 \
  --benchmark-mode temporal_twins_oracle_calib \
  --experiments audit \
  --device cpu
```

## Exact Paper-Scale Reproduction

The checked-in CLI exposes `--benchmark-mode`, `--seed`, `--seeds`, `--fast`, `--device`, and `--experiments`, but has no separate `--difficulty`, `--num-users`, or `--simulation-days` flags. To reproduce the exact grouped paper-scale runs, use the shell helper below from the repository root.

Define this shell helper once:

```bash
run_group() {
  local group="$1"
  local seed="$2"
  local out_json="$3"

  # Run an inline Python script; the quoted 'PY' delimiter disables shell
  # expansion inside the heredoc.
  PYTHONPATH=. python3 - "$group" "$seed" "$out_json" <<'PY'
import json
import math
import sys
import time
from pathlib import Path

from src.core.config_loader import load_config
from experiments.run_all import (
    build_gate_pool_from_frames,
    gate_volume_is_sufficient,
    generate_single_difficulty,
    offset_gate_namespace,
    prepare_gate_subset,
    run_motif_validity_check,
    set_global_determinism,
)


def normalize(value):
    # Recursively make the report JSON-safe: unwrap objects exposing .item()
    # (such as NumPy scalars) and replace non-finite floats with None.
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if hasattr(value, "item"):
        try:
            value = value.item()
        except Exception:
            pass
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


group = sys.argv[1]
seed = int(sys.argv[2])
out_json = Path(sys.argv[3])

# oracle_calib uses the calibration mode at easy difficulty with hard_abort
# enabled; any other group name is used directly as the difficulty.
if group == "oracle_calib":
    benchmark_mode = "temporal_twins_oracle_calib"
    difficulty = "easy"
    hard_abort = True
else:
    benchmark_mode = "temporal_twins"
    difficulty = group
    hard_abort = False

# Override the default config with the paper-scale settings.
cfg = load_config("config/default.yaml")
cfg = cfg.model_copy(
    update={
        "num_users": 350,
        "simulation_days": 45,
        "benchmark_mode": benchmark_mode,
        "random_seed": seed,
    }
)

set_global_determinism(seed)
pool = generate_single_difficulty(
    cfg,
    difficulty=difficulty,
    seed=seed,
    benchmark_mode=benchmark_mode,
)
gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
pack_count = 1

# Add up to six extra packs (seed + k * 10007) until the gate volume is sufficient.
while (not gate_volume_is_sufficient(gate["volume"], False)) and pack_count <= 6:
    extra_seed = seed + pack_count * 10007
    extra_pack = generate_single_difficulty(
        cfg,
        difficulty=difficulty,
        seed=extra_seed,
        benchmark_mode=benchmark_mode,
    )
    extra_pack = offset_gate_namespace(extra_pack, pack_count)
    pool = build_gate_pool_from_frames([pool, extra_pack])
    gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
    pack_count += 1

# Record the size of the pooled data alongside the gate.
gate["source_pool_events"] = int(len(pool))
gate["source_pool_pairs"] = int(pool.loc[pool["twin_pair_id"] >= 0, "twin_pair_id"].nunique()) if "twin_pair_id" in pool.columns else 0
gate["source_pool_packs"] = int(pack_count)

# Run the motif validity check and measure its wall time.
start = time.time()
gate_pass, report = run_motif_validity_check(
    df=pool,
    config=cfg,
    seed=seed,
    device="cpu",
    num_epochs=3,
    node_epochs=150,
    n_checkpoints=8,
    hard_abort=hard_abort,
    benchmark_mode=benchmark_mode,
    fast_mode=False,
    force_temporal_models=True,
    prebuilt_gate=gate,
)
elapsed = time.time() - start

result = {
    "benchmark_group": group,
    "benchmark_mode": benchmark_mode,
    "seed": seed,
    "primary_metric_label": report["audit_metric_label"],
    "secondary_metric_label": report["raw_metric_label"],
    "gate_pass": bool(gate_pass),
    "run_wall_time_sec": float(elapsed),
    **report,
}

out_json.parent.mkdir(parents=True, exist_ok=True)
out_json.write_text(json.dumps(normalize(result), indent=2) + "\n")
print(f"Wrote {out_json}")
PY
}
```
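
The `normalize` step in the helper matters because Python's `json.dumps` writes the bare tokens `NaN` and `Infinity` for non-finite floats, which are not valid JSON and may be rejected by strict parsers. The standalone sketch below applies the same container and non-finite-float rules to a made-up report dictionary; the function name `to_json_safe` and the report keys are hypothetical.

```python
import json
import math


def to_json_safe(value):
    """Recurse into containers and map non-finite floats to None,
    which json.dumps serializes as null."""
    if isinstance(value, dict):
        return {k: to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


# Hypothetical report; real reports contain different keys.
report = {"auc": float("nan"), "scores": (0.5, float("inf"))}

raw = json.dumps(report)                 # contains NaN and Infinity tokens
safe = json.dumps(to_json_safe(report))  # strict JSON
print(raw)   # {"auc": NaN, "scores": [0.5, Infinity]}
print(safe)  # {"auc": null, "scores": [0.5, null]}
```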

### Reproduce `oracle_calib`

```bash
run_group oracle_calib 0 results/paper_suite_repro/jobs/oracle_calib_0.json
```

### Reproduce `easy`

```bash
run_group easy 0 results/paper_suite_repro/jobs/easy_0.json
```

### Reproduce `medium`

```bash
run_group medium 0 results/paper_suite_repro/jobs/medium_0.json
```

### Reproduce `hard`

```bash
run_group hard 0 results/paper_suite_repro/jobs/hard_0.json
```

## Reproduce the Full Paper Suite

```bash
mkdir -p results/paper_suite_repro/jobs

for group in oracle_calib easy medium hard; do
  for seed in 0 1 2 3 4; do
    run_group "$group" "$seed" "results/paper_suite_repro/jobs/${group}_${seed}.json"
  done
done
```
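
Once the loop finishes, the per-job JSON files can be rolled up per group. The sketch below relies only on keys that the shell helper writes itself (`benchmark_group`, `seed`, `gate_pass`, `run_wall_time_sec`). The function name `summarize_jobs` is hypothetical, and this is not the script that produced the frozen summaries shipped in `results/`.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_jobs(jobs_dir):
    """Group job JSON files by benchmark_group and report the seeds seen,
    the gate pass rate, and the mean wall time."""
    by_group = defaultdict(list)
    for path in sorted(Path(jobs_dir).glob("*.json")):
        job = json.loads(path.read_text())
        by_group[job["benchmark_group"]].append(job)

    summary = {}
    for group, jobs in by_group.items():
        n = len(jobs)
        summary[group] = {
            "seeds": sorted(j["seed"] for j in jobs),
            "gate_pass_rate": sum(bool(j["gate_pass"]) for j in jobs) / n,
            "mean_wall_time_sec": sum(j["run_wall_time_sec"] for j in jobs) / n,
        }
    return summary


if __name__ == "__main__":
    jobs_dir = Path("results/paper_suite_repro/jobs")
    if jobs_dir.is_dir():
        for group, stats in summarize_jobs(jobs_dir).items():
            print(group, stats)
```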

The frozen reference outputs for the final deterministic suite are already included in `results/`:

- `paper_suite_summary.csv`
- `paper_suite_summary.md`
- `paper_suite_runtime.csv`
- `paper_suite_meta.json`
- `paper_suite_runs.csv`
- `PAPER_GATE_INTERPRETATION.md`

## Expected Headline Results

| | Benchmark | XGBoost ROC-AUC | StaticGNN ROC-AUC | SeqGRU ROC-AUC | SeqGRU Shuffle Delta | |
| | --- | ---: | ---: | ---: | ---: | |
| | `oracle_calib` | `0.5000` | `0.5222` | `1.0000` | `-0.5032` | |
| | `easy` | `0.5000` | `0.4946` | `1.0000` | `-0.5003` | |
| | `medium` | `0.5000` | `0.4922` | `0.8391` | `-0.3337` | |
| | `hard` | `0.5000` | `0.5026` | `0.6876` | `-0.1883` | |
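
One way to read the last column, assuming "Shuffle Delta" means ROC-AUC on order-shuffled inputs minus ROC-AUC on the original order (the table itself does not define it), is to add it back to the SeqGRU score. Under that assumption, the implied shuffled ROC-AUC is close to 0.50 in every row, which is chance level once event order is destroyed.

```python
# Values copied from the table above: (SeqGRU ROC-AUC, SeqGRU Shuffle Delta).
headline = {
    "oracle_calib": (1.0000, -0.5032),
    "easy": (1.0000, -0.5003),
    "medium": (0.8391, -0.3337),
    "hard": (0.6876, -0.1883),
}

# Assumption: delta = shuffled AUC - original AUC, so shuffled = original + delta.
implied_shuffled = {
    name: round(auc + delta, 4) for name, (auc, delta) in headline.items()
}

for name, value in implied_shuffled.items():
    print(f"{name}: {value:.4f}")
# oracle_calib: 0.4968
# easy: 0.4997
# medium: 0.5054
# hard: 0.4993
```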

## Determinism

Runs use deterministic CPU execution. The same seed should reproduce identical matched-prefix data and metrics. Deterministic torch settings can slow execution, especially for the non-fast paper-scale suite. See `docs/` for the determinism notes.
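
For orientation, the following is a generic sketch of what seeding and deterministic settings usually involve in a Python/PyTorch project. It is not the repository's `set_global_determinism` implementation, which may differ. The name `seed_everything_sketch` is hypothetical, and NumPy and PyTorch are treated as optional so the snippet also runs without them.

```python
import os
import random


def seed_everything_sketch(seed):
    """Seed the common RNGs and request deterministic torch kernels."""
    # Only affects subprocesses started later; hash randomization of the
    # current interpreter is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)
        # Makes torch raise an error if an op has no deterministic implementation.
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass


seed_everything_sketch(0)
first = random.random()
seed_everything_sketch(0)
second = random.random()
print(first == second)  # True
```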

## Data Note

This code repository contains source code, metadata, documentation, and lightweight result summaries only. The generated synthetic dataset and full release artifacts are hosted separately at the dataset repository:

- [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)

## Privacy Note

- Synthetic data only
- No real UPI transactions
- No real users
- No real bank accounts
- No personal financial records

## License

- Code: `Apache-2.0`
- Dataset and generated benchmark artifacts: `CC-BY-4.0`

## Citation

Anonymous NeurIPS 2026 submission; final citation to be added after review.