	---
	license: apache-2.0
	tags:
	- temporal-graph-learning
	- fraud-detection
	- synthetic-data
	- benchmark
	- upi
	- causal-evaluation
	- matched-controls
	- neurips
	---

	# Temporal Twins: A Matched-Control Benchmark for Temporal Fraud Detection

	Synthetic UPI-style temporal transaction benchmark where fraud and benign trajectories are matched on static and prefix-level summaries but differ in delayed event-order structure.
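
As a rough illustration of the matched-control idea (not the repository's generator), the toy sketch below builds two trajectories with the same multiset of events, so an order-insensitive summary is identical for both, while an order-sensitive feature separates them. The event fields and the helpers `static_summary` and `large_precedes_small` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical toy events: (amount_bucket, merchant_type). Not the real schema.
benign = [("small", "grocery"), ("small", "fuel"), ("large", "electronics")]
fraud = [("large", "electronics"), ("small", "grocery"), ("small", "fuel")]


def static_summary(events):
    """Order-insensitive summary: event counts only."""
    return Counter(events)


def large_precedes_small(events):
    """Order-sensitive feature: does a 'large' event occur before a 'small' one?"""
    seen_large = False
    for bucket, _ in events:
        if bucket == "large":
            seen_large = True
        elif bucket == "small" and seen_large:
            return True
    return False


# Same multiset of events, so a static model sees identical inputs...
print(static_summary(benign) == static_summary(fraud))
# ...but the order-aware feature differs between the two twins.
print(large_precedes_small(benign), large_precedes_small(fraud))
```

A model that only consumes count-style summaries cannot tell these two toy trajectories apart, whereas a sequence model can in principle use the ordering.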

	## Links

	- Dataset repository: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)
	- Code repository: [https://huggingface.co/temporal-twins-benchmark/temporal-twins-code](https://huggingface.co/temporal-twins-benchmark/temporal-twins-code)
	- Croissant metadata URL: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json)
	- Paper or preprint: Not available during double-blind review; to be added after publication.

	## Installation

	Recommended Python: `3.11+`

	```bash
	pip install -r requirements.txt
	```

	If you prefer Conda:

	```bash
	conda env create -f environment.yml
	conda activate temporal-twins
	```
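
To confirm that the active interpreter meets the recommended version before installing dependencies, a small check along these lines can help. The helper name `meets_recommended` is hypothetical and is not part of the repository.

```python
import sys

RECOMMENDED = (3, 11)  # from "Recommended Python: 3.11+"


def meets_recommended(version_info=sys.version_info):
    """Return True when the (major, minor) version is at least RECOMMENDED."""
    return tuple(version_info[:2]) >= RECOMMENDED


if not meets_recommended():
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Warning: Python {current} is older than the recommended 3.11")
```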

	## Repository Structure

	- `src/`: synthetic user, transaction, risk, fraud, graph, and temporal benchmark generation code
	- `models/`: SeqGRU, static baselines, audit/probe models, and temporal GNN wrappers
	- `experiments/`: deterministic benchmark runner and matched-prefix evaluation utilities
	- `config/`: base YAML configs used by the experiment runner
	- `configs/`: release-facing config snapshots for calibration and paper-suite reproduction
	- `docs/`: determinism and supporting documentation
	- `metadata/`: MLCommons Croissant metadata and validation notes
	- `results/`: lightweight frozen paper-suite summaries and interpretation notes

	## Quick Smoke Test

	```bash
	PYTHONPATH=. python3 experiments/run_all.py \
	  --fast \
	  --seed 0 \
	  --benchmark-mode temporal_twins_oracle_calib \
	  --experiments audit \
	  --device cpu
	```

	## Exact Paper-Scale Reproduction

	The checked-in CLI exposes `--benchmark-mode`, `--seed`, `--seeds`, `--fast`, `--device`, and `--experiments`, but not separate `--difficulty`, `--num-users`, or `--simulation-days` flags. For the exact grouped paper-scale runs, use the helper below from the repository root.

	Define this shell helper once:

	```bash
	run_group() {
	  local group="$1"
	  local seed="$2"
	  local out_json="$3"

	  PYTHONPATH=. python3 - "$group" "$seed" "$out_json" <<'PY'
	import json
	import math
	import sys
	import time
	from pathlib import Path

	from src.core.config_loader import load_config
	from experiments.run_all import (
	    build_gate_pool_from_frames,
	    gate_volume_is_sufficient,
	    generate_single_difficulty,
	    offset_gate_namespace,
	    prepare_gate_subset,
	    run_motif_validity_check,
	    set_global_determinism,
	)


	def normalize(value):
	    """Recursively convert a result into JSON-serializable Python values."""
	    if isinstance(value, dict):
	        return {k: normalize(v) for k, v in value.items()}
	    if isinstance(value, (list, tuple)):
	        return [normalize(v) for v in value]
	    # Unwrap scalar objects that expose .item() into plain Python values.
	    if hasattr(value, "item"):
	        try:
	            value = value.item()
	        except Exception:
	            pass
	    # Map non-finite floats (NaN, +/-inf) to null in the output.
	    if isinstance(value, float) and not math.isfinite(value):
	        return None
	    return value


	group = sys.argv[1]
	seed = int(sys.argv[2])
	out_json = Path(sys.argv[3])

	if group == "oracle_calib":
	    benchmark_mode = "temporal_twins_oracle_calib"
	    difficulty = "easy"
	    hard_abort = True
	else:
	    benchmark_mode = "temporal_twins"
	    difficulty = group
	    hard_abort = False

	cfg = load_config("config/default.yaml")
	cfg = cfg.model_copy(
	    update={
	        "num_users": 350,
	        "simulation_days": 45,
	        "benchmark_mode": benchmark_mode,
	        "random_seed": seed,
	    }
	)

	set_global_determinism(seed)
	pool = generate_single_difficulty(
	    cfg,
	    difficulty=difficulty,
	    seed=seed,
	    benchmark_mode=benchmark_mode,
	)
	gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
	pack_count = 1

	# Add up to six extra packs until the gate volume check passes.
	while (not gate_volume_is_sufficient(gate["volume"], False)) and pack_count <= 6:
	    extra_seed = seed + pack_count * 10007
	    extra_pack = generate_single_difficulty(
	        cfg,
	        difficulty=difficulty,
	        seed=extra_seed,
	        benchmark_mode=benchmark_mode,
	    )
	    extra_pack = offset_gate_namespace(extra_pack, pack_count)
	    pool = build_gate_pool_from_frames([pool, extra_pack])
	    gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
	    pack_count += 1

	gate["source_pool_events"] = int(len(pool))
	gate["source_pool_pairs"] = (
	    int(pool.loc[pool["twin_pair_id"] >= 0, "twin_pair_id"].nunique())
	    if "twin_pair_id" in pool.columns
	    else 0
	)
	gate["source_pool_packs"] = int(pack_count)

	start = time.time()
	gate_pass, report = run_motif_validity_check(
	    df=pool,
	    config=cfg,
	    seed=seed,
	    device="cpu",
	    num_epochs=3,
	    node_epochs=150,
	    n_checkpoints=8,
	    hard_abort=hard_abort,
	    benchmark_mode=benchmark_mode,
	    fast_mode=False,
	    force_temporal_models=True,
	    prebuilt_gate=gate,
	)
	elapsed = time.time() - start

	result = {
	    "benchmark_group": group,
	    "benchmark_mode": benchmark_mode,
	    "seed": seed,
	    "primary_metric_label": report["audit_metric_label"],
	    "secondary_metric_label": report["raw_metric_label"],
	    "gate_pass": bool(gate_pass),
	    "run_wall_time_sec": float(elapsed),
	    **report,
	}

	out_json.parent.mkdir(parents=True, exist_ok=True)
	out_json.write_text(json.dumps(normalize(result), indent=2) + "\n")
	print(f"Wrote {out_json}")
	PY
	}
	```
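
When the gate volume check fails, the helper generates extra packs and derives each extra seed as `seed + pack_count * 10007`, with `pack_count` running from 1 to 6. The standalone sketch below reproduces only that arithmetic, so the seeds used for a given base seed can be listed. `extra_pack_seeds` is a hypothetical name, and the function does not call any repository code.

```python
def extra_pack_seeds(seed, max_extra_packs=6, stride=10007):
    """List the seeds the helper would use for extra packs 1..max_extra_packs."""
    return [seed + pack_count * stride for pack_count in range(1, max_extra_packs + 1)]


for base_seed in (0, 1):
    print(base_seed, extra_pack_seeds(base_seed))
```

For the base seeds 0 through 4 used in the full suite, none of these derived seeds coincides with a base seed or with a derived seed from another base seed, because the stride is far larger than the spread of the base seeds.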

	### Reproduce `oracle_calib`

	```bash
	run_group oracle_calib 0 results/paper_suite_repro/jobs/oracle_calib_0.json
	```

	### Reproduce `easy`

	```bash
	run_group easy 0 results/paper_suite_repro/jobs/easy_0.json
	```

	### Reproduce `medium`

	```bash
	run_group medium 0 results/paper_suite_repro/jobs/medium_0.json
	```

	### Reproduce `hard`

	```bash
	run_group hard 0 results/paper_suite_repro/jobs/hard_0.json
	```
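
Each `run_group` call writes one JSON file. The following is a minimal sketch for inspecting such a file, assuming only the top-level keys that the helper sets explicitly (`benchmark_group`, `seed`, `gate_pass`, `run_wall_time_sec`). The keys merged in from `report` are not assumed here, and `summarize_job` is a hypothetical name.

```python
import json
from pathlib import Path


def summarize_job(path):
    """Return a one-line summary of a job JSON written by run_group."""
    result = json.loads(Path(path).read_text())
    return (
        f"{result['benchmark_group']} seed={result['seed']} "
        f"gate_pass={result['gate_pass']} "
        f"wall_time={result['run_wall_time_sec']:.1f}s"
    )


job_path = Path("results/paper_suite_repro/jobs/easy_0.json")
if job_path.exists():
    print(summarize_job(job_path))
else:
    print(f"{job_path} not found; run the reproduction command first")
```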

	## Reproduce the Full Paper Suite

	```bash
	mkdir -p results/paper_suite_repro/jobs

	for group in oracle_calib easy medium hard; do
	  for seed in 0 1 2 3 4; do
	    run_group "$group" "$seed" "results/paper_suite_repro/jobs/${group}_${seed}.json"
	  done
	done
	```
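
The loop covers four groups and five seeds, so a complete run produces 20 job files. A small sketch for checking which of them exist, using the same path pattern as the loop; the function names `expected_job_paths` and `missing_jobs` are hypothetical.

```python
from itertools import product
from pathlib import Path

GROUPS = ("oracle_calib", "easy", "medium", "hard")
SEEDS = range(5)


def expected_job_paths(root="results/paper_suite_repro/jobs"):
    """Paths the full-suite loop writes: one JSON per (group, seed)."""
    return [Path(root) / f"{group}_{seed}.json" for group, seed in product(GROUPS, SEEDS)]


def missing_jobs(root="results/paper_suite_repro/jobs"):
    """Expected job files that do not exist yet."""
    return [path for path in expected_job_paths(root) if not path.exists()]


paths = expected_job_paths()
print(f"{len(paths) - len(missing_jobs())} of {len(paths)} jobs present")
```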

	The frozen reference outputs for the final deterministic suite are already included in `results/`:

	- `paper_suite_summary.csv`
	- `paper_suite_summary.md`
	- `paper_suite_runtime.csv`
	- `paper_suite_meta.json`
	- `paper_suite_runs.csv`
	- `PAPER_GATE_INTERPRETATION.md`

	## Expected Headline Results

	| Benchmark | XGBoost ROC-AUC | StaticGNN ROC-AUC | SeqGRU ROC-AUC | SeqGRU Shuffle Delta |
	| --- | ---: | ---: | ---: | ---: |
	| `oracle_calib` | `0.5000` | `0.5222` | `1.0000` | `-0.5032` |
	| `easy` | `0.5000` | `0.4946` | `1.0000` | `-0.5003` |
	| `medium` | `0.5000` | `0.4922` | `0.8391` | `-0.3337` |
	| `hard` | `0.5000` | `0.5026` | `0.6876` | `-0.1883` |
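
One way to read the shuffle delta column is sketched below. It assumes the delta is the shuffled-order ROC-AUC minus the original-order ROC-AUC, which the table does not state explicitly, so treat the derived numbers as illustrative. The values are copied from the table above, and `implied_shuffled_auc` is a hypothetical helper.

```python
# (SeqGRU ROC-AUC, SeqGRU shuffle delta) copied from the headline table.
HEADLINE = {
    "oracle_calib": (1.0000, -0.5032),
    "easy": (1.0000, -0.5003),
    "medium": (0.8391, -0.3337),
    "hard": (0.6876, -0.1883),
}


def implied_shuffled_auc(auc, delta):
    """Shuffled ROC-AUC under the ASSUMPTION that delta = shuffled - original."""
    return round(auc + delta, 4)


for name, (auc, delta) in HEADLINE.items():
    print(f"{name}: implied shuffled ROC-AUC = {implied_shuffled_auc(auc, delta):.4f}")
```

Under that assumption, every implied shuffled score lies within about 0.006 of 0.5, which would be consistent with the SeqGRU signal coming from event order rather than from order-insensitive features.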

	## Determinism

	Deterministic CPU execution is enabled. Running with the same seed should reproduce identical matched-prefix data and metrics. Deterministic torch settings can slow execution, especially for the non-fast paper-scale suite.
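
A simple way to check this claim is to run the same group and seed twice into different output files and compare the results. The sketch below ignores `run_wall_time_sec`, which is expected to differ between runs. It assumes no other timing-dependent fields are present in the merged `report`, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Fields expected to vary between otherwise identical runs.
NON_DETERMINISTIC_KEYS = {"run_wall_time_sec"}


def strip_volatile(result):
    """Drop timing fields so two same-seed results can be compared directly."""
    return {k: v for k, v in result.items() if k not in NON_DETERMINISTIC_KEYS}


def same_metrics(path_a, path_b):
    """True when two job JSON files agree on every non-timing field."""
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    return strip_volatile(a) == strip_volatile(b)


# Example usage with two hypothetical output paths from repeated runs:
# same_metrics("run_a/easy_0.json", "run_b/easy_0.json")
```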

	## Data Note

	This code repository contains source code, metadata, documentation, and lightweight result summaries only. The generated synthetic dataset and full release artifacts are hosted separately at the dataset repository:

	- [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)

	## Privacy Note

	- Synthetic data only
	- No real UPI transactions
	- No real users
	- No real bank accounts
	- No personal financial records

	## License

	- Code: `Apache-2.0`
	- Dataset and generated benchmark artifacts: `CC-BY-4.0`

	## Citation

	Anonymous NeurIPS 2026 submission; final citation to be added after review.