---
license: apache-2.0
tags:
- temporal-graph-learning
- fraud-detection
- synthetic-data
- benchmark
- upi
- causal-evaluation
- matched-controls
- neurips
---
# Temporal Twins: A Matched-Control Benchmark for Temporal Fraud Detection
A synthetic, UPI-style temporal transaction benchmark in which fraud and benign trajectories are matched on static and prefix-level summaries but differ in delayed event-order structure.
## Links
- Dataset repository: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)
- Code repository: [https://huggingface.co/temporal-twins-benchmark/temporal-twins-code](https://huggingface.co/temporal-twins-benchmark/temporal-twins-code)
- Croissant metadata URL: [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json)
- Paper or preprint: Not available during double-blind review; to be added after publication.
## Installation
Recommended Python: `3.11+`
```bash
pip install -r requirements.txt
```
If you prefer Conda:
```bash
conda env create -f environment.yml
conda activate temporal-twins
```
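If you want a quick check that the active interpreter meets the recommended version before installing, a minimal sketch follows. The function name `meets_recommended_python` is hypothetical and not part of the repository.

```python
import sys

# Recommended minimum from the Installation section: Python 3.11+
RECOMMENDED = (3, 11)


def meets_recommended_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor, ...) version is at least 3.11."""
    return tuple(version_info[:2]) >= RECOMMENDED
```

Calling `meets_recommended_python()` with no argument checks the running interpreter.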
## Repository Structure
- `src/`: synthetic user, transaction, risk, fraud, graph, and temporal benchmark generation code
- `models/`: SeqGRU, static baselines, audit/probe models, and temporal GNN wrappers
- `experiments/`: deterministic benchmark runner and matched-prefix evaluation utilities
- `config/`: base YAML configs used by the experiment runner
- `configs/`: release-facing config snapshots for calibration and paper-suite reproduction
- `docs/`: determinism and supporting documentation
- `metadata/`: MLCommons Croissant metadata and validation notes
- `results/`: lightweight frozen paper-suite summaries and interpretation notes
## Quick Smoke Test
```bash
PYTHONPATH=. python3 experiments/run_all.py \
  --fast \
  --seed 0 \
  --benchmark-mode temporal_twins_oracle_calib \
  --experiments audit \
  --device cpu
```
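For CI or notebook use, the same smoke test can be driven from Python. This is a sketch, not a repository utility: `smoke_test_command` and `run_smoke_test` are hypothetical names, and it substitutes `sys.executable` for `python3` so the current interpreter is used. The flags are exactly those of the command shown above.

```python
import os
import subprocess
import sys


def smoke_test_command(seed: int = 0, device: str = "cpu") -> list[str]:
    """Build the argv for the quick smoke test (same flags as the bash version)."""
    return [
        sys.executable, "experiments/run_all.py",
        "--fast",
        "--seed", str(seed),
        "--benchmark-mode", "temporal_twins_oracle_calib",
        "--experiments", "audit",
        "--device", device,
    ]


def run_smoke_test(repo_root: str = ".", seed: int = 0) -> int:
    """Run the smoke test from repo_root with PYTHONPATH set to it; return the exit code."""
    env = dict(os.environ, PYTHONPATH=os.path.abspath(repo_root))
    completed = subprocess.run(smoke_test_command(seed), cwd=repo_root, env=env)
    return completed.returncode
```

A non-zero return code from `run_smoke_test()` indicates the audit run failed.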
## Exact Paper-Scale Reproduction
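The checked-in CLI exposes `--benchmark-mode`, `--seed`, `--seeds`, `--fast`, `--device`, and `--experiments`, but has no separate `--difficulty`, `--num-users`, or `--simulation-days` flags. For exact paper-scale runs, define the shell helper below once and call it from the repository root. It loads `config/default.yaml`, overrides `num_users` (350), `simulation_days` (45), `benchmark_mode`, and `random_seed`, and writes one JSON report per benchmark group and seed: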
```bash
run_group() {
  local group="$1"
  local seed="$2"
  local out_json="$3"
  PYTHONPATH=. python3 - "$group" "$seed" "$out_json" <<'PY'
import json
import math
import sys
import time
from pathlib import Path

from src.core.config_loader import load_config
from experiments.run_all import (
    build_gate_pool_from_frames,
    gate_volume_is_sufficient,
    generate_single_difficulty,
    offset_gate_namespace,
    prepare_gate_subset,
    run_motif_validity_check,
    set_global_determinism,
)


def normalize(value):
    """Make a report JSON-safe: unwrap .item() scalars, map NaN/inf to None."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if hasattr(value, "item"):
        try:
            value = value.item()
        except Exception:
            pass
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


group = sys.argv[1]
seed = int(sys.argv[2])
out_json = Path(sys.argv[3])

# oracle_calib runs the easy generator in oracle-calibration mode and aborts hard on gate failure.
if group == "oracle_calib":
    benchmark_mode = "temporal_twins_oracle_calib"
    difficulty = "easy"
    hard_abort = True
else:
    benchmark_mode = "temporal_twins"
    difficulty = group
    hard_abort = False

cfg = load_config("config/default.yaml")
cfg = cfg.model_copy(
    update={
        "num_users": 350,
        "simulation_days": 45,
        "benchmark_mode": benchmark_mode,
        "random_seed": seed,
    }
)
set_global_determinism(seed)
pool = generate_single_difficulty(
    cfg,
    difficulty=difficulty,
    seed=seed,
    benchmark_mode=benchmark_mode,
)
gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)

# Top up the pool with extra packs (new seed, offset namespace) until the gate
# volume is sufficient, drawing at most 6 extra packs.
pack_count = 1
while (not gate_volume_is_sufficient(gate["volume"], False)) and pack_count <= 6:
    extra_seed = seed + pack_count * 10007
    extra_pack = generate_single_difficulty(
        cfg,
        difficulty=difficulty,
        seed=extra_seed,
        benchmark_mode=benchmark_mode,
    )
    extra_pack = offset_gate_namespace(extra_pack, pack_count)
    pool = build_gate_pool_from_frames([pool, extra_pack])
    gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
    pack_count += 1

gate["source_pool_events"] = int(len(pool))
gate["source_pool_pairs"] = int(pool.loc[pool["twin_pair_id"] >= 0, "twin_pair_id"].nunique()) if "twin_pair_id" in pool.columns else 0
gate["source_pool_packs"] = int(pack_count)

start = time.time()
gate_pass, report = run_motif_validity_check(
    df=pool,
    config=cfg,
    seed=seed,
    device="cpu",
    num_epochs=3,
    node_epochs=150,
    n_checkpoints=8,
    hard_abort=hard_abort,
    benchmark_mode=benchmark_mode,
    fast_mode=False,
    force_temporal_models=True,
    prebuilt_gate=gate,
)
elapsed = time.time() - start

result = {
    "benchmark_group": group,
    "benchmark_mode": benchmark_mode,
    "seed": seed,
    "primary_metric_label": report["audit_metric_label"],
    "secondary_metric_label": report["raw_metric_label"],
    "gate_pass": bool(gate_pass),
    "run_wall_time_sec": float(elapsed),
    **report,
}
out_json.parent.mkdir(parents=True, exist_ok=True)
out_json.write_text(json.dumps(normalize(result), indent=2) + "\n")
print(f"Wrote {out_json}")
PY
}
```
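The top-up loop in the helper draws extra pack `k` (for `k` = 1 to 6) with seed `seed + k * 10007`. The sketch below just spells out that arithmetic so you can see which generator seeds a run may touch. The function names are hypothetical; only the stride 10007 and the cap of six extra packs come from the helper.

```python
# Mirrors run_group's top-up loop: extra pack k (1-based) uses seed + k * 10007,
# and at most 6 extra packs are drawn.
STRIDE = 10007
MAX_EXTRA_PACKS = 6


def extra_pack_seeds(seed: int, n_extra: int = MAX_EXTRA_PACKS) -> list[int]:
    """Seeds used by the first n_extra top-up packs for a given base seed."""
    if not 0 <= n_extra <= MAX_EXTRA_PACKS:
        raise ValueError(f"n_extra must be in [0, {MAX_EXTRA_PACKS}]")
    return [seed + k * STRIDE for k in range(1, n_extra + 1)]


def all_generation_seeds(seeds=range(5)) -> dict[int, list[int]]:
    """Base seed plus every possible top-up seed, for each base seed."""
    return {s: [s] + extra_pack_seeds(s) for s in seeds}
```

For base seed 0 the possible top-up seeds are 10007, 20014, 30021, 40028, 50035, and 60042. Because the stride exceeds the paper's seed range (0 to 4), no two base or top-up seeds coincide across the suite.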
### Reproduce `oracle_calib`
```bash
run_group oracle_calib 0 results/paper_suite_repro/jobs/oracle_calib_0.json
```
### Reproduce `easy`
```bash
run_group easy 0 results/paper_suite_repro/jobs/easy_0.json
```
### Reproduce `medium`
```bash
run_group medium 0 results/paper_suite_repro/jobs/medium_0.json
```
### Reproduce `hard`
```bash
run_group hard 0 results/paper_suite_repro/jobs/hard_0.json
```
## Reproduce the Full Paper Suite
```bash
mkdir -p results/paper_suite_repro/jobs
for group in oracle_calib easy medium hard; do
  for seed in 0 1 2 3 4; do
    run_group "$group" "$seed" "results/paper_suite_repro/jobs/${group}_${seed}.json"
  done
done
```
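The full suite is 20 long-running jobs, so it can help to see which outputs are still missing before restarting an interrupted run. This is a sketch with hypothetical function names; the group list, seeds, and path template are taken from the loop above.

```python
from pathlib import Path

# Groups, seeds, and output path template follow the full-suite loop.
GROUPS = ("oracle_calib", "easy", "medium", "hard")
SEEDS = (0, 1, 2, 3, 4)


def expected_job_paths(jobs_dir="results/paper_suite_repro/jobs"):
    """Map (group, seed) to the JSON path the suite loop writes for it, in loop order."""
    jobs = Path(jobs_dir)
    return {(g, s): jobs / f"{g}_{s}.json" for g in GROUPS for s in SEEDS}


def missing_jobs(jobs_dir="results/paper_suite_repro/jobs"):
    """(group, seed) pairs whose output JSON does not exist yet."""
    return [key for key, path in expected_job_paths(jobs_dir).items() if not path.exists()]
```

Each returned `(group, seed)` pair can be passed back to `run_group` to resume only the unfinished jobs.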
The frozen reference outputs for the final deterministic suite are already included in `results/`:
- `paper_suite_summary.csv`
- `paper_suite_summary.md`
- `paper_suite_runtime.csv`
- `paper_suite_meta.json`
- `paper_suite_runs.csv`
- `PAPER_GATE_INTERPRETATION.md`
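To get a quick overview of your reproduced jobs before comparing them with the frozen files, a sketch like the following groups the `run_group` outputs. It reads only keys the helper always writes (`benchmark_group`, `seed`, `gate_pass`, `run_wall_time_sec`) and assumes the merged `report` does not overwrite them; metric keys inside the report are not assumed. `summarize_jobs` is a hypothetical name, and its output is not the format of `paper_suite_summary.csv`.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def summarize_jobs(jobs_dir="results/paper_suite_repro/jobs"):
    """Per-group run count, seeds, gate-pass count, and mean wall time from run_group JSONs."""
    by_group = defaultdict(list)
    for path in sorted(Path(jobs_dir).glob("*.json")):
        record = json.loads(path.read_text())
        by_group[record["benchmark_group"]].append(record)
    summary = {}
    for group, records in by_group.items():
        summary[group] = {
            "runs": len(records),
            "seeds": sorted(r["seed"] for r in records),
            "gate_pass": sum(bool(r["gate_pass"]) for r in records),
            "mean_wall_time_sec": mean(r["run_wall_time_sec"] for r in records),
        }
    return summary
```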
## Expected Headline Results
| Benchmark | XGBoost ROC-AUC | StaticGNN ROC-AUC | SeqGRU ROC-AUC | SeqGRU Shuffle Delta |
| --- | ---: | ---: | ---: | ---: |
| `oracle_calib` | `0.5000` | `0.5222` | `1.0000` | `-0.5032` |
| `easy` | `0.5000` | `0.4946` | `1.0000` | `-0.5003` |
| `medium` | `0.5000` | `0.4922` | `0.8391` | `-0.3337` |
| `hard` | `0.5000` | `0.5026` | `0.6876` | `-0.1883` |
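One way to check a reproduction against this table is to reduce your runs to one number per cell and diff them. The sketch below hard-codes the table values; the metric key names are hypothetical, and the README does not state whether the table reports a single seed or a seed aggregate, so reduce your runs the same way the frozen `results/` files do before comparing. The default tolerance is slightly looser than the table's four-decimal rounding.

```python
# Values copied from the Expected Headline Results table; key names are hypothetical.
EXPECTED = {
    "oracle_calib": {"xgboost_roc_auc": 0.5000, "static_gnn_roc_auc": 0.5222, "seqgru_roc_auc": 1.0000, "seqgru_shuffle_delta": -0.5032},
    "easy": {"xgboost_roc_auc": 0.5000, "static_gnn_roc_auc": 0.4946, "seqgru_roc_auc": 1.0000, "seqgru_shuffle_delta": -0.5003},
    "medium": {"xgboost_roc_auc": 0.5000, "static_gnn_roc_auc": 0.4922, "seqgru_roc_auc": 0.8391, "seqgru_shuffle_delta": -0.3337},
    "hard": {"xgboost_roc_auc": 0.5000, "static_gnn_roc_auc": 0.5026, "seqgru_roc_auc": 0.6876, "seqgru_shuffle_delta": -0.1883},
}


def headline_mismatches(observed, tol=1e-4):
    """Return (group, metric, expected, observed) for every value outside tol.

    A group or metric missing from observed is reported with observed=None.
    """
    out = []
    for group, metrics in EXPECTED.items():
        for metric, expected in metrics.items():
            value = observed.get(group, {}).get(metric)
            if value is None or abs(value - expected) > tol:
                out.append((group, metric, expected, value))
    return out
```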
## Determinism
Deterministic CPU execution is enabled: running with the same seed should reproduce identical matched-prefix data and metrics. The deterministic PyTorch settings can slow execution, especially for the paper-scale (non-`--fast`) suite.
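A simple way to test this claim is to run `run_group` twice with the same group and seed, writing to two different paths, and compare the outputs while ignoring wall-clock timing. The sketch below excludes only `run_wall_time_sec` (which the helper always writes); if the merged report contains other timing fields, add them to `VOLATILE_KEYS`. Both names and the helper `_strip` are hypothetical.

```python
import json
from pathlib import Path

# Keys expected to differ between otherwise identical runs (timing only).
VOLATILE_KEYS = frozenset({"run_wall_time_sec"})


def _strip(value, volatile):
    """Recursively drop volatile keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: _strip(v, volatile) for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [_strip(v, volatile) for v in value]
    return value


def same_run(path_a, path_b, volatile=VOLATILE_KEYS) -> bool:
    """True if two run_group JSON outputs agree on everything except volatile keys."""
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    return _strip(a, volatile) == _strip(b, volatile)
```

The comparison is exact, including floats, which matches the claim that identical seeds give identical metrics.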
## Data Note
This code repository contains source code, metadata, documentation, and lightweight result summaries only. The generated synthetic dataset and full release artifacts are hosted separately at the dataset repository:
- [https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins](https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins)
## Privacy Note
- Synthetic data only
- No real UPI transactions
- No real users
- No real bank accounts
- No personal financial records
## License
- Code: `Apache-2.0`
- Dataset and generated benchmark artifacts: `CC-BY-4.0`
## Citation
Anonymous NeurIPS 2026 submission; final citation to be added after review.