	# Determinism in Temporal Twins

	## Summary

	Temporal Twins uses deterministic seeding and deterministic runtime settings so that the generated matched-prefix datasets, audit counts, and benchmark metrics are reproducible across reruns of the same configuration and seed.

	## Seeding

	The benchmark runtime sets deterministic seeds for:

	- Python `random`
	- NumPy
	- PyTorch
	- CUDA via `torch.cuda.manual_seed_all(...)` when CUDA is available
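
The seeding steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the benchmark's actual code: the function name `seed_everything` is hypothetical, and the guarded imports are only there so the sketch still runs where NumPy or PyTorch is not installed.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

try:
    import torch
except ImportError:  # PyTorch is optional for this sketch
    torch = None


def seed_everything(seed: int) -> None:
    """Seed every RNG named in the list above (hypothetical helper)."""
    # Python's built-in random module.
    random.seed(seed)

    # NumPy's legacy global RNG; it accepts seeds in [0, 2**32 - 1].
    if np is not None:
        np.random.seed(seed)

    if torch is not None:
        # Seeds the default PyTorch generator.
        torch.manual_seed(seed)
        # Seed all CUDA devices explicitly, only when CUDA is present.
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
```

Calling `seed_everything(seed)` once at the start of a run, before any data generation or model construction, keeps all subsequent random draws tied to the same seed.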

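	Seeds derived from the difficulty level and benchmark mode are computed with a stable hash function rather than Python's built-in `hash()`, which is salted per process for strings (unless `PYTHONHASHSEED` is fixed) and therefore varies between runs.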
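
One way to derive such a seed is to hash the configuration with a cryptographic digest from `hashlib`, which returns the same value in every process. The sketch below assumes SHA-256 and a hypothetical helper `derive_seed`; the source only says that a stable hash is used, so the actual function and key layout may differ. The mode and difficulty strings in the usage lines are placeholders.

```python
import hashlib


def derive_seed(base_seed: int, benchmark_mode: str, difficulty: str) -> int:
    """Map a configuration to a 32-bit seed that is stable across processes.

    Hypothetical helper: SHA-256 and the key layout are assumptions.
    """
    # Build an unambiguous, process-independent key for the configuration.
    key = f"{base_seed}|{benchmark_mode}|{difficulty}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 4 bytes so the result fits NumPy's legacy seed range.
    return int.from_bytes(digest[:4], "big")


# Placeholder configuration values, for illustration only.
seed_a = derive_seed(0, "example-mode", "easy")
seed_b = derive_seed(0, "example-mode", "hard")
```

Unlike `hash("example-mode")`, the derived values do not change when the interpreter is restarted, so each (seed, mode, difficulty) combination always maps to the same RNG stream.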

	## Deterministic Torch Configuration

	When supported by the runtime, the benchmark enables:

	- `torch.backends.cudnn.deterministic = True`
	- `torch.backends.cudnn.benchmark = False`
	- `torch.use_deterministic_algorithms(True)`
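
A minimal sketch of applying these settings is shown below. The helper name `enable_torch_determinism` and its return value are hypothetical, and the `hasattr` check is one assumed way to honor "when supported by the runtime". Setting `CUBLAS_WORKSPACE_CONFIG` is not mentioned in this document; it is included because PyTorch requires it for deterministic cuBLAS operations on CUDA 10.2 and later.

```python
import os

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def enable_torch_determinism() -> dict:
    """Apply the settings listed above and report which ones were set."""
    applied = {}
    if torch is None:
        return applied

    # Needed by cuBLAS for deterministic results on CUDA >= 10.2.
    # Assumption: not stated in this document; harmless on CPU-only runs.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    # Ask cuDNN for deterministic kernels and disable its autotuner,
    # which may otherwise pick different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    applied["cudnn.deterministic"] = True
    applied["cudnn.benchmark"] = False

    # Older PyTorch releases do not provide this function.
    if hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)
        applied["use_deterministic_algorithms"] = True

    return applied
```

With `torch.use_deterministic_algorithms(True)` enabled, operations that have no deterministic implementation raise a `RuntimeError` instead of silently producing run-dependent results.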

	The runtime also disables opportunistic nondeterministic math paths where practical and constrains CPU threading for repeatability.
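
Constraining CPU threading could look roughly like the sketch below. The helper `constrain_cpu_threads` and the default of one thread are assumptions, since this document does not state the exact thread count used. The environment variables are typically read when the underlying math libraries initialize, so they are most effective when set before those libraries are imported.

```python
import os

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def constrain_cpu_threads(num_threads: int = 1) -> int:
    """Limit intra-op CPU parallelism; return the thread count in effect.

    Hypothetical helper: the default of 1 is an assumption.
    """
    # Hints for OpenMP / MKL / OpenBLAS backends. These usually only take
    # effect if set before the corresponding library is first loaded.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(num_threads)

    if torch is None:
        return num_threads

    # Fix the number of threads PyTorch uses inside a single CPU operator,
    # which pins the order of parallel floating-point reductions.
    torch.set_num_threads(num_threads)
    return torch.get_num_threads()
```

A single thread gives the most predictable floating-point accumulation order, at the cost of throughput.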

	## CPU Deterministic Mode

	The deterministic paper suite was run in a CPU-oriented deterministic configuration. This favors repeatability over throughput and is the recommended mode for artifact evaluation and paper reproduction.

	## Expected Reproducibility Behavior

	- The generated matched-prefix dataset should be identical for the same benchmark mode, difficulty, and seed.
	- Audit counts and shortcut AUCs should be identical for the same configuration and seed.
	- Model metrics are expected to be identical, or to differ only at the level of floating-point rounding, when rerun in the same deterministic environment (same hardware, library versions, and runtime settings).
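
These expectations can be checked between two reruns with a small comparison script. The sketch below is illustrative only: `fingerprint` and `metrics_match` are hypothetical helpers, the record and metric layouts are assumed, and the tolerance is a placeholder rather than a value taken from the benchmark.

```python
import hashlib
import json
import math


def fingerprint(records: list) -> str:
    """Hash JSON-serializable records in a canonical form (hypothetical helper)."""
    # sort_keys and fixed separators make the serialization independent
    # of dict key order and whitespace.
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def metrics_match(run_a: dict, run_b: dict, abs_tol: float = 1e-9) -> bool:
    """True if both runs report the same metrics within abs_tol.

    The tolerance is a placeholder for "numerically indistinguishable".
    """
    if run_a.keys() != run_b.keys():
        return False
    return all(
        math.isclose(run_a[name], run_b[name], rel_tol=0.0, abs_tol=abs_tol)
        for name in run_a
    )


# Illustrative data only; real records and metrics come from the benchmark.
rerun_1 = [{"node": 1, "t": 0.5}, {"node": 2, "t": 1.5}]
rerun_2 = [{"t": 0.5, "node": 1}, {"t": 1.5, "node": 2}]
same_dataset = fingerprint(rerun_1) == fingerprint(rerun_2)
same_metrics = metrics_match({"auc": 0.75}, {"auc": 0.75})
```

Datasets and audit counts are compared for exact equality via the fingerprint, while model metrics are compared with a small absolute tolerance.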

	## Runtime Tradeoff

	Deterministic execution is slower than unconstrained training because it restricts thread-level and backend-level nondeterministic optimizations. This is expected, especially for larger non-fast calibration runs and the full paper suite.

	## Hosted Resources

	- Dataset URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`
	- Code repository URL: `https://huggingface.co/temporal-twins-benchmark/temporal-twins-code`
	- Croissant metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json`
	- Paper or preprint: Not available during double-blind review; to be added after publication.