
Determinism in Temporal Twins

Summary

Temporal Twins uses deterministic seeding and deterministic runtime settings so that the generated matched-prefix datasets, audit counts, and benchmark metrics are reproducible across reruns of the same configuration and seed.

Seeding

The benchmark runtime sets deterministic seeds for:

  • Python random
  • NumPy
  • PyTorch
  • CUDA via torch.cuda.manual_seed_all(...) when CUDA is available
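
A minimal sketch of seeding the generators listed above, assuming a single integer seed in NumPy's accepted range of 0 to 2**32 - 1. The function name seed_everything is illustrative, not the benchmark's actual API. The NumPy and PyTorch imports are guarded so the sketch also runs where those packages are absent.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and CUDA) with one value."""
    # Python's built-in module-level generator.
    random.seed(seed)

    # NumPy's legacy global generator behind the np.random.* functions.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's default generator. Recent releases already seed CUDA inside
    # manual_seed; the manual_seed_all call makes the CUDA seeding explicit.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling seed_everything with the same value before each run makes the seeded generators replay the same sequences, provided the code draws from them in the same order.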

Seeds derived from the difficulty level and benchmark mode are computed with a stable hash function rather than Python's built-in hash(), which is randomized per process for strings.
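
One way to derive such seeds is sketched below. SHA-256, the base|mode|difficulty key layout, and the names derive_seed, "matched_prefix", and "hard" are assumptions for illustration; this document does not say which stable hash Temporal Twins uses. Python's hash() is salted per process for str and bytes unless PYTHONHASHSEED is fixed, so it cannot give seeds that survive a rerun.

```python
import hashlib


def derive_seed(base_seed: int, benchmark_mode: str, difficulty: str) -> int:
    """Map (seed, mode, difficulty) to a 32-bit seed that is stable across processes."""
    # Hypothetical key layout; assumes the fields themselves contain no "|".
    key = f"{base_seed}|{benchmark_mode}|{difficulty}"
    # SHA-256 yields the same digest on every run and machine, unlike hash(str).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the first 4 bytes so the result fits NumPy's 0..2**32-1 seed range.
    return int.from_bytes(digest[:4], "big")
```

The derived value can be passed to random.Random(...) or numpy.random.default_rng(...) to obtain a generator dedicated to one benchmark mode and difficulty.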

Deterministic Torch Configuration

When supported by the runtime, the benchmark enables:

  • torch.backends.cudnn.deterministic = True
  • torch.backends.cudnn.benchmark = False
  • torch.use_deterministic_algorithms(True)
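
The three settings can be applied together as sketched below; this is an assumption about how a runtime might do it, not the benchmark's actual code. The sketch also sets CUBLAS_WORKSPACE_CONFIG, which PyTorch's reproducibility notes require for deterministic cuBLAS operations on CUDA 10.2 and later; without it, those CUDA ops raise an error once deterministic algorithms are enabled. The helper name enable_torch_determinism is hypothetical, and it returns False when PyTorch is not installed.

```python
import os


def enable_torch_determinism() -> bool:
    """Apply PyTorch's deterministic settings; return False if PyTorch is missing."""
    # Deterministic cuBLAS on CUDA >= 10.2 needs a fixed workspace size. It must be
    # set before the first cuBLAS call; setdefault keeps any value already exported.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import torch
    except ImportError:
        return False
    # Select deterministic cuDNN kernels and disable autotuning, which can pick
    # different algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Use deterministic implementations everywhere and raise a RuntimeError for
    # ops that have none, instead of silently running them nondeterministically.
    torch.use_deterministic_algorithms(True)
    return True
```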

The runtime also disables opportunistic nondeterministic math paths where practical and constrains CPU threading for repeatability.
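
CPU thread limits of this kind are commonly applied through environment variables read by the OpenMP, MKL, and OpenBLAS runtimes, plus torch.set_num_threads. The sketch assumes a single thread and a hypothetical helper named constrain_cpu_threads; the exact variables and thread count Temporal Twins uses are not stated here.

```python
import os

# Read by the OpenMP, MKL, and OpenBLAS runtimes when they initialize, so they
# only take effect if set before NumPy or PyTorch is first imported.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def constrain_cpu_threads(num_threads: int = 1) -> None:
    """Pin native math libraries and PyTorch to a fixed number of CPU threads."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(num_threads)
    try:
        import torch
    except ImportError:
        return
    # Also fix PyTorch's intra-op thread pool at runtime.
    torch.set_num_threads(num_threads)
```

A fixed thread count keeps parallel floating-point reductions split the same way on every run; one thread is the most conservative choice and also the slowest.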

CPU Deterministic Mode

The deterministic paper suite was run in a CPU-oriented deterministic configuration. This favors repeatability over throughput and is the recommended mode for artifact evaluation and paper reproduction.
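
A CPU-only run can be forced by hiding GPUs from the CUDA runtime before PyTorch is imported, as sketched below. The helper name force_cpu_device is hypothetical, and this is only one way such a configuration might be set up; the benchmark's own switch for CPU mode is not described in this document.

```python
import os


def force_cpu_device() -> str:
    """Hide every GPU from this process and return the device string to use.

    Call this before the first `import torch`; once CUDA has been initialized,
    changing CUDA_VISIBLE_DEVICES has no effect.
    """
    # An empty device list makes the CUDA runtime report zero GPUs, so
    # torch.cuda.is_available() returns False for the rest of the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return "cpu"
```

Models and tensors would then be placed with .to(device) using the returned string.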

Expected Reproducibility Behavior

  • The generated matched-prefix dataset should be identical for the same benchmark mode, difficulty, and seed.
  • Audit counts and shortcut AUCs should be identical for the same configuration and seed.
  • Model metrics are expected to be identical or numerically indistinguishable when run under the same deterministic environment.
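
These expectations can be checked mechanically by comparing two reruns. In this sketch, the run dictionaries and their keys (dataset, audit_counts, shortcut_aucs, metrics) are a hypothetical schema. The tolerances standing in for "numerically indistinguishable" are also assumptions, since the document does not quantify them. Datasets are compared through a SHA-256 fingerprint of a canonical JSON encoding. Audit counts and shortcut AUCs must match exactly, while model metrics may differ within the tolerance.

```python
import hashlib
import json
import math
from typing import Any, Dict, List


def fingerprint(records: Any) -> str:
    """SHA-256 of a canonical JSON encoding (key order and spacing fixed)."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_runs(
    run_a: Dict[str, Any],
    run_b: Dict[str, Any],
    rel_tol: float = 1e-9,
    abs_tol: float = 1e-12,
) -> List[str]:
    """Return a list naming the fields that differ between two reruns."""
    problems = []
    # The generated dataset must have the same canonical fingerprint.
    if fingerprint(run_a["dataset"]) != fingerprint(run_b["dataset"]):
        problems.append("dataset")
    # Audit counts and shortcut AUCs must match exactly.
    for key in ("audit_counts", "shortcut_aucs"):
        if run_a[key] != run_b[key]:
            problems.append(key)
    # Model metrics may differ only within the given tolerance.
    metrics_a, metrics_b = run_a["metrics"], run_b["metrics"]
    for name, value in metrics_a.items():
        other = metrics_b.get(name)
        if other is None or not math.isclose(value, other, rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"metrics.{name}")
    # Metrics reported only by the second run also count as a difference.
    for name in sorted(metrics_b.keys() - metrics_a.keys()):
        problems.append(f"metrics.{name}")
    return problems
```

An empty list means the rerun reproduced the reference. Values must be JSON-serializable, so NumPy scalars or arrays would need converting (for example with .tolist()) before fingerprinting.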

Runtime Tradeoff

Deterministic execution is slower than unconstrained training because it restricts thread-level and backend-level nondeterministic optimizations. The slowdown is expected and is most noticeable in larger non-fast calibration runs and in the full paper suite.

Hosted Resources

  • Dataset URL: https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins
  • Code repository URL: https://huggingface.co/temporal-twins-benchmark/temporal-twins-code
  • Croissant metadata URL: https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json
  • Paper or preprint: Not available during double-blind review; to be added after publication.