# Determinism in Temporal Twins

## Summary

Temporal Twins uses deterministic seeding and deterministic runtime settings so that the generated matched-prefix datasets, audit counts, and benchmark metrics are reproducible across reruns of the same configuration and seed.

## Seeding

The benchmark runtime sets deterministic seeds for:

- Python `random`
- NumPy
- PyTorch
- CUDA via `torch.cuda.manual_seed_all(...)` when CUDA is available

Difficulty- and benchmark-mode-derived seeds use a stable hash function rather than Python's process-randomized `hash()`.
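The seeding steps above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the helper names `derive_seed` and `seed_everything`, the choice of SHA-256 as the stable hash, and the example mode and difficulty strings are all assumptions.

```python
import hashlib
import random


def derive_seed(base_seed: int, benchmark_mode: str, difficulty: str) -> int:
    """Derive a 32-bit seed that is stable across processes and machines.

    Hypothetical helper: SHA-256 stands in for whichever stable hash the
    benchmark uses. Python's built-in hash() is salted per process for
    strings, so it cannot serve this purpose.
    """
    key = f"{base_seed}|{benchmark_mode}|{difficulty}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The first 4 bytes give an unsigned value in [0, 2**32 - 1],
    # which is the range accepted by numpy.random.seed.
    return int.from_bytes(digest[:4], "big")


def seed_everything(seed: int) -> None:
    """Seed Python's random, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `seed_everything(derive_seed(0, "calibration", "hard"))` at the start of a run would then give the same random streams on every rerun; the mode and difficulty names here are placeholders.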

## Deterministic Torch Configuration

When supported by the runtime, the benchmark enables:

- `torch.backends.cudnn.deterministic = True`
- `torch.backends.cudnn.benchmark = False`
- `torch.use_deterministic_algorithms(True)`

The runtime also disables opportunistic nondeterministic math paths where practical and constrains CPU threading for repeatability.
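A minimal sketch of applying these settings is shown below. The function name `configure_torch_determinism` and the default of one CPU thread are assumptions for illustration. The `CUBLAS_WORKSPACE_CONFIG` variable is not mentioned above; it is included because PyTorch documents it as required for deterministic cuBLAS operations on CUDA 10.2 and later.

```python
import os


def configure_torch_determinism(num_threads: int = 1) -> bool:
    """Apply deterministic PyTorch settings; return False if torch is absent.

    Hypothetical helper, not the benchmark's own entry point.
    """
    # Must be set before CUDA work begins; harmless on CPU-only runs.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import torch
    except ImportError:
        return False

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Raises at call time if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # Fixing the thread count removes one source of run-to-run variation
    # in CPU reductions (assumed value; the benchmark may use another).
    torch.set_num_threads(num_threads)
    return True
```

Calling this once at startup, before any model or tensor is created, keeps the settings in effect for the whole run.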

## CPU Deterministic Mode

The deterministic paper suite was run in a CPU-oriented deterministic configuration. This favors repeatability over throughput and is the recommended mode for artifact evaluation and paper reproduction.

## Expected Reproducibility Behavior

- The generated matched-prefix dataset should be identical for the same benchmark mode, difficulty, and seed.
- Audit counts and shortcut AUCs should be identical for the same configuration and seed.
- Model metrics are expected to be identical or numerically indistinguishable when run under the same deterministic environment.
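One way to check these expectations between two reruns is sketched below. It assumes the dataset can be represented as a list of JSON-serializable records and that metrics are flat name-to-float mappings; the helper names and the tolerance values are illustrative, not taken from the benchmark.

```python
import hashlib
import json
import math


def fingerprint(records: list[dict]) -> str:
    """Hash records in order, using a canonical JSON encoding.

    Key order inside a record does not matter; record order does.
    """
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def metrics_match(a: dict[str, float], b: dict[str, float],
                  rel_tol: float = 1e-9) -> bool:
    """Return True if both runs report the same metrics within tolerance."""
    if a.keys() != b.keys():
        return False
    return all(
        math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=1e-12) for k in a
    )
```

Equal fingerprints indicate identical datasets, and `metrics_match` covers the "numerically indistinguishable" case for model metrics; setting `rel_tol=0.0` and dropping `abs_tol` would demand exact equality instead.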

## Runtime Tradeoff

Deterministic execution is slower than unconstrained training because it restricts thread-level and backend-level nondeterministic optimizations. This is expected, especially for larger non-fast calibration runs and the full paper suite.

## Hosted Resources

- Dataset URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins`
- Code repository URL: `https://huggingface.co/temporal-twins-benchmark/temporal-twins-code`
- Croissant metadata URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json`
- Paper or preprint: Not available during double-blind review; to be added after publication.