Zekai Wang committed
Commit c641413
1 Parent(s): cbf192e

Release ORCA TTT-Probes (17 configurations across 3 LLMs)


17 trained Test-Time Training probes for the paper Online Reasoning
Calibration: Test-Time Training Enables Generalizable Conformal LLM
Reasoning (arXiv:2604.01170).

Each probe directory contains: probe.pt (state dict), config.json
(training hyperparameters), lambdas.json (LTT thresholds), metrics.json
(savings/error per delta), and ood_*.json (per-OOD-benchmark metrics).

Coverage:
- Qwen2.5-32B supervised: no_kq + qk_dh{32,64,128,256,512} + 5 architecture variants
- Qwen2.5-32B consistent: no_kq, qk_dh128
- QwQ-32B supervised: no_kq, qk_dh128
- Llama-3.3-70B supervised: no_kq, qk_dh128

Files changed (50)
  1. README.md +69 -3
  2. llama-3.3-70b/supervised/no_kq/config.json +43 -0
  3. llama-3.3-70b/supervised/no_kq/lambdas.json +13 -0
  4. llama-3.3-70b/supervised/no_kq/metrics.json +70 -0
  5. llama-3.3-70b/supervised/no_kq/ood_aime24.json +68 -0
  6. llama-3.3-70b/supervised/no_kq/ood_aime25.json +68 -0
  7. llama-3.3-70b/supervised/no_kq/ood_aime26.json +68 -0
  8. llama-3.3-70b/supervised/no_kq/ood_gpqa_diamond.json +68 -0
  9. llama-3.3-70b/supervised/no_kq/ood_math500.json +68 -0
  10. llama-3.3-70b/supervised/no_kq/probe.pt +3 -0
  11. llama-3.3-70b/supervised/qk_dh128/config.json +43 -0
  12. llama-3.3-70b/supervised/qk_dh128/lambdas.json +13 -0
  13. llama-3.3-70b/supervised/qk_dh128/metrics.json +70 -0
  14. llama-3.3-70b/supervised/qk_dh128/ood_aime24.json +68 -0
  15. llama-3.3-70b/supervised/qk_dh128/ood_aime25.json +68 -0
  16. llama-3.3-70b/supervised/qk_dh128/ood_aime26.json +68 -0
  17. llama-3.3-70b/supervised/qk_dh128/ood_gpqa_diamond.json +68 -0
  18. llama-3.3-70b/supervised/qk_dh128/ood_math500.json +68 -0
  19. llama-3.3-70b/supervised/qk_dh128/probe.pt +3 -0
  20. qwen2.5-32b/consistent/no_kq/config.json +42 -0
  21. qwen2.5-32b/consistent/no_kq/lambdas.json +13 -0
  22. qwen2.5-32b/consistent/no_kq/metrics.json +70 -0
  23. qwen2.5-32b/consistent/no_kq/ood_aime24.json +68 -0
  24. qwen2.5-32b/consistent/no_kq/ood_aime25.json +68 -0
  25. qwen2.5-32b/consistent/no_kq/ood_aime26.json +68 -0
  26. qwen2.5-32b/consistent/no_kq/ood_gpqa_diamond.json +68 -0
  27. qwen2.5-32b/consistent/no_kq/ood_math500.json +68 -0
  28. qwen2.5-32b/consistent/no_kq/probe.pt +3 -0
  29. qwen2.5-32b/consistent/qk_dh128/config.json +42 -0
  30. qwen2.5-32b/consistent/qk_dh128/lambdas.json +13 -0
  31. qwen2.5-32b/consistent/qk_dh128/metrics.json +70 -0
  32. qwen2.5-32b/consistent/qk_dh128/ood_aime24.json +68 -0
  33. qwen2.5-32b/consistent/qk_dh128/ood_aime25.json +68 -0
  34. qwen2.5-32b/consistent/qk_dh128/ood_aime26.json +68 -0
  35. qwen2.5-32b/consistent/qk_dh128/ood_gpqa_diamond.json +68 -0
  36. qwen2.5-32b/consistent/qk_dh128/ood_math500.json +68 -0
  37. qwen2.5-32b/consistent/qk_dh128/probe.pt +3 -0
  38. qwen2.5-32b/supervised/no_kq/config.json +42 -0
  39. qwen2.5-32b/supervised/no_kq/lambdas.json +13 -0
  40. qwen2.5-32b/supervised/no_kq/metrics.json +70 -0
  41. qwen2.5-32b/supervised/no_kq/ood_aime24.json +68 -0
  42. qwen2.5-32b/supervised/no_kq/ood_aime25.json +68 -0
  43. qwen2.5-32b/supervised/no_kq/ood_aime26.json +68 -0
  44. qwen2.5-32b/supervised/no_kq/ood_gpqa_diamond.json +68 -0
  45. qwen2.5-32b/supervised/no_kq/ood_math500.json +68 -0
  46. qwen2.5-32b/supervised/no_kq/probe.pt +3 -0
  47. qwen2.5-32b/supervised/qk_dh128/config.json +42 -0
  48. qwen2.5-32b/supervised/qk_dh128/lambdas.json +13 -0
  49. qwen2.5-32b/supervised/qk_dh128/metrics.json +70 -0
  50. qwen2.5-32b/supervised/qk_dh128/ood_aime24.json +68 -0
README.md CHANGED
@@ -1,3 +1,69 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - test-time-training
+ - conformal-prediction
+ - reasoning
+ - early-stopping
+ - llm
+ datasets:
+ - wzekai99/ORCA
+ ---
+
+ # ORCA TTT-Probes
+
+ Trained Test-Time Training probes for *Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning* ([arXiv:2604.01170](https://arxiv.org/abs/2604.01170)).
+
+ ## Layout (17 probes)
+
+ ```
+ qwen2.5-32b/supervised/{no_kq, qk_dh128,
+                        qk_dh32, qk_dh64, qk_dh256, qk_dh512,
+                        qk_dh128_ln, qk_dh128_ln_res, qk_dh128_share_kq,
+                        qk_dh128_eta_learn, qk_dh128_mlp}/
+ qwen2.5-32b/consistent/{no_kq, qk_dh128}/
+ qwq-32b/supervised/{no_kq, qk_dh128}/
+ llama-3.3-70b/supervised/{no_kq, qk_dh128}/
+ ```
+
+ Per probe directory:
+
+ | File | Contents |
+ |-------------------|----------------------------------------------------------------|
+ | `probe.pt` | State dict: W0, b0, log_eta; QK variants also include theta_K, theta_Q |
+ | `config.json` | Training hyperparameters (d_hidden, base_lr, epochs, ...) |
+ | `lambdas.json` | LTT thresholds, keyed by delta |
+ | `metrics.json` | Step-level savings and error rate per delta |
+ | `ood_*.json` | Per-OOD-benchmark metrics (AIME24/25/26, MATH500, GPQA Diamond) |
+
+ ## Use
+
+ Probes are loaded by the `TTTProbe` class in https://github.com/wzekai99/ORCA. Quick example:
+
+ ```bash
+ hf download wzekai99/ORCA --local-dir probes
+ hf download wzekai99/ORCA --repo-type dataset --local-dir data
+ python code/test.py \
+     --method ttt --no_kq \
+     --dataset_path data/qwen2.5-32b/s1k.pkl \
+                    data/qwen2.5-32b/openr1_2k.pkl \
+                    data/qwen2.5-32b/deepmath_2k.pkl \
+     --probe_path probes/qwen2.5-32b/supervised/no_kq/probe.pt \
+     --label_mode supervised --delta 0.1 --epsilon 0.05
+ ```
+
+ ## License
+
+ MIT.
+
+ ## Citation
+
+ ```bibtex
+ @article{zhou2026online,
+   title={Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning},
+   author={Zhou, Cai and Wang, Zekai and Wu, Menghua and Zhu, Qianyu Julie and Shi, Flora C and Wang, Chenyu and Wilson, Ashia and Jaakkola, Tommi and Bates, Stephen},
+   journal={arXiv preprint arXiv:2604.01170},
+   year={2026}
+ }
+ ```
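The actual loading logic lives in the GitHub repo's `TTTProbe` class. As a minimal sketch of how `lambdas.json` can be consumed on its own, the helper below looks up the calibrated stopping threshold for a target risk level delta; the threshold values are copied from `llama-3.3-70b/supervised/no_kq/lambdas.json` in this release, and `threshold_for` is a hypothetical name, not part of the released code:

```python
import json

# A subset of the thresholds released in lambdas.json
# (llama-3.3-70b/supervised/no_kq), keyed by target risk level delta.
LAMBDAS_JSON = """{
  "0.01": 0.9382,
  "0.05": 0.8886000000000001,
  "0.1": 0.8489,
  "0.25": 0.7363,
  "0.5": 9.999999999998899e-05
}"""


def threshold_for(lambdas: dict, delta: float) -> float:
    """Return the LTT-calibrated stopping threshold for a given delta."""
    key = str(delta)
    if key not in lambdas:
        raise KeyError(f"delta {delta} was not calibrated; available: {sorted(lambdas)}")
    return lambdas[key]


lambdas = json.loads(LAMBDAS_JSON)
print(threshold_for(lambdas, 0.1))  # 0.8489
```

Note that only the deltas listed in the file were calibrated; intermediate deltas would require rerunning the LTT procedure rather than interpolating.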
llama-3.3-70b/supervised/no_kq/config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "config": "configs/llama70b_5k.yaml",
+ "method": "ttt",
+ "dataset_path": [
+ "data_prepare/output/llama70b/s1k/dataset.pkl",
+ "data_prepare/output/llama70b/openr1_2k/dataset.pkl",
+ "data_prepare/output/llama70b/deepmath_2k/dataset.pkl"
+ ],
+ "ood_paths": [
+ "data_prepare/output/llama70b/aime24/dataset.pkl",
+ "data_prepare/output/llama70b/aime25/dataset.pkl",
+ "data_prepare/output/llama70b/aime26/dataset.pkl",
+ "data_prepare/output/llama70b/math500/dataset.pkl",
+ "data_prepare/output/llama70b/gpqa_diamond/dataset.pkl"
+ ],
+ "output_dir": "results/llama70b_5k",
+ "label_mode": "supervised",
+ "batch_size": 10,
+ "seed": 42,
+ "smooth_window": 10,
+ "run_name": "ttt__no_kq__lr0.01__ep40",
+ "d_hidden": 64,
+ "use_ln": false,
+ "use_residual": false,
+ "learnable_eta": false,
+ "base_lr": 0.01,
+ "share_kq": false,
+ "use_mlp": false,
+ "use_pca": false,
+ "pca_dim": 256,
+ "epochs": 20,
+ "outer_lr": 0.001,
+ "no_meta_train": false,
+ "no_online_update": false,
+ "no_kq": true,
+ "grad_clip": 1.0,
+ "force_retrain": true,
+ "save_every": 10,
+ "d_phi": 8192,
+ "timestamp": "2026-03-30T01:32:49.432549",
+ "release_target": "llama-3.3-70b/supervised/no_kq",
+ "release_probe_source": "llama70b_5k/supervised/ttt__no_kq__lr0.01__ep40/checkpoints/probe_ep20.pt"
+ }
llama-3.3-70b/supervised/no_kq/lambdas.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "0.01": 0.9382,
+ "0.025": 0.9159,
+ "0.05": 0.8886000000000001,
+ "0.1": 0.8489,
+ "0.15": 0.8142,
+ "0.2": 0.7734,
+ "0.25": 0.7363,
+ "0.3": 0.7017,
+ "0.35": 0.6558999999999999,
+ "0.4": 0.5794,
+ "0.5": 9.999999999998899e-05
+ }
llama-3.3-70b/supervised/no_kq/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+ "eps_results": {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.0086,
+ "savings": 0.052,
+ "accuracy": 0.9914
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0235,
+ "savings": 0.1457,
+ "accuracy": 0.9765
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.046,
+ "savings": 0.2702,
+ "accuracy": 0.954
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.0898,
+ "savings": 0.4238,
+ "accuracy": 0.9102
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.1305,
+ "savings": 0.5281,
+ "accuracy": 0.8695
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.1861,
+ "savings": 0.6321,
+ "accuracy": 0.8139
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.2257,
+ "savings": 0.7091,
+ "accuracy": 0.7743
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.2717,
+ "savings": 0.7679,
+ "accuracy": 0.7283
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.323,
+ "savings": 0.834,
+ "accuracy": 0.677
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.3775,
+ "savings": 0.9036,
+ "accuracy": 0.6225
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.4075,
+ "savings": 0.9497,
+ "accuracy": 0.5925
+ }
+ }
+ }
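The `eps_results` above are the in-distribution test metrics: for every calibrated delta, the realized step-level error rate should sit at or below the target, which is the guarantee the LTT thresholds are calibrated to provide. A quick sanity check (error rates copied from the file above; this script is illustrative, not part of the release):

```python
import json

# error_rate per delta, copied from the "eps_results" block of
# llama-3.3-70b/supervised/no_kq/metrics.json above.
METRICS = json.loads("""{
  "0.01": 0.0086, "0.025": 0.0235, "0.05": 0.046, "0.1": 0.0898,
  "0.15": 0.1305, "0.2": 0.1861, "0.25": 0.2257, "0.3": 0.2717,
  "0.35": 0.323, "0.4": 0.3775, "0.5": 0.4075
}""")

# Collect any delta whose realized error rate exceeds the target.
violations = {d: e for d, e in METRICS.items() if e > float(d)}
print(violations)  # {}
```

An empty `violations` dict means the risk target holds at every delta on this split; the `ood_*.json` files show where the same thresholds do or do not transfer out of distribution.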
llama-3.3-70b/supervised/no_kq/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.0,
+ "savings": 0.0024,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0,
+ "savings": 0.0338,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.0,
+ "savings": 0.0952,
+ "accuracy": 1.0
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.0435,
+ "savings": 0.2057,
+ "accuracy": 0.9565
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.087,
+ "savings": 0.3153,
+ "accuracy": 0.913
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.2174,
+ "savings": 0.3871,
+ "accuracy": 0.7826
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.2609,
+ "savings": 0.5131,
+ "accuracy": 0.7391
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.3043,
+ "savings": 0.5721,
+ "accuracy": 0.6957
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.3913,
+ "savings": 0.6936,
+ "accuracy": 0.6087
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.4783,
+ "savings": 0.7992,
+ "accuracy": 0.5217
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.5217,
+ "savings": 0.9626,
+ "accuracy": 0.4783
+ }
+ }
llama-3.3-70b/supervised/no_kq/ood_aime25.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.0,
+ "savings": 0.0,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0,
+ "savings": 0.0118,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.0,
+ "savings": 0.1162,
+ "accuracy": 1.0
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.0476,
+ "savings": 0.2534,
+ "accuracy": 0.9524
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.0952,
+ "savings": 0.3326,
+ "accuracy": 0.9048
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.2381,
+ "savings": 0.4854,
+ "accuracy": 0.7619
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.2381,
+ "savings": 0.5396,
+ "accuracy": 0.7619
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.3333,
+ "savings": 0.7042,
+ "accuracy": 0.6667
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.4286,
+ "savings": 0.7611,
+ "accuracy": 0.5714
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.6667,
+ "savings": 0.8989,
+ "accuracy": 0.3333
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.7619,
+ "savings": 0.9683,
+ "accuracy": 0.2381
+ }
+ }
llama-3.3-70b/supervised/no_kq/ood_aime26.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.0,
+ "savings": 0.0131,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0,
+ "savings": 0.0246,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.0,
+ "savings": 0.0873,
+ "accuracy": 1.0
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.0385,
+ "savings": 0.2188,
+ "accuracy": 0.9615
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.1154,
+ "savings": 0.3183,
+ "accuracy": 0.8846
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.2692,
+ "savings": 0.5766,
+ "accuracy": 0.7308
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.3846,
+ "savings": 0.6703,
+ "accuracy": 0.6154
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.4231,
+ "savings": 0.734,
+ "accuracy": 0.5769
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.5385,
+ "savings": 0.8369,
+ "accuracy": 0.4615
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.6154,
+ "savings": 0.9442,
+ "accuracy": 0.3846
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.6154,
+ "savings": 0.9686,
+ "accuracy": 0.3846
+ }
+ }
llama-3.3-70b/supervised/no_kq/ood_gpqa_diamond.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.0377,
+ "savings": 0.097,
+ "accuracy": 0.9623
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0849,
+ "savings": 0.2106,
+ "accuracy": 0.9151
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.1887,
+ "savings": 0.3912,
+ "accuracy": 0.8113
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.3491,
+ "savings": 0.6266,
+ "accuracy": 0.6509
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.3868,
+ "savings": 0.7771,
+ "accuracy": 0.6132
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.4434,
+ "savings": 0.8936,
+ "accuracy": 0.5566
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.4528,
+ "savings": 0.9361,
+ "accuracy": 0.5472
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.4811,
+ "savings": 0.9536,
+ "accuracy": 0.5189
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.4811,
+ "savings": 0.9657,
+ "accuracy": 0.5189
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.4811,
+ "savings": 0.9695,
+ "accuracy": 0.5189
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.4811,
+ "savings": 0.9695,
+ "accuracy": 0.5189
+ }
+ }
llama-3.3-70b/supervised/no_kq/ood_math500.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9382,
+ "error_rate": 0.002,
+ "savings": 0.1291,
+ "accuracy": 0.998
+ },
+ "0.025": {
+ "lambda": 0.9159,
+ "error_rate": 0.0041,
+ "savings": 0.2712,
+ "accuracy": 0.9959
+ },
+ "0.05": {
+ "lambda": 0.8886000000000001,
+ "error_rate": 0.0122,
+ "savings": 0.4343,
+ "accuracy": 0.9878
+ },
+ "0.1": {
+ "lambda": 0.8489,
+ "error_rate": 0.0265,
+ "savings": 0.599,
+ "accuracy": 0.9735
+ },
+ "0.15": {
+ "lambda": 0.8142,
+ "error_rate": 0.0407,
+ "savings": 0.6907,
+ "accuracy": 0.9593
+ },
+ "0.2": {
+ "lambda": 0.7734,
+ "error_rate": 0.0713,
+ "savings": 0.7782,
+ "accuracy": 0.9287
+ },
+ "0.25": {
+ "lambda": 0.7363,
+ "error_rate": 0.0774,
+ "savings": 0.8149,
+ "accuracy": 0.9226
+ },
+ "0.3": {
+ "lambda": 0.7017,
+ "error_rate": 0.0957,
+ "savings": 0.8389,
+ "accuracy": 0.9043
+ },
+ "0.35": {
+ "lambda": 0.6558999999999999,
+ "error_rate": 0.1079,
+ "savings": 0.8603,
+ "accuracy": 0.8921
+ },
+ "0.4": {
+ "lambda": 0.5794,
+ "error_rate": 0.1161,
+ "savings": 0.8721,
+ "accuracy": 0.8839
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.1181,
+ "savings": 0.8764,
+ "accuracy": 0.8819
+ }
+ }
llama-3.3-70b/supervised/no_kq/probe.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4fd23e7f353e515c4829282b8ff92f01ce1ea5b447da6219a2a42dea3b4af8f
+ size 34940
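The `probe.pt` weights are stored via Git LFS, so what the diff shows is a three-line pointer file, not the tensor data itself. A small sketch of parsing that pointer format (pure stdlib; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from the file above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file ("key value" per line) into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size is the byte count of the real file
    return fields


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a4fd23e7f353e515c4829282b8ff92f01ce1ea5b447da6219a2a42dea3b4af8f
size 34940"""

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 34940
```

The `oid` is the SHA-256 of the actual weight file, which `hf download` (or `git lfs pull`) resolves and fetches automatically.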
llama-3.3-70b/supervised/qk_dh128/config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "config": "configs/llama70b_5k.yaml",
+ "method": "ttt",
+ "dataset_path": [
+ "data_prepare/output/llama70b/s1k/dataset.pkl",
+ "data_prepare/output/llama70b/openr1_2k/dataset.pkl",
+ "data_prepare/output/llama70b/deepmath_2k/dataset.pkl"
+ ],
+ "ood_paths": [
+ "data_prepare/output/llama70b/aime24/dataset.pkl",
+ "data_prepare/output/llama70b/aime25/dataset.pkl",
+ "data_prepare/output/llama70b/aime26/dataset.pkl",
+ "data_prepare/output/llama70b/math500/dataset.pkl",
+ "data_prepare/output/llama70b/gpqa_diamond/dataset.pkl"
+ ],
+ "output_dir": "results/llama70b_5k",
+ "label_mode": "supervised",
+ "batch_size": 10,
+ "seed": 42,
+ "smooth_window": 10,
+ "run_name": "ttt__dh128__lr0.01__ep40",
+ "d_hidden": 128,
+ "use_ln": false,
+ "use_residual": false,
+ "learnable_eta": false,
+ "base_lr": 0.01,
+ "share_kq": false,
+ "use_mlp": false,
+ "use_pca": false,
+ "pca_dim": 256,
+ "epochs": 10,
+ "outer_lr": 0.001,
+ "no_meta_train": false,
+ "no_online_update": false,
+ "no_kq": false,
+ "grad_clip": 1.0,
+ "force_retrain": true,
+ "save_every": 10,
+ "d_phi": 8192,
+ "timestamp": "2026-03-30T01:38:20.174996",
+ "release_target": "llama-3.3-70b/supervised/qk_dh128",
+ "release_probe_source": "llama70b_5k/supervised/ttt__dh128__lr0.01__final_ep10/probe.pt"
+ }
llama-3.3-70b/supervised/qk_dh128/lambdas.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "0.01": 0.9969,
+ "0.025": 0.9913,
+ "0.05": 0.9856,
+ "0.1": 0.971,
+ "0.15": 0.9573,
+ "0.2": 0.9441,
+ "0.25": 0.9275,
+ "0.3": 0.9108,
+ "0.35": 0.877,
+ "0.4": 0.8209,
+ "0.5": 9.999999999998899e-05
+ }
llama-3.3-70b/supervised/qk_dh128/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+ "eps_results": {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0053,
+ "savings": 0.0223,
+ "accuracy": 0.9947
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.016,
+ "savings": 0.0884,
+ "accuracy": 0.984
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.0385,
+ "savings": 0.1767,
+ "accuracy": 0.9615
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.0813,
+ "savings": 0.378,
+ "accuracy": 0.9187
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.139,
+ "savings": 0.5199,
+ "accuracy": 0.861
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.1754,
+ "savings": 0.6083,
+ "accuracy": 0.8246
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.2235,
+ "savings": 0.7029,
+ "accuracy": 0.7765
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.2556,
+ "savings": 0.7558,
+ "accuracy": 0.7444
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.3123,
+ "savings": 0.8364,
+ "accuracy": 0.6877
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.3679,
+ "savings": 0.9008,
+ "accuracy": 0.6321
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.4075,
+ "savings": 0.9497,
+ "accuracy": 0.5925
+ }
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0,
+ "savings": 0.0,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.0,
+ "savings": 0.0245,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.0,
+ "savings": 0.0647,
+ "accuracy": 1.0
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.087,
+ "savings": 0.1996,
+ "accuracy": 0.913
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.1739,
+ "savings": 0.402,
+ "accuracy": 0.8261
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.1739,
+ "savings": 0.4575,
+ "accuracy": 0.8261
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.3478,
+ "savings": 0.5821,
+ "accuracy": 0.6522
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.3913,
+ "savings": 0.7312,
+ "accuracy": 0.6087
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.4783,
+ "savings": 0.8874,
+ "accuracy": 0.5217
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.5217,
+ "savings": 0.927,
+ "accuracy": 0.4783
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.5217,
+ "savings": 0.9626,
+ "accuracy": 0.4783
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/ood_aime25.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0,
+ "savings": 0.0,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.0,
+ "savings": 0.0292,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.0,
+ "savings": 0.0833,
+ "accuracy": 1.0
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.0952,
+ "savings": 0.3089,
+ "accuracy": 0.9048
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.1429,
+ "savings": 0.3788,
+ "accuracy": 0.8571
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.1429,
+ "savings": 0.424,
+ "accuracy": 0.8571
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.2857,
+ "savings": 0.5585,
+ "accuracy": 0.7143
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.381,
+ "savings": 0.6373,
+ "accuracy": 0.619
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.5238,
+ "savings": 0.7687,
+ "accuracy": 0.4762
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.6667,
+ "savings": 0.9211,
+ "accuracy": 0.3333
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.7619,
+ "savings": 0.9683,
+ "accuracy": 0.2381
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/ood_aime26.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0,
+ "savings": 0.017,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.0,
+ "savings": 0.0263,
+ "accuracy": 1.0
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.0385,
+ "savings": 0.0995,
+ "accuracy": 0.9615
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.1154,
+ "savings": 0.3059,
+ "accuracy": 0.8846
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.2692,
+ "savings": 0.5312,
+ "accuracy": 0.7308
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.3077,
+ "savings": 0.5872,
+ "accuracy": 0.6923
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.3462,
+ "savings": 0.6452,
+ "accuracy": 0.6538
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.4231,
+ "savings": 0.6927,
+ "accuracy": 0.5769
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.5385,
+ "savings": 0.8578,
+ "accuracy": 0.4615
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.5769,
+ "savings": 0.904,
+ "accuracy": 0.4231
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.6154,
+ "savings": 0.9686,
+ "accuracy": 0.3846
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/ood_gpqa_diamond.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0849,
+ "savings": 0.1121,
+ "accuracy": 0.9151
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.1321,
+ "savings": 0.271,
+ "accuracy": 0.8679
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.1887,
+ "savings": 0.3944,
+ "accuracy": 0.8113
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.2925,
+ "savings": 0.5771,
+ "accuracy": 0.7075
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.3774,
+ "savings": 0.7035,
+ "accuracy": 0.6226
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.3962,
+ "savings": 0.7595,
+ "accuracy": 0.6038
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.434,
+ "savings": 0.8436,
+ "accuracy": 0.566
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.434,
+ "savings": 0.8973,
+ "accuracy": 0.566
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.4623,
+ "savings": 0.9408,
+ "accuracy": 0.5377
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.4811,
+ "savings": 0.9649,
+ "accuracy": 0.5189
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.4811,
+ "savings": 0.9695,
+ "accuracy": 0.5189
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/ood_math500.json ADDED
@@ -0,0 +1,68 @@
+ {
+ "0.01": {
+ "lambda": 0.9969,
+ "error_rate": 0.0,
+ "savings": 0.0922,
+ "accuracy": 1.0
+ },
+ "0.025": {
+ "lambda": 0.9913,
+ "error_rate": 0.0081,
+ "savings": 0.3619,
+ "accuracy": 0.9919
+ },
+ "0.05": {
+ "lambda": 0.9856,
+ "error_rate": 0.0224,
+ "savings": 0.5167,
+ "accuracy": 0.9776
+ },
+ "0.1": {
+ "lambda": 0.971,
+ "error_rate": 0.0387,
+ "savings": 0.6876,
+ "accuracy": 0.9613
+ },
+ "0.15": {
+ "lambda": 0.9573,
+ "error_rate": 0.0591,
+ "savings": 0.7632,
+ "accuracy": 0.9409
+ },
+ "0.2": {
+ "lambda": 0.9441,
+ "error_rate": 0.0815,
+ "savings": 0.8065,
+ "accuracy": 0.9185
+ },
+ "0.25": {
+ "lambda": 0.9275,
+ "error_rate": 0.0916,
+ "savings": 0.8372,
+ "accuracy": 0.9084
+ },
+ "0.3": {
+ "lambda": 0.9108,
+ "error_rate": 0.1059,
+ "savings": 0.8582,
+ "accuracy": 0.8941
+ },
+ "0.35": {
+ "lambda": 0.877,
+ "error_rate": 0.1141,
+ "savings": 0.8713,
+ "accuracy": 0.8859
+ },
+ "0.4": {
+ "lambda": 0.8209,
+ "error_rate": 0.1181,
+ "savings": 0.8756,
+ "accuracy": 0.8819
+ },
+ "0.5": {
+ "lambda": 9.999999999998899e-05,
+ "error_rate": 0.1181,
+ "savings": 0.8764,
+ "accuracy": 0.8819
+ }
+ }
llama-3.3-70b/supervised/qk_dh128/probe.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47ca9fb1c6798e4dabe12ac3e18522ca814000b719b5fa63e8624df5808c4268
+ size 8391930
qwen2.5-32b/consistent/no_kq/config.json ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "config": "configs/qwen32b_5k.yaml",
3
+ "method": "ttt",
4
+ "dataset_path": [
5
+ "data_prepare/output/qwen32b/s1k/dataset.pkl",
6
+ "data_prepare/output/qwen32b/openr1_2k/dataset.pkl",
7
+ "data_prepare/output/qwen32b/deepmath_2k/dataset.pkl"
8
+ ],
9
+ "ood_paths": [
10
+ "data_prepare/output/qwen32b/aime24/dataset.pkl",
11
+ "data_prepare/output/qwen32b/aime25/dataset.pkl",
12
+ "data_prepare/output/qwen32b/aime26/dataset.pkl",
13
+ "data_prepare/output/qwen32b/math500/dataset.pkl",
14
+ "data_prepare/output/qwen32b/gpqa_diamond/dataset.pkl"
15
+ ],
16
+ "output_dir": "results/qwen32b_5k",
17
+ "label_mode": "consistent",
18
+ "batch_size": 10,
19
+ "seed": 42,
20
+ "smooth_window": 10,
21
+ "run_name": "ttt__no_kq__lr0.01",
22
+ "d_hidden": 64,
23
+ "use_ln": false,
24
+ "use_residual": false,
25
+ "learnable_eta": false,
26
+ "base_lr": 0.01,
27
+ "share_kq": false,
28
+ "use_mlp": false,
29
+ "use_pca": false,
30
+ "pca_dim": 256,
31
+ "epochs": 20,
32
+ "outer_lr": 0.001,
33
+ "no_meta_train": false,
34
+ "no_online_update": false,
35
+ "no_kq": true,
36
+ "grad_clip": 1.0,
37
+ "force_retrain": false,
38
+ "d_phi": 5120,
39
+ "timestamp": "2026-03-27T22:40:13.431109",
40
+ "release_target": "qwen2.5-32b/consistent/no_kq",
41
+ "release_probe_source": "qwen32b_5k/consistent/ttt__no_kq__lr0.01/checkpoints/probe_ep20.pt"
42
+ }
qwen2.5-32b/consistent/no_kq/lambdas.json ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "0.01": 0.9555,
3
+ "0.025": 0.9279,
4
+ "0.05": 0.9062,
5
+ "0.1": 0.8543000000000001,
6
+ "0.15": 0.8158,
7
+ "0.2": 0.7741,
8
+ "0.25": 0.7341,
9
+ "0.3": 0.6795,
10
+ "0.35": 0.6321,
11
+ "0.4": 0.5152,
12
+ "0.5": 9.999999999998899e-05
13
+ }
qwen2.5-32b/consistent/no_kq/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "eps_results": {
+     "0.01": {
+       "lambda": 0.9555,
+       "error_rate": 0.011,
+       "savings": 0.0213,
+       "accuracy": 0.989
+     },
+     "0.025": {
+       "lambda": 0.9279,
+       "error_rate": 0.024,
+       "savings": 0.124,
+       "accuracy": 0.976
+     },
+     "0.05": {
+       "lambda": 0.9062,
+       "error_rate": 0.045,
+       "savings": 0.2197,
+       "accuracy": 0.955
+     },
+     "0.1": {
+       "lambda": 0.8543000000000001,
+       "error_rate": 0.096,
+       "savings": 0.4073,
+       "accuracy": 0.904
+     },
+     "0.15": {
+       "lambda": 0.8158,
+       "error_rate": 0.141,
+       "savings": 0.5292,
+       "accuracy": 0.859
+     },
+     "0.2": {
+       "lambda": 0.7741,
+       "error_rate": 0.193,
+       "savings": 0.6441,
+       "accuracy": 0.807
+     },
+     "0.25": {
+       "lambda": 0.7341,
+       "error_rate": 0.234,
+       "savings": 0.7307,
+       "accuracy": 0.766
+     },
+     "0.3": {
+       "lambda": 0.6795,
+       "error_rate": 0.296,
+       "savings": 0.8146,
+       "accuracy": 0.704
+     },
+     "0.35": {
+       "lambda": 0.6321,
+       "error_rate": 0.331,
+       "savings": 0.8668,
+       "accuracy": 0.669
+     },
+     "0.4": {
+       "lambda": 0.5152,
+       "error_rate": 0.371,
+       "savings": 0.9334,
+       "accuracy": 0.629
+     },
+     "0.5": {
+       "lambda": 9.999999999998899e-05,
+       "error_rate": 0.382,
+       "savings": 0.9522,
+       "accuracy": 0.618
+     }
+   }
+ }
qwen2.5-32b/consistent/no_kq/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9555,
+     "error_rate": 0.0,
+     "savings": 0.0,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9279,
+     "error_rate": 0.0333,
+     "savings": 0.0354,
+     "accuracy": 0.9667
+   },
+   "0.05": {
+     "lambda": 0.9062,
+     "error_rate": 0.0333,
+     "savings": 0.0462,
+     "accuracy": 0.9667
+   },
+   "0.1": {
+     "lambda": 0.8543000000000001,
+     "error_rate": 0.0333,
+     "savings": 0.1406,
+     "accuracy": 0.9667
+   },
+   "0.15": {
+     "lambda": 0.8158,
+     "error_rate": 0.0333,
+     "savings": 0.263,
+     "accuracy": 0.9667
+   },
+   "0.2": {
+     "lambda": 0.7741,
+     "error_rate": 0.1,
+     "savings": 0.4018,
+     "accuracy": 0.9
+   },
+   "0.25": {
+     "lambda": 0.7341,
+     "error_rate": 0.2667,
+     "savings": 0.5115,
+     "accuracy": 0.7333
+   },
+   "0.3": {
+     "lambda": 0.6795,
+     "error_rate": 0.3333,
+     "savings": 0.7286,
+     "accuracy": 0.6667
+   },
+   "0.35": {
+     "lambda": 0.6321,
+     "error_rate": 0.4333,
+     "savings": 0.8066,
+     "accuracy": 0.5667
+   },
+   "0.4": {
+     "lambda": 0.5152,
+     "error_rate": 0.4667,
+     "savings": 0.945,
+     "accuracy": 0.5333
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.4667,
+     "savings": 0.9702,
+     "accuracy": 0.5333
+   }
+ }
qwen2.5-32b/consistent/no_kq/ood_aime25.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9555,
+     "error_rate": 0.0,
+     "savings": 0.0,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9279,
+     "error_rate": 0.0,
+     "savings": 0.0151,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9062,
+     "error_rate": 0.0,
+     "savings": 0.0186,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8543000000000001,
+     "error_rate": 0.0667,
+     "savings": 0.1661,
+     "accuracy": 0.9333
+   },
+   "0.15": {
+     "lambda": 0.8158,
+     "error_rate": 0.0667,
+     "savings": 0.2264,
+     "accuracy": 0.9333
+   },
+   "0.2": {
+     "lambda": 0.7741,
+     "error_rate": 0.1667,
+     "savings": 0.3693,
+     "accuracy": 0.8333
+   },
+   "0.25": {
+     "lambda": 0.7341,
+     "error_rate": 0.3,
+     "savings": 0.5924,
+     "accuracy": 0.7
+   },
+   "0.3": {
+     "lambda": 0.6795,
+     "error_rate": 0.3333,
+     "savings": 0.7102,
+     "accuracy": 0.6667
+   },
+   "0.35": {
+     "lambda": 0.6321,
+     "error_rate": 0.4333,
+     "savings": 0.8036,
+     "accuracy": 0.5667
+   },
+   "0.4": {
+     "lambda": 0.5152,
+     "error_rate": 0.5333,
+     "savings": 0.9255,
+     "accuracy": 0.4667
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.6,
+     "savings": 0.9647,
+     "accuracy": 0.4
+   }
+ }
qwen2.5-32b/consistent/no_kq/ood_aime26.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9555,
+     "error_rate": 0.0,
+     "savings": 0.0144,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9279,
+     "error_rate": 0.0,
+     "savings": 0.0289,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9062,
+     "error_rate": 0.0,
+     "savings": 0.0498,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8543000000000001,
+     "error_rate": 0.0667,
+     "savings": 0.1544,
+     "accuracy": 0.9333
+   },
+   "0.15": {
+     "lambda": 0.8158,
+     "error_rate": 0.1,
+     "savings": 0.2449,
+     "accuracy": 0.9
+   },
+   "0.2": {
+     "lambda": 0.7741,
+     "error_rate": 0.1333,
+     "savings": 0.3388,
+     "accuracy": 0.8667
+   },
+   "0.25": {
+     "lambda": 0.7341,
+     "error_rate": 0.3,
+     "savings": 0.5093,
+     "accuracy": 0.7
+   },
+   "0.3": {
+     "lambda": 0.6795,
+     "error_rate": 0.3333,
+     "savings": 0.6242,
+     "accuracy": 0.6667
+   },
+   "0.35": {
+     "lambda": 0.6321,
+     "error_rate": 0.3333,
+     "savings": 0.6997,
+     "accuracy": 0.6667
+   },
+   "0.4": {
+     "lambda": 0.5152,
+     "error_rate": 0.4667,
+     "savings": 0.8829,
+     "accuracy": 0.5333
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.5333,
+     "savings": 0.9675,
+     "accuracy": 0.4667
+   }
+ }
qwen2.5-32b/consistent/no_kq/ood_gpqa_diamond.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9555,
+     "error_rate": 0.0101,
+     "savings": 0.0457,
+     "accuracy": 0.9899
+   },
+   "0.025": {
+     "lambda": 0.9279,
+     "error_rate": 0.101,
+     "savings": 0.209,
+     "accuracy": 0.899
+   },
+   "0.05": {
+     "lambda": 0.9062,
+     "error_rate": 0.1667,
+     "savings": 0.3483,
+     "accuracy": 0.8333
+   },
+   "0.1": {
+     "lambda": 0.8543000000000001,
+     "error_rate": 0.3182,
+     "savings": 0.5983,
+     "accuracy": 0.6818
+   },
+   "0.15": {
+     "lambda": 0.8158,
+     "error_rate": 0.399,
+     "savings": 0.734,
+     "accuracy": 0.601
+   },
+   "0.2": {
+     "lambda": 0.7741,
+     "error_rate": 0.4495,
+     "savings": 0.839,
+     "accuracy": 0.5505
+   },
+   "0.25": {
+     "lambda": 0.7341,
+     "error_rate": 0.4697,
+     "savings": 0.8911,
+     "accuracy": 0.5303
+   },
+   "0.3": {
+     "lambda": 0.6795,
+     "error_rate": 0.4949,
+     "savings": 0.9306,
+     "accuracy": 0.5051
+   },
+   "0.35": {
+     "lambda": 0.6321,
+     "error_rate": 0.5101,
+     "savings": 0.9449,
+     "accuracy": 0.4899
+   },
+   "0.4": {
+     "lambda": 0.5152,
+     "error_rate": 0.5101,
+     "savings": 0.9596,
+     "accuracy": 0.4899
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.5101,
+     "savings": 0.9614,
+     "accuracy": 0.4899
+   }
+ }
qwen2.5-32b/consistent/no_kq/ood_math500.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9555,
+     "error_rate": 0.0,
+     "savings": 0.0352,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9279,
+     "error_rate": 0.0,
+     "savings": 0.1602,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9062,
+     "error_rate": 0.0,
+     "savings": 0.2828,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8543000000000001,
+     "error_rate": 0.012,
+     "savings": 0.5554,
+     "accuracy": 0.988
+   },
+   "0.15": {
+     "lambda": 0.8158,
+     "error_rate": 0.026,
+     "savings": 0.6714,
+     "accuracy": 0.974
+   },
+   "0.2": {
+     "lambda": 0.7741,
+     "error_rate": 0.038,
+     "savings": 0.7488,
+     "accuracy": 0.962
+   },
+   "0.25": {
+     "lambda": 0.7341,
+     "error_rate": 0.052,
+     "savings": 0.7962,
+     "accuracy": 0.948
+   },
+   "0.3": {
+     "lambda": 0.6795,
+     "error_rate": 0.072,
+     "savings": 0.8429,
+     "accuracy": 0.928
+   },
+   "0.35": {
+     "lambda": 0.6321,
+     "error_rate": 0.08,
+     "savings": 0.8647,
+     "accuracy": 0.92
+   },
+   "0.4": {
+     "lambda": 0.5152,
+     "error_rate": 0.094,
+     "savings": 0.8833,
+     "accuracy": 0.906
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.1,
+     "savings": 0.8907,
+     "accuracy": 0.9
+   }
+ }
qwen2.5-32b/consistent/no_kq/probe.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b71239aef69766c054f887fd49c714b68638c5810173f4bda9abc0c99877f31
+ size 22652
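The `probe.pt` entries are Git LFS pointer files, so a plain clone holds only the `oid`/`size` stanza above; the actual weights arrive with `git lfs pull`. A small stdlib sketch for checking a fetched object against its pointer (the function name is illustrative, not part of the release tooling):

```python
import hashlib
import os

def verify_lfs_object(path: str, expected_sha256: str, expected_size: int) -> bool:
    # Cheap check first: the pointer records the exact byte size.
    if os.path.getsize(path) != expected_size:
        return False
    # Then stream the file through SHA-256 and compare against the pointer's oid.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Streaming in 1 MiB chunks keeps memory flat even for the larger probes (e.g. the ~5 MB `qk_dh128` checkpoints).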
qwen2.5-32b/consistent/qk_dh128/config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "config": "configs/qwen32b_5k.yaml",
+   "method": "ttt",
+   "dataset_path": [
+     "data_prepare/output/qwen32b/s1k/dataset.pkl",
+     "data_prepare/output/qwen32b/openr1_2k/dataset.pkl",
+     "data_prepare/output/qwen32b/deepmath_2k/dataset.pkl"
+   ],
+   "ood_paths": [
+     "data_prepare/output/qwen32b/aime24/dataset.pkl",
+     "data_prepare/output/qwen32b/aime25/dataset.pkl",
+     "data_prepare/output/qwen32b/aime26/dataset.pkl",
+     "data_prepare/output/qwen32b/math500/dataset.pkl",
+     "data_prepare/output/qwen32b/gpqa_diamond/dataset.pkl"
+   ],
+   "output_dir": "results/qwen32b_5k",
+   "label_mode": "consistent",
+   "batch_size": 10,
+   "seed": 42,
+   "smooth_window": 10,
+   "run_name": "ttt__dh128__lr0.01",
+   "d_hidden": 128,
+   "use_ln": false,
+   "use_residual": false,
+   "learnable_eta": false,
+   "base_lr": 0.01,
+   "share_kq": false,
+   "use_mlp": false,
+   "use_pca": false,
+   "pca_dim": 256,
+   "epochs": 10,
+   "outer_lr": 0.001,
+   "no_meta_train": false,
+   "no_online_update": false,
+   "no_kq": false,
+   "grad_clip": 1.0,
+   "force_retrain": false,
+   "d_phi": 5120,
+   "timestamp": "2026-03-28T01:01:45.669043",
+   "release_target": "qwen2.5-32b/consistent/qk_dh128",
+   "release_probe_source": "qwen32b_5k/consistent/ttt__dh128__lr0.01/checkpoints/probe_ep10.pt"
+ }
qwen2.5-32b/consistent/qk_dh128/lambdas.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "0.01": 0.9921,
+   "0.025": 0.9767,
+   "0.05": 0.9482,
+   "0.1": 0.8952,
+   "0.15": 0.8351,
+   "0.2": 0.7674,
+   "0.25": 0.6921999999999999,
+   "0.3": 0.5946,
+   "0.35": 0.4928,
+   "0.4": 0.32909999999999995,
+   "0.5": 9.999999999998899e-05
+ }
qwen2.5-32b/consistent/qk_dh128/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "eps_results": {
+     "0.01": {
+       "lambda": 0.9921,
+       "error_rate": 0.009,
+       "savings": 0.0207,
+       "accuracy": 0.991
+     },
+     "0.025": {
+       "lambda": 0.9767,
+       "error_rate": 0.033,
+       "savings": 0.0935,
+       "accuracy": 0.967
+     },
+     "0.05": {
+       "lambda": 0.9482,
+       "error_rate": 0.064,
+       "savings": 0.2315,
+       "accuracy": 0.936
+     },
+     "0.1": {
+       "lambda": 0.8952,
+       "error_rate": 0.113,
+       "savings": 0.3971,
+       "accuracy": 0.887
+     },
+     "0.15": {
+       "lambda": 0.8351,
+       "error_rate": 0.15,
+       "savings": 0.5236,
+       "accuracy": 0.85
+     },
+     "0.2": {
+       "lambda": 0.7674,
+       "error_rate": 0.187,
+       "savings": 0.6288,
+       "accuracy": 0.813
+     },
+     "0.25": {
+       "lambda": 0.6921999999999999,
+       "error_rate": 0.227,
+       "savings": 0.7114,
+       "accuracy": 0.773
+     },
+     "0.3": {
+       "lambda": 0.5946,
+       "error_rate": 0.28,
+       "savings": 0.8033,
+       "accuracy": 0.72
+     },
+     "0.35": {
+       "lambda": 0.4928,
+       "error_rate": 0.323,
+       "savings": 0.8698,
+       "accuracy": 0.677
+     },
+     "0.4": {
+       "lambda": 0.32909999999999995,
+       "error_rate": 0.364,
+       "savings": 0.9308,
+       "accuracy": 0.636
+     },
+     "0.5": {
+       "lambda": 9.999999999998899e-05,
+       "error_rate": 0.382,
+       "savings": 0.9522,
+       "accuracy": 0.618
+     }
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9921,
+     "error_rate": 0.0,
+     "savings": 0.0,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9767,
+     "error_rate": 0.0,
+     "savings": 0.0527,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9482,
+     "error_rate": 0.0333,
+     "savings": 0.0913,
+     "accuracy": 0.9667
+   },
+   "0.1": {
+     "lambda": 0.8952,
+     "error_rate": 0.0333,
+     "savings": 0.1847,
+     "accuracy": 0.9667
+   },
+   "0.15": {
+     "lambda": 0.8351,
+     "error_rate": 0.0333,
+     "savings": 0.303,
+     "accuracy": 0.9667
+   },
+   "0.2": {
+     "lambda": 0.7674,
+     "error_rate": 0.1667,
+     "savings": 0.3927,
+     "accuracy": 0.8333
+   },
+   "0.25": {
+     "lambda": 0.6921999999999999,
+     "error_rate": 0.3,
+     "savings": 0.5937,
+     "accuracy": 0.7
+   },
+   "0.3": {
+     "lambda": 0.5946,
+     "error_rate": 0.3333,
+     "savings": 0.6923,
+     "accuracy": 0.6667
+   },
+   "0.35": {
+     "lambda": 0.4928,
+     "error_rate": 0.4,
+     "savings": 0.8047,
+     "accuracy": 0.6
+   },
+   "0.4": {
+     "lambda": 0.32909999999999995,
+     "error_rate": 0.4667,
+     "savings": 0.9325,
+     "accuracy": 0.5333
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.4667,
+     "savings": 0.9702,
+     "accuracy": 0.5333
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/ood_aime25.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9921,
+     "error_rate": 0.0,
+     "savings": 0.0028,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9767,
+     "error_rate": 0.0,
+     "savings": 0.0353,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9482,
+     "error_rate": 0.0,
+     "savings": 0.0536,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8952,
+     "error_rate": 0.0,
+     "savings": 0.1389,
+     "accuracy": 1.0
+   },
+   "0.15": {
+     "lambda": 0.8351,
+     "error_rate": 0.0333,
+     "savings": 0.2236,
+     "accuracy": 0.9667
+   },
+   "0.2": {
+     "lambda": 0.7674,
+     "error_rate": 0.1333,
+     "savings": 0.3198,
+     "accuracy": 0.8667
+   },
+   "0.25": {
+     "lambda": 0.6921999999999999,
+     "error_rate": 0.1667,
+     "savings": 0.4304,
+     "accuracy": 0.8333
+   },
+   "0.3": {
+     "lambda": 0.5946,
+     "error_rate": 0.2,
+     "savings": 0.5998,
+     "accuracy": 0.8
+   },
+   "0.35": {
+     "lambda": 0.4928,
+     "error_rate": 0.3333,
+     "savings": 0.7807,
+     "accuracy": 0.6667
+   },
+   "0.4": {
+     "lambda": 0.32909999999999995,
+     "error_rate": 0.5667,
+     "savings": 0.9402,
+     "accuracy": 0.4333
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.6,
+     "savings": 0.9647,
+     "accuracy": 0.4
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/ood_aime26.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9921,
+     "error_rate": 0.0,
+     "savings": 0.0,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9767,
+     "error_rate": 0.0,
+     "savings": 0.0252,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.9482,
+     "error_rate": 0.0,
+     "savings": 0.055,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8952,
+     "error_rate": 0.0,
+     "savings": 0.0915,
+     "accuracy": 1.0
+   },
+   "0.15": {
+     "lambda": 0.8351,
+     "error_rate": 0.0333,
+     "savings": 0.2259,
+     "accuracy": 0.9667
+   },
+   "0.2": {
+     "lambda": 0.7674,
+     "error_rate": 0.1333,
+     "savings": 0.3766,
+     "accuracy": 0.8667
+   },
+   "0.25": {
+     "lambda": 0.6921999999999999,
+     "error_rate": 0.1667,
+     "savings": 0.4618,
+     "accuracy": 0.8333
+   },
+   "0.3": {
+     "lambda": 0.5946,
+     "error_rate": 0.2333,
+     "savings": 0.5934,
+     "accuracy": 0.7667
+   },
+   "0.35": {
+     "lambda": 0.4928,
+     "error_rate": 0.3333,
+     "savings": 0.7437,
+     "accuracy": 0.6667
+   },
+   "0.4": {
+     "lambda": 0.32909999999999995,
+     "error_rate": 0.4667,
+     "savings": 0.8902,
+     "accuracy": 0.5333
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.5333,
+     "savings": 0.9675,
+     "accuracy": 0.4667
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/ood_gpqa_diamond.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9921,
+     "error_rate": 0.0202,
+     "savings": 0.0274,
+     "accuracy": 0.9798
+   },
+   "0.025": {
+     "lambda": 0.9767,
+     "error_rate": 0.0758,
+     "savings": 0.1833,
+     "accuracy": 0.9242
+   },
+   "0.05": {
+     "lambda": 0.9482,
+     "error_rate": 0.202,
+     "savings": 0.3994,
+     "accuracy": 0.798
+   },
+   "0.1": {
+     "lambda": 0.8952,
+     "error_rate": 0.3283,
+     "savings": 0.6526,
+     "accuracy": 0.6717
+   },
+   "0.15": {
+     "lambda": 0.8351,
+     "error_rate": 0.3889,
+     "savings": 0.7731,
+     "accuracy": 0.6111
+   },
+   "0.2": {
+     "lambda": 0.7674,
+     "error_rate": 0.4444,
+     "savings": 0.8559,
+     "accuracy": 0.5556
+   },
+   "0.25": {
+     "lambda": 0.6921999999999999,
+     "error_rate": 0.4697,
+     "savings": 0.8948,
+     "accuracy": 0.5303
+   },
+   "0.3": {
+     "lambda": 0.5946,
+     "error_rate": 0.4949,
+     "savings": 0.9192,
+     "accuracy": 0.5051
+   },
+   "0.35": {
+     "lambda": 0.4928,
+     "error_rate": 0.5101,
+     "savings": 0.9511,
+     "accuracy": 0.4899
+   },
+   "0.4": {
+     "lambda": 0.32909999999999995,
+     "error_rate": 0.5101,
+     "savings": 0.9607,
+     "accuracy": 0.4899
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.5101,
+     "savings": 0.9614,
+     "accuracy": 0.4899
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/ood_math500.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9921,
+     "error_rate": 0.0,
+     "savings": 0.0768,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9767,
+     "error_rate": 0.002,
+     "savings": 0.27,
+     "accuracy": 0.998
+   },
+   "0.05": {
+     "lambda": 0.9482,
+     "error_rate": 0.008,
+     "savings": 0.4644,
+     "accuracy": 0.992
+   },
+   "0.1": {
+     "lambda": 0.8952,
+     "error_rate": 0.016,
+     "savings": 0.6371,
+     "accuracy": 0.984
+   },
+   "0.15": {
+     "lambda": 0.8351,
+     "error_rate": 0.022,
+     "savings": 0.7205,
+     "accuracy": 0.978
+   },
+   "0.2": {
+     "lambda": 0.7674,
+     "error_rate": 0.04,
+     "savings": 0.783,
+     "accuracy": 0.96
+   },
+   "0.25": {
+     "lambda": 0.6921999999999999,
+     "error_rate": 0.058,
+     "savings": 0.823,
+     "accuracy": 0.942
+   },
+   "0.3": {
+     "lambda": 0.5946,
+     "error_rate": 0.072,
+     "savings": 0.8578,
+     "accuracy": 0.928
+   },
+   "0.35": {
+     "lambda": 0.4928,
+     "error_rate": 0.086,
+     "savings": 0.8758,
+     "accuracy": 0.914
+   },
+   "0.4": {
+     "lambda": 0.32909999999999995,
+     "error_rate": 0.098,
+     "savings": 0.8893,
+     "accuracy": 0.902
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.1,
+     "savings": 0.8907,
+     "accuracy": 0.9
+   }
+ }
qwen2.5-32b/consistent/qk_dh128/probe.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d40434084b7f3190e0f96816cdecf4d243a3567f59100f78fc34f1ca07b6242
+ size 5246202
qwen2.5-32b/supervised/no_kq/config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "config": "configs/qwen32b_5k.yaml",
+   "method": "ttt",
+   "dataset_path": [
+     "data_prepare/output/qwen32b/s1k/dataset.pkl",
+     "data_prepare/output/qwen32b/openr1_2k/dataset.pkl",
+     "data_prepare/output/qwen32b/deepmath_2k/dataset.pkl"
+   ],
+   "ood_paths": [
+     "data_prepare/output/qwen32b/aime24/dataset.pkl",
+     "data_prepare/output/qwen32b/aime25/dataset.pkl",
+     "data_prepare/output/qwen32b/aime26/dataset.pkl",
+     "data_prepare/output/qwen32b/math500/dataset.pkl",
+     "data_prepare/output/qwen32b/gpqa_diamond/dataset.pkl"
+   ],
+   "output_dir": "results/qwen32b_5k",
+   "label_mode": "supervised",
+   "batch_size": 10,
+   "seed": 42,
+   "smooth_window": 10,
+   "run_name": "ttt__no_kq__lr0.01",
+   "d_hidden": 64,
+   "use_ln": false,
+   "use_residual": false,
+   "learnable_eta": false,
+   "base_lr": 0.01,
+   "share_kq": false,
+   "use_mlp": false,
+   "use_pca": false,
+   "pca_dim": 256,
+   "epochs": 20,
+   "outer_lr": 0.001,
+   "no_meta_train": false,
+   "no_online_update": false,
+   "no_kq": true,
+   "grad_clip": 1.0,
+   "force_retrain": false,
+   "d_phi": 5120,
+   "timestamp": "2026-03-27T19:49:22.309058",
+   "release_target": "qwen2.5-32b/supervised/no_kq",
+   "release_probe_source": "qwen32b_5k/supervised/ttt__no_kq__lr0.01/checkpoints/probe_ep20.pt"
+ }
qwen2.5-32b/supervised/no_kq/lambdas.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "0.01": 0.9489,
+   "0.025": 0.9215,
+   "0.05": 0.8896,
+   "0.1": 0.8326,
+   "0.15": 0.7989999999999999,
+   "0.2": 0.7598,
+   "0.25": 0.7142999999999999,
+   "0.3": 0.6740999999999999,
+   "0.35": 0.6171,
+   "0.4": 0.5069,
+   "0.5": 9.999999999998899e-05
+ }
qwen2.5-32b/supervised/no_kq/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "eps_results": {
+     "0.01": {
+       "lambda": 0.9489,
+       "error_rate": 0.01,
+       "savings": 0.0372,
+       "accuracy": 0.99
+     },
+     "0.025": {
+       "lambda": 0.9215,
+       "error_rate": 0.0266,
+       "savings": 0.1437,
+       "accuracy": 0.9734
+     },
+     "0.05": {
+       "lambda": 0.8896,
+       "error_rate": 0.0532,
+       "savings": 0.2817,
+       "accuracy": 0.9468
+     },
+     "0.1": {
+       "lambda": 0.8326,
+       "error_rate": 0.1098,
+       "savings": 0.4746,
+       "accuracy": 0.8902
+     },
+     "0.15": {
+       "lambda": 0.7989999999999999,
+       "error_rate": 0.1519,
+       "savings": 0.5749,
+       "accuracy": 0.8481
+     },
+     "0.2": {
+       "lambda": 0.7598,
+       "error_rate": 0.1918,
+       "savings": 0.6731,
+       "accuracy": 0.8082
+     },
+     "0.25": {
+       "lambda": 0.7142999999999999,
+       "error_rate": 0.2583,
+       "savings": 0.76,
+       "accuracy": 0.7417
+     },
+     "0.3": {
+       "lambda": 0.6740999999999999,
+       "error_rate": 0.2982,
+       "savings": 0.8183,
+       "accuracy": 0.7018
+     },
+     "0.35": {
+       "lambda": 0.6171,
+       "error_rate": 0.3514,
+       "savings": 0.8793,
+       "accuracy": 0.6486
+     },
+     "0.4": {
+       "lambda": 0.5069,
+       "error_rate": 0.388,
+       "savings": 0.9365,
+       "accuracy": 0.612
+     },
+     "0.5": {
+       "lambda": 9.999999999998899e-05,
+       "error_rate": 0.3947,
+       "savings": 0.9502,
+       "accuracy": 0.6053
+     }
+   }
+ }
qwen2.5-32b/supervised/no_kq/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9489,
+     "error_rate": 0.0,
+     "savings": 0.007,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9215,
+     "error_rate": 0.0,
+     "savings": 0.0411,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.8896,
+     "error_rate": 0.0,
+     "savings": 0.0837,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8326,
+     "error_rate": 0.15,
+     "savings": 0.2932,
+     "accuracy": 0.85
+   },
+   "0.15": {
+     "lambda": 0.7989999999999999,
+     "error_rate": 0.2,
+     "savings": 0.4065,
+     "accuracy": 0.8
+   },
+   "0.2": {
+     "lambda": 0.7598,
+     "error_rate": 0.25,
+     "savings": 0.4869,
+     "accuracy": 0.75
+   },
+   "0.25": {
+     "lambda": 0.7142999999999999,
+     "error_rate": 0.25,
+     "savings": 0.5858,
+     "accuracy": 0.75
+   },
+   "0.3": {
+     "lambda": 0.6740999999999999,
+     "error_rate": 0.3,
+     "savings": 0.666,
+     "accuracy": 0.7
+   },
+   "0.35": {
+     "lambda": 0.6171,
+     "error_rate": 0.35,
+     "savings": 0.7817,
+     "accuracy": 0.65
+   },
+   "0.4": {
+     "lambda": 0.5069,
+     "error_rate": 0.55,
+     "savings": 0.96,
+     "accuracy": 0.45
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.55,
+     "savings": 0.9683,
+     "accuracy": 0.45
+   }
+ }
qwen2.5-32b/supervised/no_kq/ood_aime25.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9489,
+     "error_rate": 0.0,
+     "savings": 0.0,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9215,
+     "error_rate": 0.0,
+     "savings": 0.0281,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.8896,
+     "error_rate": 0.0,
+     "savings": 0.0455,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8326,
+     "error_rate": 0.0556,
+     "savings": 0.265,
+     "accuracy": 0.9444
+   },
+   "0.15": {
+     "lambda": 0.7989999999999999,
+     "error_rate": 0.0556,
+     "savings": 0.3621,
+     "accuracy": 0.9444
+   },
+   "0.2": {
+     "lambda": 0.7598,
+     "error_rate": 0.1111,
+     "savings": 0.5146,
+     "accuracy": 0.8889
+   },
+   "0.25": {
+     "lambda": 0.7142999999999999,
+     "error_rate": 0.1667,
+     "savings": 0.6929,
+     "accuracy": 0.8333
+   },
+   "0.3": {
+     "lambda": 0.6740999999999999,
+     "error_rate": 0.3333,
+     "savings": 0.7742,
+     "accuracy": 0.6667
+   },
+   "0.35": {
+     "lambda": 0.6171,
+     "error_rate": 0.3333,
+     "savings": 0.8174,
+     "accuracy": 0.6667
+   },
+   "0.4": {
+     "lambda": 0.5069,
+     "error_rate": 0.4444,
+     "savings": 0.9417,
+     "accuracy": 0.5556
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.4444,
+     "savings": 0.9529,
+     "accuracy": 0.5556
+   }
+ }
qwen2.5-32b/supervised/no_kq/ood_aime26.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9489,
+     "error_rate": 0.0,
+     "savings": 0.0239,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9215,
+     "error_rate": 0.0,
+     "savings": 0.0305,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.8896,
+     "error_rate": 0.0,
+     "savings": 0.0744,
+     "accuracy": 1.0
+   },
+   "0.1": {
+     "lambda": 0.8326,
+     "error_rate": 0.05,
+     "savings": 0.1979,
+     "accuracy": 0.95
+   },
+   "0.15": {
+     "lambda": 0.7989999999999999,
+     "error_rate": 0.15,
+     "savings": 0.3098,
+     "accuracy": 0.85
+   },
+   "0.2": {
+     "lambda": 0.7598,
+     "error_rate": 0.3,
+     "savings": 0.5139,
+     "accuracy": 0.7
+   },
+   "0.25": {
+     "lambda": 0.7142999999999999,
+     "error_rate": 0.35,
+     "savings": 0.6549,
+     "accuracy": 0.65
+   },
+   "0.3": {
+     "lambda": 0.6740999999999999,
+     "error_rate": 0.35,
+     "savings": 0.7077,
+     "accuracy": 0.65
+   },
+   "0.35": {
+     "lambda": 0.6171,
+     "error_rate": 0.4,
+     "savings": 0.7691,
+     "accuracy": 0.6
+   },
+   "0.4": {
+     "lambda": 0.5069,
+     "error_rate": 0.45,
+     "savings": 0.9326,
+     "accuracy": 0.55
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.55,
+     "savings": 0.9591,
+     "accuracy": 0.45
+   }
+ }
qwen2.5-32b/supervised/no_kq/ood_gpqa_diamond.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9489,
+     "error_rate": 0.04,
+     "savings": 0.1684,
+     "accuracy": 0.96
+   },
+   "0.025": {
+     "lambda": 0.9215,
+     "error_rate": 0.13,
+     "savings": 0.3363,
+     "accuracy": 0.87
+   },
+   "0.05": {
+     "lambda": 0.8896,
+     "error_rate": 0.21,
+     "savings": 0.5039,
+     "accuracy": 0.79
+   },
+   "0.1": {
+     "lambda": 0.8326,
+     "error_rate": 0.3,
+     "savings": 0.7154,
+     "accuracy": 0.7
+   },
+   "0.15": {
+     "lambda": 0.7989999999999999,
+     "error_rate": 0.34,
+     "savings": 0.8213,
+     "accuracy": 0.66
+   },
+   "0.2": {
+     "lambda": 0.7598,
+     "error_rate": 0.39,
+     "savings": 0.8965,
+     "accuracy": 0.61
+   },
+   "0.25": {
+     "lambda": 0.7142999999999999,
+     "error_rate": 0.41,
+     "savings": 0.9342,
+     "accuracy": 0.59
+   },
+   "0.3": {
+     "lambda": 0.6740999999999999,
+     "error_rate": 0.41,
+     "savings": 0.9494,
+     "accuracy": 0.59
+   },
+   "0.35": {
+     "lambda": 0.6171,
+     "error_rate": 0.41,
+     "savings": 0.9566,
+     "accuracy": 0.59
+   },
+   "0.4": {
+     "lambda": 0.5069,
+     "error_rate": 0.41,
+     "savings": 0.9567,
+     "accuracy": 0.59
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.41,
+     "savings": 0.9567,
+     "accuracy": 0.59
+   }
+ }
qwen2.5-32b/supervised/no_kq/ood_math500.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9489,
+     "error_rate": 0.0,
+     "savings": 0.0623,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.9215,
+     "error_rate": 0.0,
+     "savings": 0.2042,
+     "accuracy": 1.0
+   },
+   "0.05": {
+     "lambda": 0.8896,
+     "error_rate": 0.0062,
+     "savings": 0.3908,
+     "accuracy": 0.9938
+   },
+   "0.1": {
+     "lambda": 0.8326,
+     "error_rate": 0.0227,
+     "savings": 0.637,
+     "accuracy": 0.9773
+   },
+   "0.15": {
+     "lambda": 0.7989999999999999,
+     "error_rate": 0.033,
+     "savings": 0.7208,
+     "accuracy": 0.967
+   },
+   "0.2": {
+     "lambda": 0.7598,
+     "error_rate": 0.0495,
+     "savings": 0.7815,
+     "accuracy": 0.9505
+   },
+   "0.25": {
+     "lambda": 0.7142999999999999,
+     "error_rate": 0.066,
+     "savings": 0.8267,
+     "accuracy": 0.934
+   },
+   "0.3": {
+     "lambda": 0.6740999999999999,
+     "error_rate": 0.068,
+     "savings": 0.8473,
+     "accuracy": 0.932
+   },
+   "0.35": {
+     "lambda": 0.6171,
+     "error_rate": 0.0866,
+     "savings": 0.8708,
+     "accuracy": 0.9134
+   },
+   "0.4": {
+     "lambda": 0.5069,
+     "error_rate": 0.0907,
+     "savings": 0.8823,
+     "accuracy": 0.9093
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.0948,
+     "savings": 0.8885,
+     "accuracy": 0.9052
+   }
+ }
qwen2.5-32b/supervised/no_kq/probe.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ce9b16ed9382dc67d63db1ceecbbe64f512f3ce7d152313a7cc60385bb1a385
+ size 22652
qwen2.5-32b/supervised/qk_dh128/config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "config": "configs/qwen32b_5k.yaml",
+   "method": "ttt",
+   "dataset_path": [
+     "data_prepare/output/qwen32b/s1k/dataset.pkl",
+     "data_prepare/output/qwen32b/openr1_2k/dataset.pkl",
+     "data_prepare/output/qwen32b/deepmath_2k/dataset.pkl"
+   ],
+   "ood_paths": [
+     "data_prepare/output/qwen32b/aime24/dataset.pkl",
+     "data_prepare/output/qwen32b/aime25/dataset.pkl",
+     "data_prepare/output/qwen32b/aime26/dataset.pkl",
+     "data_prepare/output/qwen32b/math500/dataset.pkl",
+     "data_prepare/output/qwen32b/gpqa_diamond/dataset.pkl"
+   ],
+   "output_dir": "results/qwen32b_5k",
+   "label_mode": "supervised",
+   "batch_size": 10,
+   "seed": 42,
+   "smooth_window": 10,
+   "run_name": "ttt__dh128__lr0.01",
+   "d_hidden": 128,
+   "use_ln": false,
+   "use_residual": false,
+   "learnable_eta": false,
+   "base_lr": 0.01,
+   "share_kq": false,
+   "use_mlp": false,
+   "use_pca": false,
+   "pca_dim": 256,
+   "epochs": 10,
+   "outer_lr": 0.001,
+   "no_meta_train": false,
+   "no_online_update": false,
+   "no_kq": false,
+   "grad_clip": 1.0,
+   "force_retrain": false,
+   "d_phi": 5120,
+   "timestamp": "2026-03-28T00:26:53.748545",
+   "release_target": "qwen2.5-32b/supervised/qk_dh128",
+   "release_probe_source": "qwen32b_5k/supervised/ttt__dh128__lr0.01/checkpoints/probe_ep10.pt"
+ }
qwen2.5-32b/supervised/qk_dh128/lambdas.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "0.01": 0.9929,
+   "0.025": 0.987,
+   "0.05": 0.9749,
+   "0.1": 0.9419,
+   "0.15": 0.9018,
+   "0.2": 0.8491,
+   "0.25": 0.7923,
+   "0.3": 0.7335,
+   "0.35": 0.6254,
+   "0.4": 0.39059999999999995,
+   "0.5": 9.999999999998899e-05
+ }
qwen2.5-32b/supervised/qk_dh128/metrics.json ADDED
@@ -0,0 +1,70 @@
+ {
+   "eps_results": {
+     "0.01": {
+       "lambda": 0.9929,
+       "error_rate": 0.01,
+       "savings": 0.0466,
+       "accuracy": 0.99
+     },
+     "0.025": {
+       "lambda": 0.987,
+       "error_rate": 0.0211,
+       "savings": 0.1107,
+       "accuracy": 0.9789
+     },
+     "0.05": {
+       "lambda": 0.9749,
+       "error_rate": 0.0455,
+       "savings": 0.2332,
+       "accuracy": 0.9545
+     },
+     "0.1": {
+       "lambda": 0.9419,
+       "error_rate": 0.1031,
+       "savings": 0.4141,
+       "accuracy": 0.8969
+     },
+     "0.15": {
+       "lambda": 0.9018,
+       "error_rate": 0.1497,
+       "savings": 0.5596,
+       "accuracy": 0.8503
+     },
+     "0.2": {
+       "lambda": 0.8491,
+       "error_rate": 0.204,
+       "savings": 0.674,
+       "accuracy": 0.796
+     },
+     "0.25": {
+       "lambda": 0.7923,
+       "error_rate": 0.2506,
+       "savings": 0.7552,
+       "accuracy": 0.7494
+     },
+     "0.3": {
+       "lambda": 0.7335,
+       "error_rate": 0.2905,
+       "savings": 0.8134,
+       "accuracy": 0.7095
+     },
+     "0.35": {
+       "lambda": 0.6254,
+       "error_rate": 0.3437,
+       "savings": 0.8837,
+       "accuracy": 0.6563
+     },
+     "0.4": {
+       "lambda": 0.39059999999999995,
+       "error_rate": 0.3902,
+       "savings": 0.9407,
+       "accuracy": 0.6098
+     },
+     "0.5": {
+       "lambda": 9.999999999998899e-05,
+       "error_rate": 0.3947,
+       "savings": 0.9502,
+       "accuracy": 0.6053
+     }
+   }
+ }
qwen2.5-32b/supervised/qk_dh128/ood_aime24.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "0.01": {
+     "lambda": 0.9929,
+     "error_rate": 0.0,
+     "savings": 0.0527,
+     "accuracy": 1.0
+   },
+   "0.025": {
+     "lambda": 0.987,
+     "error_rate": 0.05,
+     "savings": 0.1005,
+     "accuracy": 0.95
+   },
+   "0.05": {
+     "lambda": 0.9749,
+     "error_rate": 0.05,
+     "savings": 0.1472,
+     "accuracy": 0.95
+   },
+   "0.1": {
+     "lambda": 0.9419,
+     "error_rate": 0.1,
+     "savings": 0.2949,
+     "accuracy": 0.9
+   },
+   "0.15": {
+     "lambda": 0.9018,
+     "error_rate": 0.15,
+     "savings": 0.4545,
+     "accuracy": 0.85
+   },
+   "0.2": {
+     "lambda": 0.8491,
+     "error_rate": 0.2,
+     "savings": 0.5534,
+     "accuracy": 0.8
+   },
+   "0.25": {
+     "lambda": 0.7923,
+     "error_rate": 0.25,
+     "savings": 0.6954,
+     "accuracy": 0.75
+   },
+   "0.3": {
+     "lambda": 0.7335,
+     "error_rate": 0.35,
+     "savings": 0.7598,
+     "accuracy": 0.65
+   },
+   "0.35": {
+     "lambda": 0.6254,
+     "error_rate": 0.45,
+     "savings": 0.8599,
+     "accuracy": 0.55
+   },
+   "0.4": {
+     "lambda": 0.39059999999999995,
+     "error_rate": 0.55,
+     "savings": 0.9566,
+     "accuracy": 0.45
+   },
+   "0.5": {
+     "lambda": 9.999999999998899e-05,
+     "error_rate": 0.55,
+     "savings": 0.9683,
+     "accuracy": 0.45
+   }
+ }