Create svd_triton_gram_newton_profiled.txt
svd_triton_gram_newton_profiled.txt
================================================================================
Generalized Batched Thin SVD - Profiling Suite
Device: NVIDIA RTX PRO 6000 Blackwell Server Edition
================================================================================

======================================================================
CORRECTNESS VALIDATION (B=64, M=1024)
======================================================================
[auto]   N=  2: S_err=1.91e-05 recon=9.54e-07 (ref=4.83e-06) orth=1.43e-06 desc=True [PASS]
[triton] N=  2: S_err=1.91e-05 recon=9.54e-07 (ref=4.83e-06) orth=1.43e-06 desc=True [PASS]
[auto]   N=  3: S_err=4.01e-05 recon=2.38e-06 (ref=8.34e-06) orth=1.13e-06 desc=True [PASS]
[triton] N=  3: S_err=4.01e-05 recon=2.38e-06 (ref=8.34e-06) orth=1.13e-06 desc=True [PASS]
[auto]   N=  4: S_err=4.01e-05 recon=2.38e-06 (ref=9.06e-06) orth=1.73e-06 desc=True [PASS]
[gram]   N=  4: S_err=4.01e-05 recon=2.38e-06 (ref=9.06e-06) orth=1.73e-06 desc=True [PASS]
[auto]   N=  5: S_err=5.15e-05 recon=3.81e-06 (ref=9.30e-06) orth=1.79e-06 desc=True [PASS]
[gram]   N=  5: S_err=5.15e-05 recon=3.81e-06 (ref=9.30e-06) orth=1.79e-06 desc=True [PASS]
[auto]   N=  6: S_err=6.29e-05 recon=2.86e-06 (ref=1.24e-05) orth=1.67e-06 desc=True [PASS]
[gram]   N=  6: S_err=6.29e-05 recon=2.86e-06 (ref=1.24e-05) orth=1.67e-06 desc=True [PASS]
[auto]   N=  8: S_err=9.54e-05 recon=3.58e-06 (ref=1.50e-05) orth=1.67e-06 desc=True [PASS]
[gram]   N=  8: S_err=9.54e-05 recon=3.58e-06 (ref=1.50e-05) orth=1.67e-06 desc=True [PASS]
[newton] N=  8: S_err=9.54e-05 recon=3.58e-06 (ref=1.50e-05) orth=1.67e-06 desc=True [PASS]
[auto]   N= 10: S_err=8.39e-05 recon=4.05e-06 (ref=1.41e-05) orth=1.67e-06 desc=True [PASS]
[gram]   N= 10: S_err=8.39e-05 recon=4.05e-06 (ref=1.41e-05) orth=1.67e-06 desc=True [PASS]
[newton] N= 10: S_err=8.39e-05 recon=4.05e-06 (ref=1.41e-05) orth=1.67e-06 desc=True [PASS]
[auto]   N= 16: S_err=1.41e-04 recon=4.29e-06 (ref=2.57e-05) orth=1.91e-06 desc=True [PASS]
[gram]   N= 16: S_err=1.41e-04 recon=4.29e-06 (ref=2.57e-05) orth=1.91e-06 desc=True [PASS]
[newton] N= 16: S_err=1.41e-04 recon=4.29e-06 (ref=2.57e-05) orth=1.91e-06 desc=True [PASS]
[auto]   N= 32: S_err=1.79e-04 recon=3.67e-06 (ref=3.17e-05) orth=2.03e-06 desc=True [PASS]
[gram]   N= 32: S_err=1.79e-04 recon=3.67e-06 (ref=3.17e-05) orth=2.03e-06 desc=True [PASS]
[newton] N= 32: S_err=1.79e-04 recon=3.67e-06 (ref=3.17e-05) orth=2.03e-06 desc=True [PASS]
[auto]   N= 48: S_err=3.05e-04 recon=4.24e-05 (ref=4.74e-05) orth=4.46e-06 desc=True [PASS]
[gram]   N= 48: S_err=3.05e-04 recon=4.24e-05 (ref=4.74e-05) orth=4.46e-06 desc=True [PASS]
[newton] N= 48: S_err=3.05e-04 recon=4.24e-05 (ref=4.74e-05) orth=4.46e-06 desc=True [PASS]
[auto]   N= 64: S_err=4.27e-04 recon=5.72e-05 (ref=6.32e-05) orth=5.24e-06 desc=True [PASS]
[gram]   N= 64: S_err=4.27e-04 recon=5.72e-05 (ref=6.32e-05) orth=5.24e-06 desc=True [PASS]
[newton] N= 64: S_err=4.27e-04 recon=5.72e-05 (ref=6.32e-05) orth=5.24e-06 desc=True [PASS]
[auto]   N= 96: S_err=1.17e-03 recon=1.07e-04 (ref=9.39e-05) orth=2.74e-06 desc=True [PASS]
[gram]   N= 96: S_err=1.17e-03 recon=1.07e-04 (ref=9.39e-05) orth=2.74e-06 desc=True [PASS]
[newton] N= 96: S_err=1.17e-03 recon=1.07e-04 (ref=9.39e-05) orth=2.74e-06 desc=True [PASS]
[auto]   N=128: S_err=1.42e-03 recon=1.63e-04 (ref=1.27e-04) orth=4.17e-06 desc=True [PASS]
[gram]   N=128: S_err=1.42e-03 recon=1.63e-04 (ref=1.27e-04) orth=4.17e-06 desc=True [PASS]
[newton] N=128: S_err=1.42e-03 recon=1.63e-04 (ref=1.27e-04) orth=4.17e-06 desc=True [PASS]

ALL PASSED
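
The log names four checks per configuration (S_err, recon, orth, desc) but does not state their formulas. The sketch below shows one plausible way such metrics could be computed with NumPy. The function name `svd_metrics` and every formula in it are assumptions made for illustration, not the suite's actual definitions.

```python
import numpy as np

def svd_metrics(A, U, S, Vh):
    """Hypothetical metric definitions; the log does not give the exact formulas."""
    n = S.shape[-1]
    # Reference singular values from NumPy's LAPACK-backed SVD.
    S_ref = np.linalg.svd(A, compute_uv=False)
    # Assumed: worst-case absolute deviation of singular values.
    s_err = float(np.abs(S - S_ref).max())
    # Assumed: worst-case absolute reconstruction error of U @ diag(S) @ Vh.
    recon = float(np.abs(A - (U * S[..., None, :]) @ Vh).max())
    # Assumed: worst-case deviation of U^T U from the identity.
    orth = float(np.abs(np.swapaxes(U, -1, -2) @ U - np.eye(n)).max())
    # Singular values sorted in non-increasing order within each batch item.
    desc = bool(np.all(S[..., :-1] >= S[..., 1:]))
    return s_err, recon, orth, desc

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 64, 8)).astype(np.float32)   # small stand-in for B=64, M=1024
U, S, Vh = np.linalg.svd(A, full_matrices=False)
s_err, recon, orth, desc = svd_metrics(A, U, S, Vh)
print(f"S_err={s_err:.2e} recon={recon:.2e} orth={orth:.2e} desc={desc}")
```

Here the decomposition under test is NumPy's own SVD, so all errors sit near float32 round-off; swapping in another backend's `U, S, Vh` would exercise the same checks.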

========================================================================================================================
PROCRUSTES ALIGNMENT: 5 methods of applying rank-k rotation to N-d space
cos = mean cosine similarity after alignment (higher = better, full = ceiling)
NN  = nearest-neighbor agreement with full Procrustes (1.0 = identical downstream)
========================================================================================================================

N=32:
   k    full    pinv    lerp  (α)   slerp  (α)  subspc  stay_k │  nn_pv  nn_lr  nn_sl  nn_ss
────────────────────────────────────────────────────────────────────────────────────────────
   8  0.4359  0.2142  0.4248  0.3  0.2142  err  0.4299  0.4215 │  0.177  0.681  0.177  1.000
  16  0.4370  0.2967  0.4259  0.3  0.2967  err  0.4316  0.4252 │  0.300  0.678  0.300  1.000
  24  0.4405  0.3864  0.4365  0.3  0.3864  err  0.4369  0.4384 │  0.555  0.772  0.555  1.000

N=48:
   k    full    pinv    lerp  (α)   slerp  (α)  subspc  stay_k │  nn_pv  nn_lr  nn_sl  nn_ss
────────────────────────────────────────────────────────────────────────────────────────────
   8  0.4421  0.1764  0.4306  0.3  0.1764  err  0.4350  0.4192 │  0.102  0.702  0.102  1.000
  16  0.4422  0.2494  0.4290  0.3  0.2494  err  0.4354  0.4292 │  0.230  0.667  0.230  1.000
  24  0.4432  0.3047  0.4294  0.3  0.3047  err  0.4366  0.4315 │  0.326  0.676  0.326  1.000
  32  0.4476  0.3621  0.4397  0.3  0.3621  err  0.4429  0.4425 │  0.454  0.728  0.454  1.000

N=64:
   k    full    pinv    lerp  (α)   slerp  (α)  subspc  stay_k │  nn_pv  nn_lr  nn_sl  nn_ss
────────────────────────────────────────────────────────────────────────────────────────────
   8  0.4475  0.1602  0.4356  0.3  0.1602  err  0.4390  0.4323 │  0.102  0.708  0.102  1.000
  16  0.4444  0.2178  0.4300  0.3  0.2178  err  0.4355  0.4299 │  0.164  0.658  0.164  1.000
  24  0.4453  0.2678  0.4295  0.3  0.2678  err  0.4363  0.4332 │  0.241  0.665  0.241  1.000
  32  0.4468  0.3091  0.4324  0.3  0.3091  err  0.4390  0.4374 │  0.312  0.680  0.312  1.000

N=96:
   k    full    pinv    lerp  (α)   slerp  (α)  subspc  stay_k │  nn_pv  nn_lr  nn_sl  nn_ss
────────────────────────────────────────────────────────────────────────────────────────────
  16  0.4267  0.1644  0.4035  0.3  0.1644  err  0.4077  0.4020 │  0.132  0.721  0.132  1.000
  24  0.4259  0.2023  0.4014  0.3  0.2023  err  0.4069  0.4034 │  0.200  0.709  0.200  1.000
  32  0.4241  0.2363  0.3996  0.3  0.2363  err  0.4057  0.4056 │  0.241  0.688  0.241  1.000
  48  0.4238  0.2978  0.4050  0.3  0.2978  err  0.4080  0.4139 │  0.394  0.717  0.394  1.000

N=128:
   k    full    pinv    lerp  (α)   slerp  (α)  subspc  stay_k │  nn_pv  nn_lr  nn_sl  nn_ss
────────────────────────────────────────────────────────────────────────────────────────────
  16  0.4068  0.1380  0.3740  0.3  0.1380  err  0.3770  0.3763 │  0.129  0.757  0.129  1.000
  24  0.4072  0.1679  0.3733  0.3  0.1679  err  0.3774  0.3778 │  0.169  0.739  0.169  1.000
  32  0.4064  0.1860  0.3730  0.3  0.1860  err  0.3778  0.3736 │  0.217  0.723  0.217  1.000
  48  0.4073  0.2397  0.3783  0.3  0.2397  err  0.3812  0.3868 │  0.310  0.733  0.310  1.000
  64  0.4102  0.2781  0.3853  0.3  0.2781  err  0.3880  0.3937 │  0.394  0.729  0.394  1.000

────────────────────────────────────────────────────────────────────────────────────────────
WINNER PER CONFIG (closest cos to full, highest NN agreement):
────────────────────────────────────────────────────────────────────────────────────────────
N= 32 k=  8: best_cos=subspace (0.4299, gap=0.0060)  best_nn=subspace (1.000)
N= 32 k= 16: best_cos=subspace (0.4316, gap=0.0054)  best_nn=subspace (1.000)
N= 32 k= 24: best_cos=subspace (0.4369, gap=0.0037)  best_nn=subspace (1.000)
N= 48 k=  8: best_cos=subspace (0.4350, gap=0.0071)  best_nn=subspace (1.000)
N= 48 k= 16: best_cos=subspace (0.4354, gap=0.0068)  best_nn=subspace (1.000)
N= 48 k= 24: best_cos=subspace (0.4366, gap=0.0066)  best_nn=subspace (1.000)
N= 48 k= 32: best_cos=subspace (0.4429, gap=0.0047)  best_nn=subspace (1.000)
N= 64 k=  8: best_cos=subspace (0.4390, gap=0.0085)  best_nn=subspace (1.000)
N= 64 k= 16: best_cos=subspace (0.4355, gap=0.0089)  best_nn=subspace (1.000)
N= 64 k= 24: best_cos=subspace (0.4363, gap=0.0090)  best_nn=subspace (1.000)
N= 64 k= 32: best_cos=subspace (0.4390, gap=0.0078)  best_nn=subspace (1.000)
N= 96 k= 16: best_cos=subspace (0.4077, gap=0.0190)  best_nn=subspace (1.000)
N= 96 k= 24: best_cos=subspace (0.4069, gap=0.0190)  best_nn=subspace (1.000)
N= 96 k= 32: best_cos=subspace (0.4057, gap=0.0184)  best_nn=subspace (1.000)
N= 96 k= 48: best_cos=subspace (0.4080, gap=0.0158)  best_nn=subspace (1.000)
N=128 k= 16: best_cos=subspace (0.3770, gap=0.0298)  best_nn=subspace (1.000)
N=128 k= 24: best_cos=subspace (0.3774, gap=0.0298)  best_nn=subspace (1.000)
N=128 k= 32: best_cos=subspace (0.3778, gap=0.0286)  best_nn=subspace (1.000)
N=128 k= 48: best_cos=subspace (0.3812, gap=0.0261)  best_nn=subspace (1.000)
N=128 k= 64: best_cos=subspace (0.3880, gap=0.0222)  best_nn=subspace (1.000)
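
For orientation, the "full" column is the full N-dimensional Procrustes baseline that the rank-k variants are compared against. A minimal NumPy sketch of the classic orthogonal Procrustes solution and a mean-cosine score follows. The helper names and the synthetic data are assumptions, and the log's rank-k variants (pinv, lerp, slerp, subspace, stay_k) are not reproduced here because the log does not describe them.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal R minimizing ||X @ R - Y||_F, via the SVD of X^T Y."""
    U, _, Vh = np.linalg.svd(X.T @ Y)
    return U @ Vh

def mean_cosine(P, Q):
    """Mean row-wise cosine similarity between two point sets."""
    num = np.sum(P * Q, axis=1)
    den = np.linalg.norm(P, axis=1) * np.linalg.norm(Q, axis=1)
    return float(np.mean(num / den))

# Synthetic stand-in data: Y is a rotated, slightly noisy copy of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))
R_true, _ = np.linalg.qr(rng.standard_normal((32, 32)))
Y = X @ R_true + 0.1 * rng.standard_normal((1000, 32))

R = procrustes_rotation(X, Y)
cos_before = mean_cosine(X, Y)
cos_after = mean_cosine(X @ R, Y)
print(f"cos before={cos_before:.4f} after={cos_after:.4f}")
```

On this synthetic pair the alignment recovers almost all of the similarity. The much lower ceiling in the tables above (roughly 0.38 to 0.45) reflects the log's own data, which this sketch does not model.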

====================================================================================================
PROJECTION QUALITY ANALYSIS - B=256, M=1024
Question: can rank-k SVD approximate rank-N SVD?
====================================================================================================

N=32:
   k  Energy%  Recon_proj  Recon_trunc  S_rel_err  Subspace    Proj ms    Full ms  Speedup
──────────────────────────────────────────────────────────────────────────────────────────
   8   30.99%    8.65e-01     8.31e-01     0.5622    0.4432    7.849ms    0.508ms     0.1x
  12   44.74%    7.89e-01     7.43e-01     0.4606    0.5508   10.556ms    0.508ms     0.0x
  16   57.56%    7.05e-01     6.51e-01     0.3379    0.6432   11.222ms    0.508ms     0.0x
  24   80.59%    4.41e-01     4.41e-01     0.0000    1.0000    0.510ms    0.508ms     1.0x

N=48:
   k  Energy%  Recon_proj  Recon_trunc  S_rel_err  Subspace    Proj ms    Full ms  Speedup
──────────────────────────────────────────────────────────────────────────────────────────
   8   22.33%    9.11e-01     8.81e-01     0.7880    0.3642    7.901ms  172.136ms    21.8x
  12   32.39%    8.65e-01     8.22e-01     0.6575    0.4454   10.668ms  172.136ms    16.1x
  16   41.87%    8.15e-01     7.62e-01     0.4125    0.5193   11.490ms  172.136ms    15.0x
  24   59.24%    7.05e-01     6.38e-01     0.3178    0.6433   11.497ms  172.136ms    15.0x
  32   74.71%    5.76e-01     5.03e-01     0.3076    0.7575  180.615ms  172.136ms     1.0x

N=64:
   k  Energy%  Recon_proj  Recon_trunc  S_rel_err  Subspace    Proj ms    Full ms  Speedup
──────────────────────────────────────────────────────────────────────────────────────────
   8   17.83%    9.34e-01     9.06e-01     0.9635    0.3152    7.917ms  182.058ms    23.0x
  12   25.91%    9.00e-01     8.61e-01     0.6937    0.3898   10.693ms  182.058ms    17.0x
  16   33.58%    8.64e-01     8.15e-01     0.6025    0.4484   11.311ms  182.058ms    16.1x
  24   47.78%    7.89e-01     7.23e-01     0.3495    0.5505   11.207ms  182.058ms    16.2x
  32   60.64%    7.05e-01     6.27e-01     0.3116    0.6438  176.453ms  182.058ms     1.0x
  48   82.74%    4.99e-01     4.15e-01     0.3090    0.8138  204.625ms  182.058ms     0.9x

N=96:
   k  Energy%  Recon_proj  Recon_trunc  S_rel_err  Subspace    Proj ms    Full ms  Speedup
──────────────────────────────────────────────────────────────────────────────────────────
   8   13.09%    9.56e-01     9.32e-01     1.2033    0.2583    8.035ms  295.451ms    36.8x
  16   24.83%    9.11e-01     8.67e-01     0.8721    0.3637   11.426ms  295.451ms    25.9x
  24   35.57%    8.64e-01     8.02e-01     0.5587    0.4475   11.238ms  295.451ms    26.3x
  32   45.45%    8.15e-01     7.38e-01     0.4710    0.5163  175.186ms  295.451ms     1.7x
  48   62.97%    7.05e-01     6.08e-01     0.3243    0.6407  200.525ms  295.451ms     1.5x
  64   77.83%    5.75e-01     4.71e-01     0.3073    0.7578  306.531ms  295.451ms     1.0x

N=128:
   k  Energy%  Recon_proj  Recon_trunc  S_rel_err  Subspace    Proj ms    Full ms  Speedup
──────────────────────────────────────────────────────────────────────────────────────────
   8   10.60%    9.68e-01     9.46e-01     1.4678    0.2251    8.085ms  436.551ms    54.0x
  16   20.19%    9.34e-01     8.93e-01     1.0025    0.3145   11.509ms  436.551ms    37.9x
  24   29.04%    9.00e-01     8.42e-01     0.7155    0.3867   11.432ms  436.551ms    38.2x
  32   37.26%    8.64e-01     7.92e-01     0.5374    0.4447  174.994ms  436.551ms     2.5x
  48   52.05%    7.89e-01     6.93e-01     0.3598    0.5498  198.286ms  436.551ms     2.2x
  64   64.91%    7.05e-01     5.92e-01     0.3121    0.6407  305.364ms  436.551ms     1.4x
  96   85.61%    4.99e-01     3.79e-01     0.3011    0.8136  452.623ms  436.551ms     1.0x

──────────────────────────────────────────────────────────────────────
SUMMARY: Recommended target_rank per N
(≥99% energy, ≥0.99 subspace cos, best speedup)
──────────────────────────────────────────────────────────────────────
N= 32: best k= 24 - 80.6% energy, subspace=1.0000 (below 99% threshold)
N= 48: best k= 32 - 74.7% energy, subspace=0.7575 (below 99% threshold)
N= 64: best k= 48 - 82.7% energy, subspace=0.8138 (below 99% threshold)
N= 96: best k= 64 - 77.8% energy, subspace=0.7578 (below 99% threshold)
N=128: best k= 96 - 85.6% energy, subspace=0.8136 (below 99% threshold)
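
The Energy% and Recon_trunc columns appear to be linked by the standard truncation identity. If Energy% is the share of squared singular values captured by the top k, and Recon_trunc is the relative Frobenius error of the best rank-k truncation, then Recon_trunc = sqrt(1 - Energy). Both definitions are assumptions (the log does not state them), but the short check below, using rows copied from the tables above, shows the numbers agree with that reading.

```python
import math

# (N, k, Energy%, Recon_trunc) copied from the quality tables above.
rows = [
    (32, 8, 30.99, 8.31e-01),
    (32, 24, 80.59, 4.41e-01),
    (48, 8, 22.33, 8.81e-01),
    (64, 48, 82.74, 4.15e-01),
    (128, 96, 85.61, 3.79e-01),
]

deviations = []
for n, k, energy_pct, recon_trunc in rows:
    # Assumed relation: relative Frobenius error of a rank-k truncation.
    predicted = math.sqrt(1.0 - energy_pct / 100.0)
    deviations.append(abs(predicted - recon_trunc))
    print(f"N={n:3d} k={k:3d}: predicted={predicted:.4f} logged={recon_trunc:.4f}")

max_deviation = max(deviations)
```

Under this reading, reaching the 99% energy threshold used in the summary would require a truncation error of at most 0.1, which none of the tested k values achieve.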

==============================================================================================================
N-DIMENSION SWEEP - NVIDIA RTX PRO 6000 Blackwell Server Edition
B=512, M=1024
==============================================================================================================
   N      Triton        Gram      Newton     Proj→24     Proj→16       Torch     Best   Speedup
───────────────────────────────────────────────────────────────────────────────────────────────
   2     0.020ms     0.227ms           -           -           -    79.040ms   triton   3859.1x
   3     0.022ms     0.242ms           -           -           -   118.394ms   triton   5394.2x
   4           -     0.255ms           -           -           -   125.263ms     gram    490.6x
   5           -     0.258ms           -           -           -   144.426ms     gram    560.8x
   6           -     0.269ms           -           -           -   155.042ms     gram    576.9x
   7           -     0.280ms           -           -           -   163.771ms     gram    584.2x
   8           -     0.291ms     0.290ms           -           -   168.934ms   newton    582.1x
  10           -     0.380ms     0.379ms           -           -   190.292ms   newton    502.2x
  12           -     0.400ms     0.400ms           -           -   213.394ms     gram    534.1x
  16           -     0.429ms     0.428ms           -           -   230.670ms   newton    538.6x
  20           -     0.597ms     0.596ms           -           -   253.657ms   newton    425.6x
  24           -     0.651ms     0.651ms           -     0.652ms   272.293ms   newton    418.5x
  32           -     0.795ms     0.794ms     0.800ms    22.025ms   303.023ms   newton    381.8x
  48           -   344.049ms   344.202ms    22.439ms    22.481ms   550.746ms   proj24     24.5x
  64           -   365.206ms   365.148ms    21.749ms    22.173ms   609.352ms   proj24     28.0x
  96           -   590.636ms   590.664ms    21.862ms    22.353ms   973.819ms   proj24     44.5x
 128           -   868.144ms   868.262ms    22.085ms    22.469ms  1421.924ms   proj24     64.4x

================================================================================
SUMMARY
================================================================================

Strategy by N:
  N=2:    Fused Triton (closed-form Jacobi rotation)
  N=3:    Fused Triton (cyclic Jacobi in registers)
  N=4-32: Gram + eigh (bmm + cuSOLVER eigh) - sub-ms
  N=48+:  Projected SVD (N→k, cheap SVD, lift back) - check quality table
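
The "Gram + eigh" strategy for N=4-32 can be illustrated with a small NumPy sketch: form the N x N Gram matrix with one batched matmul, eigendecompose it, and recover U from A and V. This is an assumed reconstruction of the general technique, not the suite's code. The function name `gram_eigh_svd` and the `eps` guard are hypothetical, and NumPy's `eigh` stands in for the cuSOLVER call named in the log.

```python
import numpy as np

def gram_eigh_svd(A, eps=1e-12):
    """Thin SVD of tall matrices A with shape (B, M, N), via the N x N Gram matrix."""
    G = np.swapaxes(A, -1, -2) @ A               # (B, N, N): the batched-matmul step
    evals, V = np.linalg.eigh(G)                 # eigenvalues come back ascending
    evals = evals[..., ::-1]                     # reorder to descending
    V = V[..., ::-1]                             # reorder eigenvector columns to match
    S = np.sqrt(np.clip(evals, 0.0, None))       # singular values of A
    U = (A @ V) / np.maximum(S, eps)[..., None, :]   # left vectors; eps avoids divide-by-zero
    return U, S, np.swapaxes(V, -1, -2)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 256, 6))             # float64 stand-in batch
U, S, Vh = gram_eigh_svd(A)
S_ref = np.linalg.svd(A, compute_uv=False)
print("max |S - S_ref| =", float(np.abs(S - S_ref).max()))
```

One known trade-off of this route is that forming A^T A squares the condition number, so accuracy in low precision can degrade for ill-conditioned inputs.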

Standalone utilities:
  newton_schulz_invsqrt(G) - batched G^{-1/2} via pure bmm
  projected_svd(A, target_rank=k) - rank-k approximate SVD
  projected_svd_quality(A, target_rank) - measure approximation quality
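
To illustrate what "batched G^{-1/2} via pure bmm" can look like, here is a sketch of the coupled Newton-Schulz iteration in NumPy. It uses only matrix products, which is why it maps well to batched GEMM. The function name `newton_schulz_invsqrt_sketch`, the Frobenius-norm scaling, and the iteration count are assumptions for illustration and may differ from the utility listed above.

```python
import numpy as np

def newton_schulz_invsqrt_sketch(G, iters=20):
    """Approximate G^{-1/2} for a batch of SPD matrices (B, N, N) using only matmuls."""
    n = G.shape[-1]
    I = np.eye(n, dtype=G.dtype)
    # Scale so every eigenvalue lies in (0, 1]; this keeps the iteration convergent.
    c = np.linalg.norm(G, axis=(-2, -1), keepdims=True)
    Y = G / c                                   # converges to (G/c)^{1/2}
    Z = np.broadcast_to(I, G.shape).copy()      # converges to (G/c)^{-1/2}
    for _ in range(iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Z / np.sqrt(c)                       # undo the scaling

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 256, 6))
G = np.swapaxes(A, -1, -2) @ A                  # well-conditioned SPD Gram matrices
W = newton_schulz_invsqrt_sketch(G)
residual = float(np.abs(W @ G @ W - np.eye(6)).max())
print("max |W G W - I| =", residual)
```

Convergence slows as G becomes ill-conditioned, so a fixed iteration count like the one above is only reasonable for well-conditioned inputs.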

Key question answered: energy_ratio and subspace_cos in quality table

Results saved to svd_general_profile.json
================================================================================