!pip uninstall geolip-svae -y
!pip install "git+https://github.com/AbstractEyes/geolip-svae.git"
D=2 internal matmul experiment 1. Target D=16, internally represented by d=2.
This takes about a quarter of the time of a full d=16 representation.
Success will allow the full cayley rope spearman run, not a partial one.
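The quarter-time estimate comes from replacing one full-width decomposition with a batch of tiny ones. A minimal numpy sketch of the idea, assuming the D=16 representation is viewed as 8 independent d=2 slices and the SVD runs batched over 2×2 covariances (the slicing layout and function name are illustrative, not the geolip-svae API):

```python
import numpy as np

def sliced_svd_d2(x):
    """Hypothetical sketch: split a (..., 16)-dim representation into
    8 independent d=2 slices and run a batched SVD on each slice's
    2x2 covariance, instead of one full D=16 decomposition."""
    B, N, D = x.shape                       # tokens of dimension D=16
    assert D == 16
    slices = x.reshape(B, N, 8, 2)          # view as 8 slices of d=2
    # Batched 2x2 covariance per slice, then batched SVD over (B, 8)
    cov = np.einsum("bnsi,bnsj->bsij", slices, slices) / N
    U, S, Vt = np.linalg.svd(cov)
    return U, S, Vt

x = np.random.randn(4, 64, 16)
U, S, Vt = sliced_svd_d2(x)
print(U.shape, S.shape)   # (4, 8, 2, 2) (4, 8, 2)
```

Each 2×2 problem is trivially parallel, which is where the speedup over a monolithic D=16 SVD would come from.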
geolip.linalg backend:
CUDA: yes
Triton: 3.6.0
FL eigh: enabled
Triton SVD: enabled
GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
Computing target CV for V=16, D=16 on S^15...
══════════════════════════════════════════════════════════════════════
SpectralViT – Pure SpectralCell Transformer
══════════════════════════════════════════════════════════════════════
SpectralViT:
Patch embed: 12,544
Cayley PE: 8,192 (128 rotation planes × 64 positions)
Cells (6×): 4,920,768 (820,128 per cell)
D=16, V=16, hidden=256
CM cells: [2, 5] (2 primary)
SVD split: 2 full + 4 sliced (D=2 × 8)
LayerNorms: 3,584
Classifier: 91,492
Cross-attn: 13,632 (clipped at 0.5)
Total: 5,036,580
Architecture: PatchEmbed → CayleyPE → 6× SpectralCell (CM every 3) → pool → classify
Soft hand: target_cv=0.1984 σ=0.15 boost=1.0
CV penalty: 0.01 (differentiable through cm_vol2)
EMA momentum: 0.99
Grad clip: 0.5 cross-attn only, uncapped otherwise
CutMix: α=1.0 prob=0.5
Optimizer: Adam lr=0.001
CIFAR-100, 200 epochs
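The "Grad clip: 0.5 cross-attn only, uncapped otherwise" policy above can be sketched as per-group clipping in PyTorch. A minimal sketch, assuming cross-attention parameters are identifiable by the substring "cross_attn" in their names (the name filter and helper are illustrative, not the actual training loop):

```python
import torch
import torch.nn as nn

def clip_cross_attn_only(model, max_norm=0.5):
    """Sketch of the selective policy: clip only cross-attention
    parameters at max_norm, leave everything else uncapped.
    The "cross_attn" name filter is an assumption about module naming."""
    params = [p for n, p in model.named_parameters()
              if "cross_attn" in n and p.grad is not None]
    if params:
        torch.nn.utils.clip_grad_norm_(params, max_norm)

# Toy usage on a model with a matching submodule name
model = nn.ModuleDict({"cross_attn": nn.Linear(8, 8), "mlp": nn.Linear(8, 8)})
out = model["cross_attn"](torch.randn(2, 8)) + model["mlp"](torch.randn(2, 8))
out.sum().backward()
clip_cross_attn_only(model, 0.5)
```

Clipping only the cross-attention group keeps the rest of the optimizer dynamics untouched while bounding the one path the log flags as clipped at 0.5.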
Initial profiling (3 warmup + 1 measured)...
── PROFILE FORWARD COMPONENTS ───────────────────────────────
│ cell_5_CM_fmt    21.4ms  40.3% █████████████
│ cell_2_CM        21.3ms  40.3% █████████████
│ cell_0_cd         2.5ms   4.8% █
│ cell_3_cd         2.5ms   4.7% █
│ cell_1_cd         2.5ms   4.7% █
│ cell_4_cd         2.5ms   4.7% █
│ cayley_pe         0.2ms   0.3%
│ patch_embed       0.1ms   0.2%
│ classify          0.1ms   0.1%
│ TOTAL            53.0ms
─────────────────────────────────────────────────────────────
── PROFILE FULL TRAIN STEP ──────────────────────────────────
│ forward          54.0ms  38.1% ████████████
│ backward         52.5ms  37.0% ████████████
│ optim_step       28.8ms  20.3% ██████
│ grad_clip         3.5ms   2.5%
│ loss              3.1ms   2.2%
│ TOTAL           141.8ms
─────────────────────────────────────────────────────────────
── PROFILE CELL INTERNALS (cell_0) ──────────────────────────
│ recompose          0.7ms  22.6% ███████
│ enc_mlp            0.5ms  17.4% █████
│ dec_mlp            0.5ms  15.5% █████
│ patchwork          0.4ms  14.1% ████
│ cross_attn         0.4ms  13.2% ████
│ svd_sliced_8x_D2   0.2ms   7.2% ██
│ pairwise_d2        0.2ms   6.3% ██
│ normalize          0.1ms   3.7% █
│ cm_validation      0.0ms   0.0%
│ TOTAL              3.0ms
─────────────────────────────────────────────────────────────
ep 1 acc=5.1% ★
train=3.3% ema_cv=0.1645 boost=1.987 lr=0.001000
S=[1.83, 1.65, 1.48, 1.36...0.07] PE angles: mean=0.0237 max=0.1333
Top 5: c20=75% c73=68% c53=57% c82=52% c52=50%
Bot 5: c65=0% c16=0% c17=0% c59=0% c58=0%
Mean: 5.1% Std: 14.0%
── PROFILE FORWARD ep1 ──────────────────────────────────────
│ cell_2_CM        20.7ms  40.0% █████████████
│ cell_5_CM_fmt    20.6ms  39.7% █████████████
│ cell_0_cd         2.8ms   5.4% █
│ cell_1_cd         2.4ms   4.7% █
│ cell_3_cd         2.4ms   4.6% █
│ cell_4_cd         2.4ms   4.6% █
│ patch_embed       0.2ms   0.5%
│ cayley_pe         0.2ms   0.4%
│ classify          0.1ms   0.2%
│ TOTAL            51.8ms
─────────────────────────────────────────────────────────────
── PROFILE STEP ep1 ─────────────────────────────────────────
│ forward          51.4ms  48.8% ████████████████
│ backward         49.6ms  47.1% ███████████████
│ optim_step        3.6ms   3.4% █
│ grad_clip         0.5ms   0.5%
│ loss              0.2ms   0.2%
│ TOTAL           105.3ms
─────────────────────────────────────────────────────────────
── PROFILE CELL INTERNALS ep1 ───────────────────────────────
│ svd_full_D16     18.8ms  89.7% █████████████████████████████
│ dec_mlp           0.4ms   2.1%
│ enc_mlp           0.4ms   2.1%
│ patchwork         0.4ms   1.7%
│ cross_attn        0.3ms   1.6%
│ cm_validation     0.3ms   1.3%
│ pairwise_d2       0.2ms   0.8%
│ recompose         0.1ms   0.5%
│ normalize         0.1ms   0.4%
│ TOTAL            21.0ms
─────────────────────────────────────────────────────────────
ep 2 acc=7.9% ★
train=5.9% ema_cv=0.2097 boost=1.990 lr=0.001000
S=[1.85, 1.64, 1.48, 1.36...0.04] PE angles: mean=0.0359 max=0.1985
Top 5: c53=59% c24=53% c60=50% c82=49% c18=46%
Bot 5: c74=0% c72=0% c16=0% c68=0% c19=0%
Mean: 7.9% Std: 14.1%
ep 3 acc=9.8% ★
train=7.2% ema_cv=0.2033 boost=1.996 lr=0.000999
S=[1.89, 1.63, 1.46, 1.32...0.06] PE angles: mean=0.0480 max=0.3018
Top 5: c60=62% c82=59% c53=53% c43=50% c52=47%
Bot 5: c75=0% c74=0% c19=0% c72=0% c22=0%
Mean: 9.8% Std: 15.6%
ep 4 acc=10.4% ★
train=8.4% ema_cv=0.1922 boost=1.999 lr=0.000999
S=[1.90, 1.61, 1.47, 1.31...0.04] PE angles: mean=0.0562 max=0.3470
ep 5 acc=11.7% ★
train=9.4% ema_cv=0.2211 boost=1.994 lr=0.000998
S=[1.90, 1.61, 1.44, 1.30...0.05] PE angles: mean=0.0628 max=0.3809
Ep 7: 13%|█▎        | 26/195 [00:02<00:17, 9.73it/s]
Bear with this one today. The idea is to discover the most usable form of geometric-alignment positional encoding, specifically targeting the spectral cell. This will allow direct internalized embedding structures to represent the complex k4 simplex system without the need for conv or transformer positional controllers.
This structure will be touch-and-go, and the first forms will be fragile.
The first prototype is a Cayley-rotational position alignment similar to the original constellation.
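The Cayley-rotational alignment can be sketched with the Cayley transform R = (I − A)(I + A)⁻¹, which maps any skew-symmetric generator A to an exact rotation. A minimal numpy sketch, assuming one learned angle per rotation plane paired on coordinates (2i, 2i+1) (the pairing and function name are illustrative assumptions):

```python
import numpy as np

def cayley_rotation(theta, dim=16):
    """Sketch of a Cayley-rotational PE: build a skew-symmetric
    generator A from per-plane angles, then map it to an orthogonal
    rotation via the Cayley transform R = (I - A)(I + A)^-1.
    The (2i, 2i+1) plane pairing is an illustrative assumption."""
    A = np.zeros((dim, dim))
    for i, t in enumerate(theta):        # one angle per rotation plane
        a, b = 2 * i, 2 * i + 1
        A[a, b], A[b, a] = t, -t
    I = np.eye(dim)
    return (I - A) @ np.linalg.inv(I + A)

R = cayley_rotation(np.linspace(0.01, 0.13, 8))
print(np.allclose(R @ R.T, np.eye(16)))   # exactly orthogonal -> True
```

Because the output is always orthogonal, positions rotate token embeddings without distorting their norms, which is the property the spectral cell would rely on.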
The second wave will cover the Cantor staircase, Beatrix staircase, stereoscopic, waveform interpolation, Kymatio scatterpoint2d, and multiple adjacent structures. This will involve many tests; I assume over 80.
A successful positional encoding will, when complete, replace the bulk conv with a compact explicit representation.
These tokens can be directly interpreted by traditional rotary transformers and transformer structures, and will introduce the first functional geolip-svd-transformer system integration for packaged reuse.
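"Directly interpreted by traditional rotary transformers" means the tokens could feed a standard RoPE pipeline unchanged. A minimal sketch of standard rotary position embedding, with illustrative shapes (this is the generic RoPE formulation, not a geolip API):

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Standard rotary position embedding (RoPE): rotate each
    (x1_i, x2_i) coordinate pair by a position-dependent angle
    with per-plane frequencies. Shapes are illustrative."""
    N, D = x.shape
    half = D // 2
    freqs = base ** (-np.arange(half) / half)   # per-plane frequencies
    angles = np.outer(np.arange(N), freqs)      # (N, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

tokens = np.random.randn(64, 16)
rotated = apply_rope(tokens)
print(rotated.shape)   # (64, 16)
```

Like the Cayley alignment, RoPE is a pure rotation, so per-token norms are preserved and attention dot products depend only on relative position.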
By this time tomorrow I expect a series of prototypes to be functional for the geolip-transformer, replacing the earlier invalid observer structure with the surge-training geolip-svd-transformer line. This will give a properly profiled battery for what is best, what is fastest, what is most effectively quick, and the rounded median that the default geolip-transformer will encompass.
If the surge-training paradigm is successful, it will introduce the same rapid-fire learning that the actual batteries experience.
Surge.
If unsuccessful, I continue from the faults.
When successful, the full geolip-conduit-battery structure will be integrated as well, allowing FiLM, LoRA, and any other form of training for any model you wish; snap-in observer capable.
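As an example of what "snap-in" training could look like, here is a minimal LoRA adapter sketch: a frozen base linear layer plus a trainable low-rank update. The class name, rank, and scale defaults are illustrative assumptions, not the geolip-conduit-battery API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a snap-in LoRA adapter: wrap a frozen base Linear
    and add a trainable low-rank update scale * (x @ A^T) @ B^T.
    Names and defaults are illustrative, not the geolip API."""
    def __init__(self, base: nn.Linear, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the base weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(16, 16))
y = layer(torch.randn(2, 16))
print(y.shape)   # torch.Size([2, 16])
```

Because B starts at zero, snapping the adapter in is a no-op at initialization: the wrapped layer's output is identical to the base layer's until training moves A and B.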