Freckles: Sub-Patch Spectral VAE (D=4, 4×4 patches)
2.5M parameters. 10MB checkpoint. Resolution-independent spectral atoms.
Architecture
Image → 4×4 patches → MLP encode → sphere normalize → SVD (fp64, FLEigh) →
spectral cross-attention (2 heads) → decode → stitch → boundary smooth
| Parameter | Value |
|---|---|
| V (matrix rows) | 48 |
| D (spectral modes) | 4 |
| Patch size | 4×4 |
| Hidden dim | 384 |
| Depth | 4 residual blocks |
| Cross-attention | 2 layers, 2 heads |
| Parameters | 2,557,539 |
| Checkpoint size | 10.3 MB |
| Compression | 12:1 (48 → 4) |
Quick Start

```shell
pip install git+https://github.com/AbstractEyes/geolip-svae.git
```

```python
from geolip_svae import load_model

model, cfg = load_model(hf_version='v40_freckles_noise')
out = model(images)  # any resolution divisible by 4

# out['recon']     → reconstructed image
# out['svd']['S']  → omega tokens (B, N_patches, 4)
# out['svd']['U']  → left singular vectors
# out['svd']['Vt'] → right singular vectors
```
Trained Models
| Version | Resolution | Patches | Dataset | MSE | Epochs | Noises Per Epoch | Total Trained | Status |
|---|---|---|---|---|---|---|---|---|
| v40 | 64×64 | 256 | 16 noise types | 0.000005 | 100 | 1.28m | 128m | ✅ Complete |
| v41 | 256×256 | 4096 | 16 noise types | — | 1 | 1.28m | 400k | 🔄 Training, ETA 2h |
| v42 | 512×512 | 16384 | 16 noise types | — | 1 | 1.28m | 120k | 🔄 Training, ETA 26h |
| v43 | 128×128 | 1024 | 16 noise types | — | XX | 1.28m | 0 | Upcoming when v42 (512×512) finishes |
| v43 | 1024×1024 | 65536 | 16 noise types | — | XX | 1.28m | 0 | Upcoming 4/10/2025 |
v41 and v42 have very little left to learn; the patch grid and attention just need to adjust to the new resolution. If they require more, new measures will be established and back in the oven they go, likely on RunPod next time with wide-compiled variations.
I chose 1.28m noise samples per epoch because it matches the size of ImageNet; no ImageNet images are used.
Geometric Constants (D=4)
erank: 3.82 / 4.0 = 95.5% (locked from epoch 40)
S0: 4.53 (leading singular value)
SD: 1.95 (trailing singular value)
S0/SD: 2.32 (ratio, stable across resolutions)
S_delta: 0.055 (cross-attention coordination magnitude)
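The erank figure above can be recomputed from a singular-value vector alone. A minimal sketch, assuming the standard entropy-based definition of effective rank (`effective_rank` is illustrative; the exact definition used during training isn't stated here):

```python
import numpy as np

def effective_rank(s: np.ndarray) -> float:
    """Entropy-based effective rank of a singular-value vector s."""
    p = s / s.sum()                     # normalize spectrum to a distribution
    h = -(p * np.log(p)).sum()          # Shannon entropy (nats)
    return float(np.exp(h))             # exp(H), always in [1, len(s)]

# A perfectly flat spectrum saturates erank at D.
print(effective_rank(np.array([1.0, 1.0, 1.0, 1.0])))  # ≈ 4.0
```

A flat D=4 spectrum gives erank 4.0; any decay in the spectrum pulls it below that, which is why 3.82/4.0 indicates a nearly (but not fully) isotropic use of the four modes.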
Results
Resolution Invariance (v40, trained at 64×64 only)
Freckles produces identical MSE at every directly encoded resolution, zero-shot, with no retraining; above 256×256 only tiled encoding is measured.
| Resolution | Patches | MSE | Time | VRAM |
|---|---|---|---|---|
| 36×36 | 81 | 0.000002 | 0.27s | |
| 64×64 | 256 | 0.000002 | 0.01s | |
| 128×128 | 1024 | 0.000002 | 0.01s | |
| 256×256 | 4096 | 0.000002 | 0.01s | |
| 512×512 | 16384 | tile-only | 0.42s | 31MB |
| 1024×1024 | 65536 | tile-only | 1.68s | 31MB |
| 2048×2048 | 262144 | tile-only | 6.75s | 31MB |
| 4096×4096 | 1048576 | tile-only | 27.13s | 31MB |

Tiled encoding uses 31MB of VRAM regardless of resolution.
Tile-Encode Consistency
Encoding 128×128 as 4 tiles of 64×64 vs native 128×128 encoding:
All 16 noise types: 1.00× match
Omega distance: 0.000000
4Γ4 patches are truly atomic. They produce identical omega tokens whether encoded individually or as part of any larger image.
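The atomicity claim can be sanity-checked at the patch level: a 4×4 patch cut from a tile is bit-identical to the same patch cut from the full image, so any strictly per-patch encoder must produce identical tokens for both. A minimal sketch (`patchify` is an illustrative stand-in, not the repo's function):

```python
import numpy as np

def patchify(img: np.ndarray, p: int = 4) -> np.ndarray:
    """Split an (H, W) image into non-overlapping (p, p) patches, raster order."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p, p)

rng = np.random.default_rng(0)
full = rng.standard_normal((128, 128))
tile = full[:64, :64]                             # top-left 64×64 tile

full_grid = patchify(full).reshape(32, 32, 4, 4)  # 32×32 patch grid
tile_grid = patchify(tile).reshape(16, 16, 4, 4)  # 16×16 patch grid

# Each 4×4 patch is identical whether cut from the tile or the full image,
# so per-patch encodings cannot depend on surrounding content.
assert np.array_equal(full_grid[:16, :16], tile_grid)
```

This only proves the inputs match; the 1.00× match and zero omega distance above are the empirical evidence that the encoder really is strictly per-patch.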
Out-of-Distribution Noise (16 novel types, never trained)
```
log_normal      0.000001 ✓    perlin_approx   0.000005 ✓
beta            0.000006 ✓    wavelet_noise   0.000002 ✓
weibull         0.000001 ✓    fractal_fbm     0.000003 ✓
gumbel          0.000001 ✓    gabor           0.000000 ✓
rayleigh        0.000001 ✓    sine_composite  0.000002 ✓
voronoi_approx  0.000000 ✓    shot_noise      0.000003 ✓
quantize_noise  0.000002 ✓    ring_noise      0.000001 ✓
jpeg_artifact   0.000000 ✓    spiral_noise    0.000001 ✓
```
All 16/16 handled. Worst case: 1.4× the known-noise average. erank stays at 3.80–3.83 for every alien distribution. Freckles learned spectral structure, not noise identity.
Multi-Noise Composites
```
gauss+pink                   0.000003  (2 layers)
cauchy+laplace               0.000003  (2 layers)
3-way: gauss+pink+block      0.000001  (3 layers)
4-way: gauss+unif+pink+expo  0.000001  (4 layers)
heavy: cauchy+salt+sparse    0.000002  (3 layers)
all_16_equal                 0.000000  (16 layers)
```
Sixteen superimposed noise distributions are reconstructed at effectively zero MSE.
Noise Triangulation (zero-shot classification)
Classification of noise type per patch using only cosine similarity against reference fingerprints (no classifier trained):
4 zones at 128px: 15β22%
9 zones at 128px: 24%
16 zones at 256px: 22%
Near random chance. The omega tokens encode universal spectral structure: they reconstruct everything but don't differentiate types. Classification is a downstream task requiring a learned head, not an intrinsic property of the representation.
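The zero-shot scheme being tested reduces to an argmax over cosine similarities against per-type reference tokens. A minimal sketch with made-up data (shapes follow the (N_patches, 4) omega tokens; `fingerprints` stands in for a hypothetical per-type mean token, not the repo's actual fingerprint computation):

```python
import numpy as np

def classify(tokens: np.ndarray, fingerprints: np.ndarray) -> np.ndarray:
    """Assign each omega token the noise type with highest cosine similarity."""
    t = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    f = fingerprints / np.linalg.norm(fingerprints, axis=-1, keepdims=True)
    return (t @ f.T).argmax(axis=-1)          # (N,) predicted type indices

rng = np.random.default_rng(1)
fingerprints = rng.standard_normal((16, 4))   # one reference token per noise type
tokens = fingerprints[[3, 7, 7, 0]] * 5.0     # tokens exactly aligned with types 3, 7, 7, 0
print(classify(tokens, fingerprints))         # → [3 7 7 0]
```

On real omega tokens this lands near chance, which is the point of the test: with only D=4 largely type-invariant values per patch, cosine similarity has almost nothing type-specific to separate.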
Training Log
v40: Freckles 64×64 Noise (complete)
16 noise types, all unlocked from epoch 1. No curriculum needed.
```
Epoch   1: val=0.0143    salt=0.077  cauchy=0.046
Epoch   4: val=0.0030    salt=0.020  cauchy=0.011
Epoch  11: val=0.0005    salt=0.003  cauchy=0.001
Epoch  22: val=0.0002    all types ≤ 0.001
Epoch  34: val=0.0001    all types = 0.000
Epoch  47: val=0.0000    all types = 0.000
Epoch 100: val=0.000005  all types = 0.000  (FINAL)
```
Crash at epoch 27 (MSE spiked 0.0002 → 0.0019); recovered in 2 epochs. Same attractor resilience as Johanna.
Roadmap
- v40: Freckles 64×64 noise ✅ complete
- v41: Freckles 256×256 noise 🔄 training now, fine-tuning the 64×64 model with additional patches
- v42: Freckles 512×512 noise 🔄 training now, fine-tuning the 64×64 model with additional patches
- Freckles image training (ImageNet)
- Freckles Grandmaster variant (single-shot denoiser)
- Spectral codebook integration (noise-native text tokenizer)
- Observer architecture for omega token analysis
- Cross-regime Procrustes alignment (D=4 → D=16)
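The last roadmap item has a standard closed form: orthogonal Procrustes alignment via SVD. A minimal sketch, assuming D=4 tokens are zero-padded to D=16 before alignment (the padding scheme and the data pairing are my assumptions for illustration, not the repo's method):

```python
import numpy as np

def procrustes(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||a @ R - b||_F (closed form via SVD)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt

rng = np.random.default_rng(2)
tokens4 = rng.standard_normal((50, 4))                   # paired D=4 tokens
a = np.pad(tokens4, ((0, 0), (0, 12)))                   # zero-pad D=4 → D=16
r_true, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # ground-truth orthogonal map
b = a @ r_true                                           # corresponding D=16 tokens

r = procrustes(a, b)
assert np.allclose(a @ r, b)   # recovered map reproduces b exactly
```

Because orthogonal maps preserve singular-value geometry, this kind of alignment would let D=4 and D=16 omega regimes be compared in a common frame without retraining either model.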
Dependencies
- geolip-core: FLEigh fast eigendecomposition
- geolip-svae: PatchSVAE architecture
Links
- Repository: github.com/AbstractEyes/geolip-svae
- All models: huggingface.co/AbstractPhil/geolip-SVAE
- Article: Omega Tokens: Finding The Self Solving Frame
DevLog
Upcoming freckles
We're going to see if we can get that 27-second encode time down, and find out how large the resolution can really go within a reasonable amount of time.
Bigger resolution means more data to work with, a tighter bottleneck, and a harder problem for the scaffold to solve.
It may just work, or it may require full structural shifts; we'll see.
Unexpected delivery from the void.
Now this is an odd one. Freckles, at 2M parameters, beat all of the bigger 17M models at every stage.
So the pure-noise training routine seems to have yielded something nearly perfect, somehow.
Freckles learned all 16 noise types, and its patches can be used in various ways losslessly. This is not a rigid single-task model; it is a lensed informational compaction system.
This is as close to a lens as it can be.
Soup soup soup soup
It's still a bit of a soup for now, but that's fine; it'll get cleaned up. As all great musicians know, one must start slow and build speed over time. Speed, in our case, means attenuating the tuner until it tunes into the radio correctly.
Chaos to order plan
The primary issue now isn't disorderly compaction and recall; it's how best to disperse the patchwork in a uniformly useful way that guarantees structural recall, rather than losing positional information for text.
I have a plan for that involving a translation codebook: https://github.com/AbstractEyes/geolip-svae/blob/main/geolip_svae/spectral_codebook.py
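A translation codebook of this sort typically means vector-quantizing the continuous omega tokens against a set of learned centroids, so each patch maps to a discrete code. A minimal sketch of the quantization step only, with random stand-in centroids (the real `spectral_codebook.py` may work quite differently):

```python
import numpy as np

def quantize(tokens: np.ndarray, codebook: np.ndarray):
    """Map each (D,) token to the index of its nearest codebook entry."""
    # Squared Euclidean distance from every token to every code: (N, K)
    d = ((tokens[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                    # discrete codes, one per token
    return idx, codebook[idx]                 # indices + dequantized tokens

rng = np.random.default_rng(3)
codebook = rng.standard_normal((256, 4))      # 256 hypothetical spectral codes, D=4
tokens = codebook[[5, 9]] + 0.01              # tokens sitting near codes 5 and 9
idx, deq = quantize(tokens, codebook)
assert list(idx) == [5, 9]
```

Discrete codes are what would make the representation addressable like text tokens, which is presumably what "noise-native text tokenizer" in the roadmap refers to.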
With this model functional, it needs to expand to include a centric compaction-and-extraction observer.
It should look like a centerlined embedding transmitter: something that observes and reports, yet cannot impact results as long as we monitor the frozen state. The observer paradigm has something real-world physics cannot offer: it can watch the energetic collapse without disturbing the outcome.
The results are frozen, vacuum-sealed. We know the model works, because the numbers say it works.
Time to observe our tiny models and see what we see.
We need to study how these models behave, how they compact this information, how they recall it, and how that recalled information contributes to a useful spectrum of manifestations.
Hypothetical solution, likely to change
That information will yield what I call the omega processor, which should provide exactly the anchored informational triangulation required to fully embed and re-utilize the bottleneck information that transformers find excessively difficult to rationalize.
The representation is very compact, packed with a great deal of information. That's fine if your task is compaction, but our task is transfer learning.
If we cannot transfer the learning, we must adapt, especially when something is this close to perfect.
The noise analysis is promising
```
======================================================================
TEST 1: Extreme Resolution Scaling
======================================================================
  36×36        81 patches | MSE= 0.000002  |  0.27s |   20MB
  52×52       169 patches | MSE= 0.000002  |  0.01s |   21MB
  64×64       256 patches | MSE= 0.000002  |  0.01s |   22MB
  76×76       361 patches | MSE= 0.000002  |  0.01s |   24MB
 100×100      625 patches | MSE= 0.000002  |  0.01s |   29MB
 128×128     1024 patches | MSE= 0.000002  |  0.01s |   42MB
 140×140     1225 patches | MSE= 0.000002  |  0.01s |   51MB
 172×172     1849 patches | MSE= 0.000002  |  0.01s |   83MB
 204×204     2601 patches | MSE= 0.000002  |  0.01s |  142MB
 256×256     4096 patches | MSE= 0.000002  |  0.01s |  308MB
 300×300     5625 patches | MSE= tile-only |  0.01s |  556MB
 444×444    12321 patches | MSE= tile-only |  0.05s | 2503MB
 512×512    16384 patches | MSE= tile-only |  0.42s |   31MB
 600×600    22500 patches | MSE= tile-only |  0.05s | 8210MB
1024×1024   65536 patches | MSE= tile-only |  1.68s |   31MB
2048×2048  262144 patches | MSE= tile-only |  6.75s |   31MB
4096×4096 1048576 patches | MSE= tile-only | 27.13s |   31MB
```
Freckles holds MSE 0.000002 zero-shot with almost no time increase.
Every directly encoded resolution comes in at 0.000002.
```
======================================================================
TEST 2: Out-of-Distribution Noise Types
======================================================================
Known noise avg MSE: 0.000004
log_normal     MSE=0.000001  er=3.82  ratio=0.3x  ✓ handles
beta           MSE=0.000006  er=3.83  ratio=1.4x  ✓ handles
weibull        MSE=0.000001  er=3.81  ratio=0.2x  ✓ handles
gumbel         MSE=0.000001  er=3.80  ratio=0.3x  ✓ handles
rayleigh       MSE=0.000001  er=3.80  ratio=0.3x  ✓ handles
perlin_approx  MSE=0.000005  er=3.81  ratio=1.3x  ✓ handles
wavelet_noise  MSE=0.000002  er=3.82  ratio=0.5x  ✓ handles
fractal_fbm    MSE=0.000003  er=3.82  ratio=0.6x  ✓ handles
gabor          MSE=0.000000  er=3.81  ratio=0.1x  ✓ handles
sine_composite MSE=0.000002  er=3.82  ratio=0.5x  ✓ handles
voronoi_approx MSE=0.000000  er=3.82  ratio=0.1x  ✓ handles
shot_noise     MSE=0.000003  er=3.82  ratio=0.6x  ✓ handles
quantize_noise MSE=0.000002  er=3.82  ratio=0.5x  ✓ handles
jpeg_artifact  MSE=0.000000  er=3.81  ratio=0.1x  ✓ handles
ring_noise     MSE=0.000001  er=3.81  ratio=0.3x  ✓ handles
spiral_noise   MSE=0.000001  er=3.81  ratio=0.2x  ✓ handles
```
Freckles handles out-of-training noise with no issue: full recreation.
We will need a triangulation
Looks like our near-perfect patchwork composite isn't so perfect at classification yet.
```
======================================================================
TEST 3: Noise Triangulation (zero-shot spatial classification)
======================================================================
Reference fingerprints computed for 16 types
4zones_128px   acc=15.3% ± 6.9%   (32×32 grid, 4 zones)
4zones_256px   acc=22.3% ± 11.6%  (64×64 grid, 4 zones)
4zones_512px   acc=11.2% ± 2.8%   (128×128 grid, 4 zones)
9zones_128px   acc=24.5% ± 7.2%   (32×32 grid, 9 zones)
9zones_256px   acc=21.1% ± 4.5%   (64×64 grid, 9 zones)
16zones_256px  acc=22.3% ± 2.1%   (64×64 grid, 16 zones)
16zones_512px  acc=8.3% ± 0.5%    (128×128 grid, 16 zones)
```
To properly spot noise deviance within huge structures, we'll need a quasi radar-scanner: something that lets most noise be predominantly recreated after full compaction, then directly classified by where it is, its most likely origin and cause, and its most likely relation to nearby coordinates.
If utilized well, this could expand astronomers' capacity by a huge margin; for the immediate use case, it would allow massive amounts of noise to be introduced into 2D and 3D scenes for direct composite identification and recall.
This has great potential.
Multi-Noise Tests
```
======================================================================
TEST 4: Multi-Noise Composites
======================================================================
gauss+pink                   MSE=0.000003  (2 layers)
gauss+salt                   MSE=0.000002  (2 layers)
pink+brown                   MSE=0.000001  (2 layers)
cauchy+laplace               MSE=0.000003  (2 layers)
checker+gradient             MSE=0.000001  (2 layers)
3-way: gauss+pink+block      MSE=0.000001  (3 layers)
4-way: gauss+unif+pink+expo  MSE=0.000001  (4 layers)
heavy: cauchy+salt+sparse    MSE=0.000002  (3 layers)
gentle: pink+brown+gradient  MSE=0.000001  (3 layers)
all_16_equal                 MSE=0.000000  (16 layers)
```
Well, that's promising: the model fully recreates all of those noise types across multiple layers.
This means noise classification is a tractable downstream task; heads can be trained through the observer system to classify noise directly for label outputs.
This will work. Suffice it to say, even with just what I've built here, it has high-yield potential for many scientific fields if utilized well.
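Training such a head is ordinary supervised learning on top of frozen omega tokens. A minimal sketch of a linear softmax head in plain numpy, trained on synthetic stand-in tokens (this is not the repo's observer system; class means and noise scale are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
K, D, N = 4, 4, 400                               # 4 noise classes, D=4 tokens
means = rng.standard_normal((K, D)) * 4.0         # well-separated class means
y = rng.integers(0, K, N)                         # class labels
x = means[y] + rng.standard_normal((N, D)) * 0.3  # synthetic "omega tokens"

w = np.zeros((D, K))
b = np.zeros(K)
for _ in range(300):                              # gradient descent on cross-entropy
    logits = x @ w + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)                  # softmax probabilities
    g = p.copy()
    g[np.arange(N), y] -= 1.0                     # dL/dlogits for cross-entropy
    w -= 0.1 * x.T @ g / N
    b -= 0.1 * g.mean(0)

acc = ((x @ w + b).argmax(1) == y).mean()
print(f"train accuracy: {acc:.2f}")               # near 1.0 on separable synthetic data
```

On real tokens the interesting question is whether a head this small can separate types at all, given that the zero-shot cosine test above could not; a learned linear boundary can exploit structure that raw cosine similarity misses.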