Freckles — Sub-Patch Spectral VAE (D=4, 4×4 patches)

2.5M parameters. 10MB checkpoint. Resolution-independent spectral atoms.

Architecture

Image → 4×4 patches → MLP encode → sphere normalize → SVD (fp64, FLEigh) →
spectral cross-attention (2 heads) → decode → stitch → boundary smooth
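
The patchify stage at the head of the pipeline can be sketched in a few lines. This is a minimal NumPy illustration, not the repo's code; `patchify` is a hypothetical helper, and the real encoder also runs each patch through the MLP before the SVD step.

```python
import numpy as np

def patchify(img, p=4):
    """Split an (H, W) image into non-overlapping p x p patches.

    Returns an (N, p*p) array with N = (H // p) * (W // p) --
    256 patches for a 64x64 image at p=4.
    """
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "resolution must be divisible by patch size"
    # Group rows/cols into blocks, then bring the two block indices to the front.
    patches = img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)

img = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
print(patchify(img).shape)  # (256, 16)
```
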
Parameter           Value
V (matrix rows)     48
D (spectral modes)  4
Patch size          4×4
Hidden dim          384
Depth               4 residual blocks
Cross-attention     2 layers, 2 heads
Parameters          2,557,539
Checkpoint size     10.3 MB
Compression         12:1 (48 → 4)

Quick Start

pip install git+https://github.com/AbstractEyes/geolip-svae.git
from geolip_svae import load_model

model, cfg = load_model(hf_version='v40_freckles_noise')
out = model(images)  # any resolution divisible by 4
# out['recon']           — reconstructed image
# out['svd']['S']        — omega tokens (B, N_patches, 4)
# out['svd']['U']        — left singular vectors
# out['svd']['Vt']       — right singular vectors

Trained Models

Version  Resolution  Patches  Dataset         MSE       Epochs  Noises/Epoch  Total Trained  Status
v40      64×64         256    16 noise types  0.000005     100  1.28m         128m           ✅ Complete
v41      256×256      4096    16 noise types  —              1  1.28m         400k           🔄 Training, ETA 2h
v42      512×512     16384    16 noise types  —              1  1.28m         120k           🔄 Training, ETA 26h
v43      128×128      1024    16 noise types  —             XX  1.28m         0              Upcoming when 512×512 finishes
v43      1024×1024   65536    16 noise types  —             XX  1.28m         0              Upcoming 4/10/2025

Both v41 and v42 have very little left to learn: the patches and attention just need to adjust to the new resolutions. That's about it.

If they require more, new measures will be established and back in the oven they go, likely on RunPod next time with wide-compiled variations.

I chose 1.28m samples per epoch because it matches the size of ImageNet. No ImageNet images are actually used.

Geometric Constants (D=4)

erank:     3.82 / 4.0 = 95.5%     (locked from epoch 40)
S0:        4.53                     (leading singular value)
SD:        1.95                     (trailing singular value)
S0/SD:     2.32                     (ratio, stable across resolutions)
S_delta:   0.055                    (cross-attention coordination magnitude)
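
For reference, the erank figure can be computed from a singular-value spectrum; the sketch below assumes the common entropy-based definition of effective rank (exp of the spectral entropy), which is an assumption about how Freckles' reported erank is defined.

```python
import numpy as np

def effective_rank(s):
    """Entropy-based effective rank: erank = exp(H(p)), p_i = s_i / sum(s).

    A flat spectrum of D values gives erank = D; concentration in the
    leading values pulls it below D.
    """
    p = np.asarray(s, dtype=np.float64)
    p = p / p.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

print(effective_rank([1.0, 1.0, 1.0, 1.0]))  # ~4.0: flat spectrum, full rank
# An illustrative D=4 spectrum with S0=4.53 and SD=1.95 (middle values made up):
print(round(effective_rank([4.53, 3.5, 2.6, 1.95]), 2))
```
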

Results

Resolution Invariance (v40, trained at 64×64 only)

Freckles produces identical MSE at every natively encoded resolution — zero-shot, no retraining.

     36×36       81 patches    MSE=0.000002    0.27s
     64×64      256 patches    MSE=0.000002    0.01s
    128×128    1024 patches    MSE=0.000002    0.01s
    256×256    4096 patches    MSE=0.000002    0.01s
    512×512   16384 patches    MSE=tile-only   0.42s    31MB
   1024×1024  65536 patches    MSE=tile-only   1.68s    31MB
   2048×2048 262144 patches    MSE=tile-only   6.75s    31MB
   4096×4096    1M patches     MSE=tile-only  27.13s    31MB

Tiled encoding uses 31MB VRAM regardless of resolution.
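
The constant-memory behavior comes from looping over fixed-size tiles. A minimal sketch of that loop, using a toy per-patch encoder (patch means) as a stand-in for the real model, since only the looping pattern is the point:

```python
import numpy as np

def encode_tiled(img, encode_fn, tile=64):
    """Encode an arbitrarily large image in fixed-size tiles.

    Because 4x4 patches are atomic, encoding tile-by-tile yields the
    same tokens as a native pass, so peak memory is bounded by one tile.
    Tokens are collected in tile order here.
    """
    H, W = img.shape
    tokens = []
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            tokens.append(encode_fn(img[y:y + tile, x:x + tile]))
    return np.concatenate(tokens, axis=0)

# Stand-in "encoder": mean of each 4x4 patch, one token per patch.
def toy_encode(t):
    h, w = t.shape
    return t.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3)).reshape(-1, 1)

img = np.random.rand(128, 128)
print(encode_tiled(img, toy_encode).shape)  # (1024, 1): 1024 patch tokens
```
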

Tile-Encode Consistency

Encoding 128×128 as 4 tiles of 64×64 vs native 128×128 encoding:

All 16 noise types: 1.00× match
Omega distance:     0.000000

4×4 patches are truly atomic. They produce identical omega tokens whether encoded individually or as part of any larger image.
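
The atomicity claim is easy to check with a toy local encoder: if each token depends only on its own 4×4 patch, tiling cannot change it. A sketch, with patch means standing in for the real omega tokens:

```python
import numpy as np

def patch_tokens(img, p=4):
    """Toy per-patch 'omega token': just the patch mean.

    Stand-in for the real encoder; the point is locality -- each
    token depends only on its own p x p patch.
    """
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).mean(axis=(1, 3))

img = np.random.rand(128, 128)
native = patch_tokens(img)

# Re-encode as four 64x64 tiles and stitch the token grids back together.
tiles = [patch_tokens(img[y:y+64, x:x+64]) for y in (0, 64) for x in (0, 64)]
stitched = np.block([[tiles[0], tiles[1]], [tiles[2], tiles[3]]])

print(np.allclose(native, stitched))  # True: tile and native agree
```
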

Out-of-Distribution Noise (16 novel types, never trained)

log_normal       0.000001   ✓     perlin_approx    0.000005   ✓
beta             0.000006   ✓     wavelet_noise    0.000002   ✓
weibull          0.000001   ✓     fractal_fbm      0.000003   ✓
gumbel           0.000001   ✓     gabor            0.000000   ✓
rayleigh         0.000001   ✓     sine_composite   0.000002   ✓
voronoi_approx   0.000000   ✓     shot_noise       0.000003   ✓
quantize_noise   0.000002   ✓     ring_noise       0.000001   ✓
jpeg_artifact    0.000000   ✓     spiral_noise     0.000001   ✓

All 16/16 handled. Worst case: 1.4× the known-noise average. erank stays at 3.80–3.83 for every alien distribution. Freckles learned spectral structure, not noise identity.

Multi-Noise Composites

gauss+pink                    0.000003    (2 layers)
cauchy+laplace                0.000003    (2 layers)
3-way: gauss+pink+block       0.000001    (3 layers)
4-way: gauss+unif+pink+expo   0.000001    (4 layers)
heavy: cauchy+salt+sparse     0.000002    (3 layers)
all_16_equal                  0.000000    (16 layers)

16 superimposed noise distributions reconstructed losslessly.
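
The composite inputs can be approximated along these lines. The generators below are minimal stand-ins (the repo's exact noise implementations are not shown here), and equal-weight superposition normalized to unit std is an assumption about how the composites are built:

```python
import numpy as np

rng = np.random.default_rng(0)

# A few of the noise types named above, as minimal stand-ins.
def gauss(shape):   return rng.normal(0.0, 1.0, shape)
def uniform(shape): return rng.uniform(-1.0, 1.0, shape)
def pink(shape):
    """Approximate 1/f noise via spectral shaping of white noise."""
    f = np.fft.fft2(rng.normal(size=shape))
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    r = np.sqrt(fy**2 + fx**2)
    r[0, 0] = 1.0  # avoid divide-by-zero at the DC bin
    return np.real(np.fft.ifft2(f / r))

def composite(shape, layers):
    """Equal-weight superposition of noise layers, normalized to unit std."""
    img = sum(fn(shape) for fn in layers)
    return img / img.std()

img = composite((64, 64), [gauss, uniform, pink])
print(img.shape, round(float(img.std()), 2))  # (64, 64) 1.0
```
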

Noise Triangulation (zero-shot classification)

Classification of noise type per patch using only cosine similarity against reference fingerprints (no classifier trained):

4 zones at 128px:    15–22%
9 zones at 128px:    24%
16 zones at 256px:   22%

Near random chance. The omega tokens encode universal spectral structure — they reconstruct everything but don't differentiate types. Classification is a downstream task requiring a learned head, not an intrinsic property of the representation.
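
The fingerprint matching used in this test is plain cosine similarity against per-type reference tokens. A sketch with made-up 4-D fingerprints (the names and values are illustrative, not the repo's data):

```python
import numpy as np

def cosine_classify(token, fingerprints):
    """Zero-shot classification: nearest reference fingerprint by cosine
    similarity. `fingerprints` maps noise-type name -> reference omega
    token (a (4,) vector); no classifier is trained."""
    names = list(fingerprints)
    refs = np.stack([fingerprints[n] for n in names])
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    t = token / np.linalg.norm(token)
    return names[int(np.argmax(refs @ t))]

fps = {"gauss": np.array([4.5, 3.2, 2.4, 1.9]),
       "salt":  np.array([5.1, 2.0, 1.1, 0.6])}
print(cosine_classify(np.array([4.6, 3.1, 2.3, 2.0]), fps))  # gauss
```
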

Training Log

v40 — Freckles 64×64 Noise (complete)

16 noise types, all unlocked from epoch 1. No curriculum needed.

Epoch  1:  val=0.0143   salt=0.077  cauchy=0.046
Epoch  4:  val=0.0030   salt=0.020  cauchy=0.011
Epoch 11:  val=0.0005   salt=0.003  cauchy=0.001
Epoch 22:  val=0.0002   all types ≤ 0.001
Epoch 34:  val=0.0001   all types = 0.000
Epoch 47:  val=0.0000   all types = 0.000
Epoch 100: val=0.000005 all types = 0.000 (FINAL)

Crash at epoch 27 (MSE spiked 0.0002 → 0.0019) — recovered in 2 epochs. Same attractor resilience as Johanna.

Roadmap

  • v40: Freckles 64Γ—64 noise β€” complete
  • v41: Freckles 256Γ—256 noise β€” training now, finetuning 64x64 with additional patches
  • v42: Freckles 512Γ—512 noise β€” training now, finetuning 64x64 with additional patches
  • Freckles image training (ImageNet)
  • Freckles Grandmaster variant (single-shot denoiser)
  • Spectral codebook integration (noise-native text tokenizer)
  • Observer architecture for omega token analysis
  • Cross-regime Procrustes alignment (D=4 ↔ D=16)

Dependencies

Links

DevLog

Upcoming Freckles

We're going to see if we can get that 27-second timer down, and find out how big we can really make the resolution within a reasonable amount of time.

Big resolution means more data to work with. Tighter bottleneck. More difficult problem for the scaffold to solve.

It may work, or they may require full structural shifts; we'll see.

Unexpected delivery from the void.

Now this is an odd one. Freckles, at 2M params, beat all of the bigger 17M models at every stage.

So the pure-noise training routine seems to have yielded something nearly perfect.

Freckles learned all 16 types of noise, and the patches can be utilized in various ways losslessly. This is not a rigid task-specific model; it is a lensed informational compaction system.

This is as close to a lens as it can be.

Soup soup soup soup

It's still a souper for now, but that's fine. It'll get cleaned up. As all great musicians know, one must start slow and build speed over time. The speed, in our case, will be attenuating the tuner until it picks up the radio correctly.

Chaos to order plan

The primary issue now isn't disorderly compaction and recall; it's how best to disperse the patchwork in a universally useful way that guarantees structural recall, rather than losing positioning information for text.

I have a plan for that which involves a translation codebook. https://github.com/AbstractEyes/geolip-svae/blob/main/geolip_svae/spectral_codebook.py

So with this functional model, it needs to expand to include a centric compaction-and-extraction observer.

It should look like a centerlined embedding transmitter: something that, by observation, would impact results, yet will not be ABLE to impact them if we monitor the frozen state. The observer paradigm has something unique that real-world physics cannot do: it can observe the energetic collapse without impacting the results.

The results are frozen. It's vacuum-sealed. We know it works, because the numbers say it works.

Time to observe our tiny models and see what we see.

We need to observe these models: how they behave, how they compact this information, how they recall it, and how that recalled information contributes to a useful spectrum of outputs.

Hypothetical solution, likely to change

That information will yield what I call the omega processor, which will give us exactly the anchored informational triangulation required to fully embed and re-utilize bottleneck information that is excessively difficult for transformers to rationalize.

It's too compact, full of too much information, too much compacted. That's fine if your task is to compact, but our task is transfer learning.

If we cannot transfer the learning, we must adapt, especially when something is nearly perfect like this.

The noise analysis is promising

======================================================================
TEST 1: Extreme Resolution Scaling
======================================================================
     36×36          81 patches | MSE=    0.000002 | 0.27s | 20MB
     52×52         169 patches | MSE=    0.000002 | 0.01s | 21MB
     64×64         256 patches | MSE=    0.000002 | 0.01s | 22MB
     76×76         361 patches | MSE=    0.000002 | 0.01s | 24MB
    100×100        625 patches | MSE=    0.000002 | 0.01s | 29MB
    128×128       1024 patches | MSE=    0.000002 | 0.01s | 42MB
    140×140       1225 patches | MSE=    0.000002 | 0.01s | 51MB
    172×172       1849 patches | MSE=    0.000002 | 0.01s | 83MB
    204×204       2601 patches | MSE=    0.000002 | 0.01s | 142MB
    256×256       4096 patches | MSE=    0.000002 | 0.01s | 308MB
    300×300       5625 patches | MSE=   tile-only | 0.01s | 556MB
    444×444      12321 patches | MSE=   tile-only | 0.05s | 2503MB
    512×512      16384 patches | MSE=   tile-only | 0.42s | 31MB
    600×600      22500 patches | MSE=   tile-only | 0.05s | 8210MB
   1024×1024     65536 patches | MSE=   tile-only | 1.68s | 31MB
   2048×2048    262144 patches | MSE=   tile-only | 6.75s | 31MB
   4096×4096   1048576 patches | MSE=   tile-only | 27.13s | 31MB

Freckles hits MSE 0.000002 zero-shot with almost no time increase.

Everything natively encodable is 0.000002.

======================================================================
TEST 2: Out-of-Distribution Noise Types
======================================================================

  Known noise avg MSE: 0.000004
  log_normal         MSE=0.000001 er=3.82 ratio=0.3x ✓ handles
  beta               MSE=0.000006 er=3.83 ratio=1.4x ✓ handles
  weibull            MSE=0.000001 er=3.81 ratio=0.2x ✓ handles
  gumbel             MSE=0.000001 er=3.80 ratio=0.3x ✓ handles
  rayleigh           MSE=0.000001 er=3.80 ratio=0.3x ✓ handles
  perlin_approx      MSE=0.000005 er=3.81 ratio=1.3x ✓ handles
  wavelet_noise      MSE=0.000002 er=3.82 ratio=0.5x ✓ handles
  fractal_fbm        MSE=0.000003 er=3.82 ratio=0.6x ✓ handles
  gabor              MSE=0.000000 er=3.81 ratio=0.1x ✓ handles
  sine_composite     MSE=0.000002 er=3.82 ratio=0.5x ✓ handles
  voronoi_approx     MSE=0.000000 er=3.82 ratio=0.1x ✓ handles
  shot_noise         MSE=0.000003 er=3.82 ratio=0.6x ✓ handles
  quantize_noise     MSE=0.000002 er=3.82 ratio=0.5x ✓ handles
  jpeg_artifact      MSE=0.000000 er=3.81 ratio=0.1x ✓ handles
  ring_noise         MSE=0.000001 er=3.81 ratio=0.3x ✓ handles
  spiral_noise       MSE=0.000001 er=3.81 ratio=0.2x ✓ handles

Freckles handles out-of-training noise with no issue. Full recreation.

We will need a triangulation

Looks like our near-perfect patchwork composite isn't so perfect yet.

======================================================================
TEST 3: Noise Triangulation (zero-shot spatial classification)
======================================================================
  Reference fingerprints computed for 16 types
  4zones_128px         acc=15.3% ± 6.9% (32×32 grid, 4 zones)
  4zones_256px         acc=22.3% ± 11.6% (64×64 grid, 4 zones)
  4zones_512px         acc=11.2% ± 2.8% (128×128 grid, 4 zones)
  9zones_128px         acc=24.5% ± 7.2% (32×32 grid, 9 zones)
  9zones_256px         acc=21.1% ± 4.5% (64×64 grid, 9 zones)
  16zones_256px        acc=22.3% ± 2.1% (64×64 grid, 16 zones)
  16zones_512px        acc=8.3% ± 0.5% (128×128 grid, 16 zones)

To properly spot noise deviance within huge structures, we'll need a quasi radar-scanner. This should allow most noise to be predominantly recreated after being fully compacted, and directly classified: where it is, its most likely origin, its most likely cause, and its most likely relation to nearby coordinates.

If utilized, this should expand astronomical analysis capacity by a huge margin, and for the immediate use case it will allow massive amounts of noise to be introduced into 2D and 3D scenes for direct composite identification and recall.

This has great potential.

Multi Noise tests

======================================================================
TEST 4: Multi-Noise Composites
======================================================================
  gauss+pink                          MSE=0.000003 (2 layers)
  gauss+salt                          MSE=0.000002 (2 layers)
  pink+brown                          MSE=0.000001 (2 layers)
  cauchy+laplace                      MSE=0.000003 (2 layers)
  checker+gradient                    MSE=0.000001 (2 layers)
  3-way: gauss+pink+block             MSE=0.000001 (3 layers)
  4-way: gauss+unif+pink+expo         MSE=0.000001 (4 layers)
  heavy: cauchy+salt+sparse           MSE=0.000002 (3 layers)
  gentle: pink+brown+gradient         MSE=0.000001 (3 layers)
  all_16_equal                        MSE=0.000000 (16 layers)

Well, that's promising. The model can fully recreate all of those noise types across multiple layers.

This means noise classification is an arbitrary downstream task, and heads can be trained to directly classify noise for label outputs through the observer system.

Suffice it to say, even with what I've built here, it has high-yield potential for many scientific fields if utilized.
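
Such a head can be as small as a linear probe on the 4-D omega tokens. A sketch on synthetic stand-in tokens (the fingerprint vectors and jitter below are fabricated for illustration; real tokens would come from the model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for omega tokens: two noise "types" with
# slightly different spectral profiles plus jitter.
mu = {0: np.array([4.5, 3.2, 2.4, 1.9]), 1: np.array([5.1, 2.0, 1.1, 0.6])}
X = np.stack([mu[i % 2] + 0.1 * rng.normal(size=4) for i in range(400)])
y = np.arange(400) % 2

# Linear probe: one-vs-rest least squares on the 4-D tokens,
# then predict by argmax over the class scores.
Y = np.eye(2)[y]
W = np.linalg.lstsq(X, Y, rcond=None)[0]
pred = (X @ W).argmax(axis=1)
print(f"train accuracy: {(pred == y).mean():.2f}")
```
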
