---
license: apache-2.0
library_name: pytorch
tags:
- autoencoder
- vision
- image-reconstruction
- heavy-ae
metrics:
- accuracy
- mse
---

# HeavyAE (Heavy AutoEncoder)

HeavyAE is a high-resolution, symmetric convolutional autoencoder optimized for reconstructing mobile interface screenshots at their native aspect ratio.

## Model Details
- **Model Type:** Convolutional Autoencoder (Non-Variational)
- **Parameters:** ~12.6 Million
- **Input Resolution:** 1600 x 720 (RGB)
- **Latent Bottleneck:** 8 Channels
- **Activation:** LeakyReLU (Encoder/Decoder) | Sigmoid (Output)

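The parameter figure can be sanity-checked by summing the 3×3 convolution weights and biases from the layer list in the inference script below. This is a back-of-the-envelope count only; it comes out to about 12.5M, in line with the quoted figure:

```python
def conv_params(c_in, c_out, k=3):
    # kernel weights (c_in * c_out * k * k) plus one bias per output channel;
    # the count is identical for Conv2d and ConvTranspose2d
    return c_in * c_out * k * k + c_out

encoder = [(3, 128), (128, 256), (256, 512), (512, 1024), (1024, 8)]
decoder = [(8, 1024), (1024, 512), (512, 256), (256, 128), (128, 3)]
total = sum(conv_params(a, b) for a, b in encoder + decoder)
print(f"{total:,} parameters")  # 12,544,523 parameters
```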
## Metadata
| Feature | Value |
|---|---|
| Layers | 10 (5 Encoder, 5 Decoder) |
| Target Size | 1600x720 |
| Latent Space | 8x45x100 |
| Format | PyTorch (`model.pt`) |

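The 8x45x100 latent shape follows from the four stride-2 encoder convolutions (kernel 3, padding 1), each of which effectively halves height and width. A quick check of the arithmetic:

```python
h, w = 720, 1600
for _ in range(4):
    # output size of a k=3, s=2, p=1 conv: floor((n + 2 - 3) / 2) + 1 == ceil(n / 2)
    h, w = (h + 1) // 2, (w + 1) // 2
latent = (8, h, w)
compression = (3 * 720 * 1600) // (8 * h * w)
print(latent, f"-> {compression}:1 compression")  # (8, 45, 100) -> 96:1 compression
```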
## Benchmarks
The following results were obtained from internal test samples (UI reconstruction tasks):

* **Average Reconstruction Accuracy:** 85.41%
* **Average MSE:** 0.0058

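For clarity, these are pixel-level metrics as computed in the test script below: accuracy is 100 × (1 − mean absolute error) between [0, 1]-scaled images, and MSE is the plain mean squared error. A minimal NumPy sketch on toy data:

```python
import numpy as np

def accuracy(a, b):
    # 100 * (1 - mean absolute error), on images scaled to [0, 1]
    return (1.0 - np.mean(np.abs(a - b))) * 100.0

def mse(a, b):
    return float(np.mean((a - b) ** 2))

a = np.zeros((3, 4, 4))
b = np.full((3, 4, 4), 0.1)  # every pixel off by 0.1
print(f"{accuracy(a, b):.2f}% MSE={mse(a, b):.4f}")  # 90.00% MSE=0.0100
```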
### Model Performance Benchmark: AE (~12.6M Parameters)
**Architecture:** 1600×720 Autoencoder | **Bottleneck:** 96:1 | **Optimization:** Secret
| Test Case | Input Resolution | Aspect Ratio | Original Accuracy | Low-Noise Acc | High-Noise Acc | Primary Challenge |
|---|---|---|---|---|---|---|
| **IRL Faces (Forest)** | ~4k×3k (HQ) | 4:3 | **94.45%** | 94.44% | 94.07% | Complex gradients & textures |
| **AI Generated Art** | 128×128 (LQ) | 1:1 | **96.68%** | 96.61% | 95.85% | Upscaling/Interpolation noise |
| **Digital Doodle** | 720×720 (MD) | 1:1 | **95.91%** | 95.90% | 95.71% | Sharp high-contrast edges |

## Inference & Stress Test (Google Colab)

```python
import torch
import torch.nn as nn
import numpy as np
import requests
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

class HeavyAE(nn.Module):
    def __init__(self):
        super(HeavyAE, self).__init__()
        # Four stride-2 convs downsample 720x1600 -> 45x100; the final conv maps to 8 latent channels
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(512, 1024, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(1024, 8, 3, stride=1, padding=1)
        )
        # Mirror of the encoder; Sigmoid keeps outputs in [0, 1]
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 1024, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(1024, 512, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 3, 3, stride=1, padding=1), nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HeavyAE().to(device)
model_url = "https://huggingface.co/Parallax-labs-1/parallax_VISION-ValidPhone/resolve/main/model.pt"

# Download weights
response = requests.get(model_url)
response.raise_for_status()  # fail fast on a bad download
with open("model.pt", "wb") as f:
    f.write(response.content)

model.load_state_dict(torch.load("model.pt", map_location=device))
model.eval()

def test_model(img_path):
    orig = Image.open(img_path).convert('RGB')
    w, h = orig.size

    # torchvision Resize expects (height, width): 720 high, 1600 wide
    preprocess = transforms.Compose([transforms.Resize((720, 1600)), transforms.ToTensor()])
    input_t = preprocess(orig).unsqueeze(0).to(device)

    with torch.no_grad():
        recon = model(input_t)
        # Stress tests: clamp keeps the noisy inputs inside the valid [0, 1] range
        noise_l = model((input_t + torch.randn_like(input_t) * 0.05).clamp(0, 1))
        noise_h = model((input_t + torch.randn_like(input_t) * 0.2).clamp(0, 1))

    # Metrics: pixel accuracy = 100 * (1 - mean absolute error)
    def acc(a, b):
        return (1 - torch.mean(torch.abs(a - b)).item()) * 100
    print(f"--- Log ---\nOriginal Accuracy: {acc(input_t, recon):.2f}%")
    print(f"Low-Noise Accuracy: {acc(input_t, noise_l):.2f}%")
    print(f"High-Noise Accuracy: {acc(input_t, noise_h):.2f}%")

    # Output images: reconstruction resized back to the source resolution, plus an error map
    res = transforms.ToPILImage()(recon.squeeze(0).cpu()).resize((w, h))
    diff = np.abs(np.array(orig).astype(float) - np.array(res).astype(float)).astype(np.uint8)

    fig, ax = plt.subplots(1, 3, figsize=(18, 6))
    ax[0].imshow(orig); ax[0].set_title("Input")
    ax[1].imshow(res); ax[1].set_title("Reconstruction")
    ax[2].imshow(diff); ax[2].set_title("Error Map")
    for a in ax:
        a.axis('off')
    plt.show()
```
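The encoder half can also be used on its own to extract compact latents for downstream tasks. Below is a self-contained shape check; the layers are copied from the encoder above and left randomly initialized, so this verifies dimensions only, not reconstruction quality:

```python
import torch
import torch.nn as nn

# Same layers as the HeavyAE encoder; random init is enough to check shapes
encoder = nn.Sequential(
    nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1024, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(1024, 8, 3, stride=1, padding=1),
)
with torch.no_grad():
    z = encoder(torch.rand(1, 3, 720, 1600))
print(tuple(z.shape))  # (1, 8, 45, 100)
```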