---
license: apache-2.0
library_name: pytorch
tags:
- autoencoder
- vision
- image-reconstruction
- heavy-ae
metrics:
- accuracy
- mse
---

# HeavyAE (Heavy AutoEncoder)

HeavyAE is a high-resolution, symmetric convolutional autoencoder optimized for reconstructing mobile interface screenshots at their native aspect ratio.

## Model Details
- **Model Type:** Convolutional Autoencoder (Non-Variational)
- **Parameters:** ~12.6 Million
- **Input Resolution:** 1600 x 720 (RGB)
- **Latent Bottleneck:** 8 Channels
- **Activation:** LeakyReLU (Encoder/Decoder) | Sigmoid (Output)

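The parameter figure can be sanity-checked by summing the 3×3 convolution weights and biases from the layer list in the inference script below. This is a back-of-the-envelope count only; it comes out to about 12.5M, in line with the quoted figure:

```python
def conv_params(c_in, c_out, k=3):
    # kernel weights (c_in * c_out * k * k) plus one bias per output channel;
    # the count is identical for Conv2d and ConvTranspose2d
    return c_in * c_out * k * k + c_out

encoder = [(3, 128), (128, 256), (256, 512), (512, 1024), (1024, 8)]
decoder = [(8, 1024), (1024, 512), (512, 256), (256, 128), (128, 3)]
total = sum(conv_params(a, b) for a, b in encoder + decoder)
print(f"{total:,} parameters")  # 12,544,523 parameters
```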
## Metadata
| Feature | Value |
|---|---|
| Layers | 10 (5 Encoder, 5 Decoder) |
| Target Size | 1600x720 |
| Latent Space | 8x45x100 |
| Format | PyTorch (`model.pt`) |

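The 8x45x100 latent shape follows from the four stride-2 encoder convolutions (kernel 3, padding 1), each of which effectively halves height and width. A quick check of the arithmetic:

```python
h, w = 720, 1600
for _ in range(4):
    # output size of a k=3, s=2, p=1 conv: floor((n + 2 - 3) / 2) + 1 == ceil(n / 2)
    h, w = (h + 1) // 2, (w + 1) // 2
latent = (8, h, w)
compression = (3 * 720 * 1600) // (8 * h * w)
print(latent, f"-> {compression}:1 compression")  # (8, 45, 100) -> 96:1 compression
```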
## Benchmarks
The following results were obtained from internal test samples (UI reconstruction tasks):

* **Average Reconstruction Accuracy:** 85.41%
* **Average MSE:** 0.0058

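For clarity, these are pixel-level metrics as computed in the test script below: accuracy is 100 × (1 − mean absolute error) between [0, 1]-scaled images, and MSE is the plain mean squared error. A minimal NumPy sketch on toy data:

```python
import numpy as np

def accuracy(a, b):
    # 100 * (1 - mean absolute error), on images scaled to [0, 1]
    return (1.0 - np.mean(np.abs(a - b))) * 100.0

def mse(a, b):
    return float(np.mean((a - b) ** 2))

a = np.zeros((3, 4, 4))
b = np.full((3, 4, 4), 0.1)  # every pixel off by 0.1
print(f"{accuracy(a, b):.2f}% MSE={mse(a, b):.4f}")  # 90.00% MSE=0.0100
```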
### Model Performance Benchmark: AE (~12.6M Parameters)
**Architecture:** 1600×720 Autoencoder | **Bottleneck:** 96:1 | **Optimization:** Secret
| Test Case | Input Resolution | Aspect Ratio | Original Accuracy | Low-Noise Acc | High-Noise Acc | Primary Challenge |
|---|---|---|---|---|---|---|
| **IRL Faces (Forest)** | ~4k×3k (HQ) | 4:3 | **94.45%** | 94.44% | 94.07% | Complex gradients & textures |
| **AI Generated Art** | 128×128 (LQ) | 1:1 | **96.68%** | 96.61% | 95.85% | Upscaling/Interpolation noise |
| **Digital Doodle** | 720×720 (MD) | 1:1 | **95.91%** | 95.90% | 95.71% | Sharp high-contrast edges |

## Inference & Stress Test (Google Colab)

```python
import torch
import torch.nn as nn
import numpy as np
import requests
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

class HeavyAE(nn.Module):
    def __init__(self):
        super(HeavyAE, self).__init__()
        # Four stride-2 convs downsample 720x1600 -> 45x100; the final conv maps to 8 latent channels
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(512, 1024, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(1024, 8, 3, stride=1, padding=1)
        )
        # Mirror of the encoder; Sigmoid keeps outputs in [0, 1]
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 1024, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(1024, 512, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(512, 256, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 3, 3, stride=1, padding=1), nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HeavyAE().to(device)
model_url = "https://huggingface.co/Parallax-labs-1/parallax_VISION-ValidPhone/resolve/main/model.pt"

# Download weights
response = requests.get(model_url)
response.raise_for_status()  # fail fast on a bad download
with open("model.pt", "wb") as f:
    f.write(response.content)

model.load_state_dict(torch.load("model.pt", map_location=device))
model.eval()

def test_model(img_path):
    orig = Image.open(img_path).convert('RGB')
    w, h = orig.size

    # torchvision Resize expects (height, width): 720 high, 1600 wide
    preprocess = transforms.Compose([transforms.Resize((720, 1600)), transforms.ToTensor()])
    input_t = preprocess(orig).unsqueeze(0).to(device)

    with torch.no_grad():
        recon = model(input_t)
        # Stress tests: clamp keeps the noisy inputs inside the valid [0, 1] range
        noise_l = model((input_t + torch.randn_like(input_t) * 0.05).clamp(0, 1))
        noise_h = model((input_t + torch.randn_like(input_t) * 0.2).clamp(0, 1))

    # Metrics: pixel accuracy = 100 * (1 - mean absolute error)
    def acc(a, b):
        return (1 - torch.mean(torch.abs(a - b)).item()) * 100
    print(f"--- Log ---\nOriginal Accuracy: {acc(input_t, recon):.2f}%")
    print(f"Low-Noise Accuracy: {acc(input_t, noise_l):.2f}%")
    print(f"High-Noise Accuracy: {acc(input_t, noise_h):.2f}%")

    # Output images: reconstruction resized back to the source resolution, plus an error map
    res = transforms.ToPILImage()(recon.squeeze(0).cpu()).resize((w, h))
    diff = np.abs(np.array(orig).astype(float) - np.array(res).astype(float)).astype(np.uint8)

    fig, ax = plt.subplots(1, 3, figsize=(18, 6))
    ax[0].imshow(orig); ax[0].set_title("Input")
    ax[1].imshow(res); ax[1].set_title("Reconstruction")
    ax[2].imshow(diff); ax[2].set_title("Error Map")
    for a in ax:
        a.axis('off')
    plt.show()
```
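The encoder half can also be used on its own to extract compact latents for downstream tasks. Below is a self-contained shape check; the layers are copied from the encoder above and left randomly initialized, so this verifies dimensions only, not reconstruction quality:

```python
import torch
import torch.nn as nn

# Same layers as the HeavyAE encoder; random init is enough to check shapes
encoder = nn.Sequential(
    nn.Conv2d(3, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1024, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(1024, 8, 3, stride=1, padding=1),
)
with torch.no_grad():
    z = encoder(torch.rand(1, 3, 720, 1600))
print(tuple(z.shape))  # (1, 8, 45, 100)
```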