# Parallax_Vision-AmongUs 2.0
Parallax_Vision-AmongUs 2.0 is an experimental autoencoder that pushes the boundaries of neural image reconstruction. While standard methods prioritize individual pixel values, this model focuses on the internal geometry and relational structure of a scene. It is designed to prove that structural integrity can be maintained even through an extreme information bottleneck.
## 1. Technical Specs

- Total Parameters: ~132,928,000 (132.9M)
- Input Resolution: 720x720 RGB
- Bottleneck Latent Dim: 512
- Architecture: 4-Stage Convolutional Relational Autoencoder
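The headline parameter count follows directly from the layer shapes defined in the inference script below: four 3x3 convolutions, four 2x2 transposed convolutions, and two dense layers bridging the 64x45x45 feature map and the 512-dim latent. A quick back-of-the-envelope tally (assuming a bias term on every layer, PyTorch's default):

```python
# Parameter tally for the 4-stage architecture (every layer with bias).
def conv_params(c_in, c_out, k):
    # Conv2d / ConvTranspose2d: weights (c_in * c_out * k * k) + biases
    return c_in * c_out * k * k + c_out

def linear_params(n_in, n_out):
    return n_in * n_out + n_out

flat = 64 * 45 * 45          # 129,600 features after 4 poolings (720 -> 45)

encoder = (conv_params(3, 16, 3) + conv_params(16, 32, 3) +
           conv_params(32, 64, 3) + conv_params(64, 64, 3) +
           linear_params(flat, 512))
decoder = (linear_params(512, flat) +
           conv_params(64, 64, 2) + conv_params(64, 32, 2) +
           conv_params(32, 16, 2) + conv_params(16, 3, 2))

total = encoder + decoder
print(f"{total:,}")  # ~132.9M, dominated by the two dense bottleneck layers
```

Almost all of the weight budget sits in the two `Linear` layers on either side of the bottleneck; the convolutional stages contribute well under 0.1% of the total.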
## 2. The Facts: The 0.03% Challenge
This model operates at a staggering 3037.50:1 compression ratio.
In practical terms, the model discards 99.97% of the raw input data, retaining only a tiny "semantic DNA" of 512 values. Unlike traditional compression, which collapses into blurry "pixel soup" at this level, Parallax_Vision 2.0 maintains a sharp logical layout: UI elements, chat bubbles, and icons remain in their correct relative positions, achieving "almost perfect" structural reconstruction from nearly nothing.
| Metric | Value |
|---|---|
| Input Values | 1,555,200 |
| Latent Values | 512 |
| Compression Ratio | 3037.50 : 1 |
| Data Retained | ~0.03% |
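Every value in the table follows from simple arithmetic on the input and latent shapes:

```python
# A 720x720 RGB image, flattened to raw scalar values
input_values = 720 * 720 * 3            # 1,555,200
latent_values = 512

ratio = input_values / latent_values    # 3037.5 : 1
retained_pct = 100 * latent_values / input_values

print(f"Ratio: {ratio:.2f}:1, retained: {retained_pct:.4f}%")
# Ratio: 3037.50:1, retained: 0.0329%
```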
## 3. The Story: Geometry Over Pixels
Most models fail at high ratios because they lose the "where" and "how." Parallax_Vision 2.0 succeeds by prioritizing the consistency of distances between elements.
Think of it as a master cartographer. Instead of trying to memorize every blade of grass in a field, the model memorizes the exact distances between landmarks. By shifting the objective from "Where is this pixel?" to "How does this pixel relate to its neighbors?", the model can reconstruct a complex 720p UI from a microscopic footprint.
It’s high-fidelity logic hidden in a 512-dimension seed.
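The model card does not publish the training objective, but the "distances between landmarks" idea can be sketched as a pairwise-distance consistency penalty: rather than comparing pixels directly, compare the set of pairwise distances between key points in the original against the same set in the reconstruction. A minimal illustration — the landmark coordinates and the `relational_loss` helper are hypothetical, not the model's actual loss:

```python
import math

def pairwise_distances(points):
    """Distance between every pair of 2-D landmarks."""
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]

def relational_loss(original_pts, reconstructed_pts):
    """Mean squared error between the two pairwise-distance sets.

    Zero means the layout geometry is perfectly preserved, even if
    every landmark moved (translation leaves all distances intact).
    """
    d_orig = pairwise_distances(original_pts)
    d_recon = pairwise_distances(reconstructed_pts)
    return sum((a - b) ** 2 for a, b in zip(d_orig, d_recon)) / len(d_orig)

# Hypothetical UI landmarks: a chat bubble, an icon, a button
ui = [(100, 50), (400, 50), (250, 600)]
shifted = [(110, 60), (410, 60), (260, 610)]   # same layout, translated
print(relational_loss(ui, shifted))            # 0.0 -- geometry preserved
```

Note the key property: a pure translation of the whole layout incurs zero penalty, because only *relative* positions matter — exactly the invariance the "geometry over pixels" framing describes.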
## 4. Usage (Inference Script)

The script below demonstrates the reconstruction capabilities of the 132M-parameter architecture.
```python
import os
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

# --- CONFIGURATION ---
MODEL_REPO = "Parallax-labs-1/parallax-VISION_amongus-2.0"
INPUT_IMAGE = "test_screenshot.png"
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
LATENT_DIM = 512
IMG_SIZE = 720

# --- 1. MODEL ARCHITECTURE ---
class ParallaxAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: Structural Downsampling (720 -> 360 -> 180 -> 90 -> 45)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 45 * 45, LATENT_DIM)
        )
        # Decoder: Geometric Expansion (45 -> 90 -> 180 -> 360 -> 720)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 64 * 45 * 45), nn.ReLU(),
            nn.Unflatten(1, (64, 45, 45)),
            nn.ConvTranspose2d(64, 64, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# --- 2. LOAD & INITIALIZE ---
model = ParallaxAutoencoder().to(DEVICE)
if not os.path.exists("model.pth"):
    print("Downloading weights...")
    # Requires wget on the PATH; any HTTP client works for this URL
    os.system(f"wget https://huggingface.co/{MODEL_REPO}/resolve/main/model.pth")
model.load_state_dict(torch.load('model.pth', map_location=DEVICE))
model.eval()

# --- 3. EXECUTE RECONSTRUCTION ---
def run_reconstruction(img_path):
    if not os.path.exists(img_path):
        print(f"Error: {img_path} not found.")
        return

    img = Image.open(img_path).convert('RGB')
    preprocess = transforms.Compose([
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.ToTensor()
    ])
    img_t = preprocess(img).unsqueeze(0).to(DEVICE)

    with torch.no_grad():
        output = model(img_t)

    # Compression ratio: raw pixel values vs. latent size
    ratio = (IMG_SIZE * IMG_SIZE * 3) / LATENT_DIM

    # Side-by-side comparison of input and reconstruction
    fig, ax = plt.subplots(1, 2, figsize=(15, 7))
    ax[0].imshow(img_t[0].cpu().permute(1, 2, 0))
    ax[0].set_title("Original Image")
    ax[0].axis('off')
    ax[1].imshow(output[0].cpu().permute(1, 2, 0))
    ax[1].set_title(f"Reconstructed (Ratio {ratio:.2f}:1)")
    ax[1].axis('off')
    plt.show()

if __name__ == "__main__":
    run_reconstruction(INPUT_IMAGE)
```
## 5. License

Licensed under Apache 2.0.