Parallax_Vision-AmongUs 2.0

Parallax_Vision-AmongUs 2.0 is an experimental autoencoder that pushes the boundaries of neural image reconstruction. While standard methods prioritize individual pixel values, this model focuses on the internal geometry and relational structure of a scene. It is designed to prove that structural integrity can be maintained even through an extreme information bottleneck.

1. Technical Specs

  • Total Parameters: ~132,827,000 (132.8M)
  • Input Resolution: 720x720 RGB
  • Bottleneck Latent Dim: 512
  • Architecture: 4-Stage Convolutional Relational Autoencoder
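The parameter count is dominated by the two dense bottleneck layers (sizes taken from the inference script in section 4); a quick back-of-the-envelope check:

```python
# The two Linear layers around the bottleneck hold nearly all the weights.
# Encoder: Linear(64*45*45 -> 512); Decoder: Linear(512 -> 64*45*45)
feat = 64 * 45 * 45              # 129,600 flattened features after 4 downsampling stages
latent = 512

enc_linear = feat * latent + latent   # weights + bias
dec_linear = latent * feat + feat

print(enc_linear + dec_linear)        # 132,840,512
```

The convolutional layers contribute under 0.1M additional parameters, consistent with the ~132.8M total.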

2. The Facts: The 0.03% Challenge

This model operates at a staggering 3037.50:1 compression ratio.

In practical terms, the model discards 99.97% of the raw input data, retaining only a tiny "semantic DNA" of 512 values. Unlike traditional compression, which degrades into blurry "pixel soup" at this level, Parallax_Vision 2.0 maintains a sharp logical layout: UI elements, chat bubbles, and icons remain in their correct relative positions, achieving "almost perfect" structural reconstruction from nearly nothing.

  Metric              Value
  ------------------  ------------
  Input Values        1,555,200
  Latent Values       512
  Compression Ratio   3037.50 : 1
  Data Retained       ~0.03%
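The figures in the table follow directly from the input and latent sizes:

```python
# 720x720 RGB input flattened to raw values vs. the 512-value latent code
input_values = 720 * 720 * 3          # 1,555,200
latent_values = 512

ratio = input_values / latent_values
retained = latent_values / input_values * 100

print(f"{ratio:.2f}:1")               # 3037.50:1
print(f"{retained:.2f}%")             # 0.03%
```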

3. The Story: Geometry Over Pixels

Most models fail at high ratios because they lose the "where" and "how." Parallax_Vision 2.0 succeeds by prioritizing the consistency of distances between elements.

Think of it as a master cartographer. Instead of trying to memorize every blade of grass in a field, the model memorizes the exact distances between the landmarks. By shifting the objective from "Where is this pixel?" to "How does this pixel relate to its neighbors?", the model can reconstruct a complex 720x720 UI from a microscopic footprint.

It’s high-fidelity logic hidden in a 512-dimension seed.
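The card does not publish the training objective, but a pairwise-distance consistency term of the kind described above could be sketched as follows. The function name, the random sampling of pixel locations, and the pixel-level granularity are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def relational_loss(recon, target, num_points=256):
    """Penalize changes in pairwise distances between sampled pixels,
    rather than per-pixel error alone. recon/target: (B, C, H, W)."""
    b, c, h, w = target.shape
    # Sample the same pixel locations from both images: (B, C, P)
    idx = torch.randint(0, h * w, (num_points,))
    r = recon.flatten(2)[:, :, idx]
    t = target.flatten(2)[:, :, idx]
    # Pairwise distances between the sampled points in each image: (B, P, P)
    dr = torch.cdist(r.transpose(1, 2), r.transpose(1, 2))
    dt = torch.cdist(t.transpose(1, 2), t.transpose(1, 2))
    # Match the distance structure, not the raw values
    return F.mse_loss(dr, dt)
```

A perfect reconstruction yields zero loss, but so would any reconstruction that merely preserves the distance structure, which is why a term like this would typically be combined with a standard per-pixel loss.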

4. Usage (Inference Script)

This script demonstrates the reconstruction capabilities of the 132.8M-parameter architecture.

import os
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

# --- CONFIGURATION ---
MODEL_REPO = "Parallax-labs-1/parallax-VISION_amongus-2.0"
INPUT_IMAGE = "test_screenshot.png" 
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
LATENT_DIM = 512
IMG_SIZE = 720

# --- 1. MODEL ARCHITECTURE ---
class ParallaxAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: Structural Downsampling
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 45 * 45, LATENT_DIM)
        )
        # Decoder: Geometric Expansion
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 64 * 45 * 45), nn.ReLU(),
            nn.Unflatten(1, (64, 45, 45)),
            nn.ConvTranspose2d(64, 64, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# --- 2. LOAD & INITIALIZE ---
model = ParallaxAutoencoder().to(DEVICE)

if not os.path.exists("model.pth"):
    print("Downloading weights...")
    # Use the stdlib downloader instead of shelling out to wget
    # (portable across platforms, and raises an error on failure).
    import urllib.request
    urllib.request.urlretrieve(
        f"https://huggingface.co/{MODEL_REPO}/resolve/main/model.pth", "model.pth"
    )

model.load_state_dict(torch.load("model.pth", map_location=DEVICE, weights_only=True))
model.eval()

# --- 3. EXECUTE RECONSTRUCTION ---
def run_reconstruction(img_path):
    if not os.path.exists(img_path):
        print(f"Error: {img_path} not found.")
        return

    img = Image.open(img_path).convert('RGB')
    preprocess = transforms.Compose([
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.ToTensor()
    ])
    img_t = preprocess(img).unsqueeze(0).to(DEVICE)
    
    with torch.no_grad():
        output = model(img_t)
    
    # Calculate Ratio
    ratio = (IMG_SIZE * IMG_SIZE * 3) / LATENT_DIM

    # Plotting
    fig, ax = plt.subplots(1, 2, figsize=(15, 7))
    ax[0].imshow(img_t[0].cpu().permute(1, 2, 0))
    ax[0].set_title("Original Image")
    ax[0].axis('off')
    
    ax[1].imshow(output[0].cpu().permute(1, 2, 0))
    ax[1].set_title(f"Reconstructed (Ratio {ratio:.2f}:1)")
    ax[1].axis('off')
    
    plt.show()

if __name__ == "__main__":
    run_reconstruction(INPUT_IMAGE)
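The 512-value "semantic DNA" itself can be inspected by running only the encoder half of the model. A minimal shape check, using the same encoder stack as the script above but with randomly initialized weights (no download required):

```python
import torch
import torch.nn as nn

# Re-create just the encoder stack from the inference script
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 45 * 45, 512),
)

with torch.no_grad():
    z = encoder(torch.rand(1, 3, 720, 720))   # dummy 720x720 RGB image

print(z.shape)   # torch.Size([1, 512])
```

In the trained model this vector is what the decoder expands back into the full 720x720 image.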

5. License

Licensed under Apache 2.0.
