Parallax-VIDEO-Boxes

A high-performance temporal latent system designed for RGBA video synthesis. This repository hosts a dual-model architecture comprising the GlobalHCA Autoencoder for spatial compression and a Latent Predictor for state-space transitions.

🚀 Overview

This project compresses frames from 4-channel (RGBA) synthetic environments into a 1012-dimensional latent space, preserving structural fidelity and alpha-channel transparency at 45x45 resolution.

Features

  • Dual-Model Pipeline: Separates spatial understanding (AE) from temporal prediction (Predictor).
  • High-Dimensional Latents: Uses a 1012-unit bottleneck for rich feature representation (see the dimension check after this list).
  • Robustness-Tested: Maintains stable performance across varying signal-to-noise ratios.
  • RGBA Native: Built specifically to handle Alpha channel transparency.
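
As a quick sanity check, the sizes quoted above fit together as shown below. This is a minimal sketch in plain Python; reading the doubled bottleneck output as a mean/log-variance pair follows the forward pass in the Quick Start.

# Dimensional sanity check for the sizes quoted in this README (pure arithmetic)
assert 4 * 45 * 45 == 8100       # one RGBA 45x45 frame, as emitted by the predictor
assert 32 * 12 * 12 == 4608      # encoder feature map: 45 -> 23 -> 12 over two stride-2 convs
print("bottleneck:", 1012, "units; encoder head emits", 1012 * 2, "values (mu + log-variance)")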

πŸ› οΈ Quick Start (Google Colab / Python)

The following code automatically fetches the necessary weights from the Parallax-Labs Hugging Face repositories and initializes the architectures.

import torch
import torch.nn as nn
import os
import requests

# 1. DOWNLOAD MODELS
def download_file(url, filename):
    """Fetch a checkpoint once; skip the download if the file already exists."""
    if not os.path.exists(filename):
        print(f"Downloading {filename}...")
        r = requests.get(url, allow_redirects=True, timeout=60)
        r.raise_for_status()  # fail loudly instead of saving an HTML error page
        with open(filename, 'wb') as f:
            f.write(r.content)

urls = {
    "ae_global.pt": "https://huggingface.co/Parallax-labs-1/parallax_VIDEO-Boxes/resolve/main/ae_global.pt",
    "predictor.pt": "https://huggingface.co/Parallax-labs-1/parallax_VIDEO-Boxes/resolve/main/predictor.pt",
    "vision_base.pt": "https://huggingface.co/Parallax-labs-1/parallax_VISION-boxes-RGBA/resolve/main/model.pt"
}

for name, url in urls.items():
    download_file(url, name)

# 2. DEFINE ARCHITECTURES
class GlobalHCA_AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, 2048), nn.ReLU(),  # 45 -> 23 -> 12 after two stride-2 convs
            nn.Linear(2048, 1012 * 2)  # doubled head: mu and log-variance, 1012 units each
        )
        self.decoder = nn.Sequential(
            nn.Linear(1012, 2048), nn.ReLU(),
            nn.Linear(2048, 32 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (32, 12, 12)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 4, 3, stride=2, padding=1), nn.Sigmoid()
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, _ = h.chunk(2, dim=-1)  # VAE-style split: keep the mean, discard the log-variance
        return self.decoder(mu), mu

class LatentPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1012, 2048), nn.ReLU(),
            nn.Linear(2048, 4096), nn.ReLU(),
            nn.Linear(4096, 8100),        # 4 * 45 * 45 = 8100
            nn.Unflatten(1, (4, 45, 45))  # reshape to one RGBA frame
        )
    def forward(self, z):
        return self.net(z)

class AlphaAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(256, 4, 1)  # 1x1 projection to a compact 4-channel spatial latent
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(4, 256, 3, padding=1),
            nn.PixelShuffle(2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.PixelShuffle(2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.PixelShuffle(2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(16, 16, 3, padding=1),
            nn.PixelShuffle(2),
            nn.Sigmoid()
        )
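    # Note: for 45x45 inputs the encoder output is 4x3x3 (45 -> 23 -> 12 -> 6 -> 3),
    # and the four PixelShuffle(2) stages upsample 3 -> 48, so the decoder returns
    # 48x48 frames; center-cropping back to 45x45 may be needed downstream.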
    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# 3. LOAD WEIGHTS
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize models
ae_global = GlobalHCA_AE().to(device)
predictor = LatentPredictor().to(device)
vision_base = AlphaAutoencoder().to(device)

# Load weights (weights_only=True requires PyTorch >= 1.13; drop it on older versions)
ae_global.load_state_dict(torch.load("ae_global.pt", map_location=device, weights_only=True))
predictor.load_state_dict(torch.load("predictor.pt", map_location=device, weights_only=True))
vision_base.load_state_dict(torch.load("vision_base.pt", map_location=device, weights_only=True))

# Inference only (no-op for these layers, but good hygiene)
ae_global.eval(); predictor.eval(); vision_base.eval()

print("All Parallax-Labs models (Global AE, Predictor, and Vision Base) loaded successfully.")

📊 Performance Metrics

Phase       | Metric              | Value
----------- | ------------------- | --------
Autoencoder | Reconstruction MSE  | 0.001722
Predictor   | Latent-to-Pixel MSE | 0.000150
Robustness  | Max Noise Stability | 0.30
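
Numbers along these lines can be reproduced as sketched below. This is a hedged sketch: the random batch is a placeholder for the actual evaluation data, and it assumes "Max Noise Stability" means the largest Gaussian-noise standard deviation (here 0.30) under which reconstructions stay close to the clean frames.

# Sketch: reconstruction MSE plus a simple noise sweep (assumed evaluation protocol)
import torch.nn.functional as F

with torch.no_grad():
    batch = torch.rand(8, 4, 45, 45, device=device)  # placeholder frames; substitute real data
    recon, _ = ae_global(batch)
    print("Reconstruction MSE:", F.mse_loss(recon, batch).item())

    for sigma in (0.10, 0.20, 0.30):                 # 0.30 is the reported stability limit
        noisy = (batch + sigma * torch.randn_like(batch)).clamp(0, 1)
        recon_noisy, _ = ae_global(noisy)
        print(f"sigma={sigma:.2f}: MSE vs clean = {F.mse_loss(recon_noisy, batch).item():.6f}")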

📂 Repository Links

  • Main Video Repository: https://huggingface.co/Parallax-labs-1/parallax_VIDEO-Boxes
  • Vision Base Repository (RGBA Boxes): https://huggingface.co/Parallax-labs-1/parallax_VISION-boxes-RGBA

Developed by Parallax-Labs