Parallax_AUDIO-geometrydash
Parallax_AUDIO-geometrydash is an experimental neural audio autoencoder that redefines how high-fidelity signal reconstruction is handled. Instead of relying on traditional amplitude-matching, this model prioritizes the internal geometry and relational topology of a waveform. It is designed to prove that the "soul" and transient sharpness of audio can be preserved even through a massive information bottleneck.
1. Technical Specs
- Total Parameters: ~342,000 (0.34M, as counted from the architecture defined in section 4)
- Input Format: 1.0 s mono raw audio @ 44,100 Hz (processed in 4096-sample windows at inference time)
- Bottleneck Ratio: 96:1 (temporal; the product of the encoder strides, checked in the sketch after this list)
- Architecture: 4-stage 1D-convolutional relational autoencoder (four encoder stages mirrored by four decoder stages; see the inference script in section 4)
- Training Data: 1.5+ hours of raw Geometry Dash gameplay audio (clicks, transitions, and rhythmic synchronization).
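Both the 96:1 ratio and the ~460-step latent quoted below fall directly out of the encoder strides. A quick sanity check, using the standard Conv1d output-length formula with the kernel/stride/padding values from the inference script in section 4:

```python
# Encoder stages as (kernel_size, stride, padding), copied from the
# RawAudioAE class in section 4.
stages = [(15, 4, 7), (7, 4, 3), (7, 3, 3), (3, 2, 1)]

length = 44100  # one second of mono audio at 44.1 kHz
for k, s, p in stages:
    length = (length + 2 * p - k) // s + 1  # standard Conv1d output length
    print(length)  # 11025 -> 2757 -> 919 -> 460

print(4 * 4 * 3 * 2)  # 96: the product of the strides = the bottleneck ratio
```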
2. The Facts: The Geometry of Sound
This model operates at a 96:1 temporal compression ratio, squeezing each second of audio (44,100 samples) into roughly 460 latent time steps, each a 256-channel vector.
In audio terms, this level of compression usually results in "underwater" artifacts or muffled transients. However, by focusing on the consistency of distances between samples, Parallax_AUDIO ensures that the "clack" of a player's click and the sharp transitions of Geometry Dash levels remain crisp. It avoids the "blurring" effect common in high-compression bottlenecks by enforcing sharp relational gradients during reconstruction of the wave.
| Metric | Value |
|---|---|
| Input Samples (1s) | 44,100 |
| Latent Shape | ~460 time steps × 256 channels |
| Compression Ratio | 96 : 1 |
| Reconstruction Clarity | Structural Integrity Optimized |
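The ~460 figure can be reproduced empirically. A minimal check, assuming the RawAudioAE class from the inference script in section 4 is already in scope:

```python
import torch

model = RawAudioAE()  # class defined in the inference script (section 4)
with torch.no_grad():
    z = model.encoder(torch.randn(1, 1, 44100))  # one second of mono audio
print(z.shape)  # torch.Size([1, 256, 460]): 460 time steps of 256 channels
```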
3. The Story: Beyond "Pixel-Matching" for Waves
Most audio models fail at high compression ratios because they treat samples as isolated points. If a reconstructed wave is shifted by just one millisecond, a traditional sample-wise loss reports a large error, even though the sound is identical to the human ear.
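A toy illustration of that failure mode (not taken from the model's training code): shift a pure tone by one millisecond and sample-wise MSE explodes, while a relational statistic barely registers the change.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
y = np.roll(x, 44)               # the same tone, delayed by ~1 ms

# Sample-wise MSE rates the shifted copy as almost maximally wrong
print(np.mean((x - y) ** 2))     # ~1.93, against a signal power of only 0.5

# A relational statistic (lag-k difference energy) is unchanged by the shift
k = 8
print(np.mean((x[k:] - x[:-k]) ** 2),
      np.mean((y[k:] - y[:-k]) ** 2))  # near-identical values
```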
Parallax_AUDIO-geometrydash succeeds by shifting the objective from "What is the value of this sample?" to "How does this sample relate to its neighbors?" By utilizing a multi-scale approach that connects immediate neighbors to distant rhythmic points, the model captures both the local topology (high-frequency clarity) and the global structure (bass and rhythm).
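The exact training objective is not spelled out in this card, but a minimal sketch of a multi-scale relational loss in that spirit (the lag set and equal weighting here are assumptions, not the published recipe) might look like:

```python
import torch

def relational_loss(x: torch.Tensor, y: torch.Tensor,
                    lags=(1, 4, 16, 64, 256)) -> torch.Tensor:
    """Compare lag-k sample differences of target x and reconstruction y.

    Both tensors are (batch, 1, samples). Small lags emphasise transient
    sharpness; large lags track slower rhythmic structure.
    """
    total = x.new_zeros(())
    for k in lags:
        dx = x[..., k:] - x[..., :-k]  # how each target sample relates to a neighbor
        dy = y[..., k:] - y[..., :-k]  # the same relation in the reconstruction
        total = total + torch.mean((dx - dy) ** 2)
    return total / len(lags)
```

Penalising mismatched lag differences rewards reproducing the wave's local slopes and longer-range relations rather than its absolute sample values, which is one plausible reading of "enforcing sharp relational gradients".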
It is a high-fidelity reconstruction engine that treats audio like a geometric landscape.
4. Usage (Inference Script)
This script demonstrates the reconstruction capabilities of the 1D-CNN architecture on raw 44.1 kHz audio. It is written for Google Colab (the upload/download steps use `google.colab.files`).

```python
import torch
import torch.nn as nn
import librosa
import numpy as np
import soundfile as sf
import os
import requests
from google.colab import files
# --- CONFIGURATION ---
# Replace this URL with the direct download link for your model.pt
MODEL_URL = "https://huggingface.co/Parallax-labs-1/parallax_AUDIO-geometrydash/resolve/main/model.pt"
SAMPLE_RATE = 44100
WINDOW_SIZE = 4096
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# --- 1. MODEL ARCHITECTURE ---
class RawAudioAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder strides 4 * 4 * 3 * 2 = 96, giving the 96:1 temporal bottleneck
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4, padding=7),
            nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=7, stride=4, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=7, stride=3, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2)
        )
        # Mirrored decoder; Tanh keeps output samples in the [-1, 1] range
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=0),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(128, 64, kernel_size=7, stride=3, padding=3, output_padding=0),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(64, 32, kernel_size=7, stride=4, padding=3, output_padding=3),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(32, 1, kernel_size=15, stride=4, padding=7, output_padding=3),
            nn.Tanh()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
# --- 2. DOWNLOAD & LOAD MODEL ---
def setup_model():
    if not os.path.exists('model.pt'):
        print("Downloading model.pt from Hugging Face...")
        response = requests.get(MODEL_URL, stream=True)
        response.raise_for_status()  # fail loudly on a bad download
        with open('model.pt', 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    model = RawAudioAE().to(DEVICE)
    model.load_state_dict(torch.load('model.pt', map_location=DEVICE))
    model.eval()
    print("Model loaded and ready.")
    return model
# --- 3. PROCESSING ---
def process_audio(model, input_path):
    print(f"Loading {input_path}...")
    y, _ = librosa.load(input_path, sr=SAMPLE_RATE, mono=True)
    # Process the file in non-overlapping 4096-sample windows
    num_chunks = len(y) // WINDOW_SIZE
    output_audio = []
    print(f"Reconstructing {num_chunks} segments...")
    with torch.no_grad():
        for i in range(num_chunks):
            start = i * WINDOW_SIZE
            end = start + WINDOW_SIZE
            chunk = torch.from_numpy(y[start:end]).float().view(1, 1, -1).to(DEVICE)
            # Note: with these decoder settings each reconstructed chunk comes
            # back slightly shorter than WINDOW_SIZE (4048 vs. 4096 samples),
            # so the concatenated output is marginally shorter than the input.
            reconstructed = model(chunk)
            output_audio.append(reconstructed.squeeze().cpu().numpy())
    # Flatten and save as WAV; soundfile writes WAV everywhere, whereas MP3
    # support depends on the installed libsndfile version.
    final_wave = np.concatenate(output_audio)
    output_filename = "reconstruction_output.wav"
    sf.write(output_filename, final_wave, SAMPLE_RATE)
    print(f"Processing complete! Downloading {output_filename}...")
    files.download(output_filename)
# --- 4. EXECUTION ---
if __name__ == "__main__":
    # Download weights automatically
    net = setup_model()
    # User uploads audio via the Colab file picker
    print("\nPlease upload the audio file you wish to process:")
    uploaded = files.upload()
    if uploaded:
        audio_file = list(uploaded.keys())[0]
        process_audio(net, audio_file)
```
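Outside Colab, the `files.upload()` / `files.download()` calls can be replaced with plain file paths (pass a local path to `process_audio` and read the WAV from disk); that swap is an adaptation, not part of the published script.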
5. License
Licensed under Apache 2.0.