Parallax_AUDIO-geometrydash
Parallax_AUDIO-geometrydash is an experimental neural audio autoencoder that redefines how high-fidelity signal reconstruction is handled. Instead of relying on traditional amplitude-matching, this model prioritizes the internal geometry and relational topology of a waveform. It is designed to prove that the "soul" and transient sharpness of audio can be preserved even through a massive information bottleneck.
1. Technical Specs
- Total Parameters: ~342,000 (0.34M, as counted from the architecture defined in section 4)
- Input Format: 1.0 s mono raw audio @ 44,100 Hz (processed in 4096-sample windows at inference time)
- Bottleneck Ratio: 96:1 (temporal; the product of the encoder strides, checked in the sketch after this list)
- Architecture: 4-stage 1D-convolutional relational autoencoder (four encoder stages mirrored by four decoder stages; see the inference script in section 4)
- Training Data: 1.5+ hours of raw Geometry Dash gameplay audio (clicks, transitions, and rhythmic synchronization).
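Both the 96:1 ratio and the ~460-step latent quoted below fall directly out of the encoder strides. A quick sanity check, using the standard Conv1d output-length formula with the kernel/stride/padding values from the inference script in section 4:

```python
# Encoder stages as (kernel_size, stride, padding), copied from the
# RawAudioAE class in section 4.
stages = [(15, 4, 7), (7, 4, 3), (7, 3, 3), (3, 2, 1)]

length = 44100  # one second of mono audio at 44.1 kHz
for k, s, p in stages:
    length = (length + 2 * p - k) // s + 1  # standard Conv1d output length
    print(length)  # 11025 -> 2757 -> 919 -> 460

print(4 * 4 * 3 * 2)  # 96: the product of the strides = the bottleneck ratio
```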
2. The Facts: The Geometry of Sound
This model operates at a 96:1 temporal compression ratio, squeezing each second of audio (44,100 samples) into roughly 460 latent time steps, each a 256-channel vector.
In audio terms, this level of compression usually results in "underwater" artifacts or muffled transients. However, by focusing on the consistency of distances between samples, Parallax_AUDIO ensures that the "clack" of a player's click and the sharp transitions of Geometry Dash levels remain crisp. It avoids the "blurring" effect common in high-compression bottlenecks by enforcing sharp relational gradients during reconstruction of the wave.
| Metric | Value |
|---|---|
| Input Samples (1s) | 44,100 |
| Latent Shape | ~460 time steps × 256 channels |
| Compression Ratio | 96 : 1 |
| Reconstruction Clarity | Structural Integrity Optimized |
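The ~460 figure can be reproduced empirically. A minimal check, assuming the RawAudioAE class from the inference script in section 4 is already in scope:

```python
import torch

model = RawAudioAE()  # class defined in the inference script (section 4)
with torch.no_grad():
    z = model.encoder(torch.randn(1, 1, 44100))  # one second of mono audio
print(z.shape)  # torch.Size([1, 256, 460]): 460 time steps of 256 channels
```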
3. The Story: Beyond "Pixel-Matching" for Waves
Most audio models fail at high compression ratios because they treat samples as isolated points. If a reconstructed wave is shifted by just one millisecond, a traditional sample-wise loss reports a large error, even though the sound is identical to the human ear.
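A toy illustration of that failure mode (not taken from the model's training code): shift a pure tone by one millisecond and sample-wise MSE explodes, while a relational statistic barely registers the change.

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone
y = np.roll(x, 44)               # the same tone, delayed by ~1 ms

# Sample-wise MSE rates the shifted copy as almost maximally wrong
print(np.mean((x - y) ** 2))     # ~1.93, against a signal power of only 0.5

# A relational statistic (lag-k difference energy) is unchanged by the shift
k = 8
print(np.mean((x[k:] - x[:-k]) ** 2),
      np.mean((y[k:] - y[:-k]) ** 2))  # near-identical values
```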
Parallax_AUDIO-geometrydash succeeds by shifting the objective from "What is the value of this sample?" to "How does this sample relate to its neighbors?" By utilizing a multi-scale approach that connects immediate neighbors to distant rhythmic points, the model captures both the local topology (high-frequency clarity) and the global structure (bass and rhythm).
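The exact training objective is not spelled out in this card, but a minimal sketch of a multi-scale relational loss in that spirit (the lag set and equal weighting here are assumptions, not the published recipe) might look like:

```python
import torch

def relational_loss(x: torch.Tensor, y: torch.Tensor,
                    lags=(1, 4, 16, 64, 256)) -> torch.Tensor:
    """Compare lag-k sample differences of target x and reconstruction y.

    Both tensors are (batch, 1, samples). Small lags emphasise transient
    sharpness; large lags track slower rhythmic structure.
    """
    total = x.new_zeros(())
    for k in lags:
        dx = x[..., k:] - x[..., :-k]  # how each target sample relates to a neighbor
        dy = y[..., k:] - y[..., :-k]  # the same relation in the reconstruction
        total = total + torch.mean((dx - dy) ** 2)
    return total / len(lags)
```

Penalising mismatched lag differences rewards reproducing the wave's local slopes and longer-range relations rather than its absolute sample values, which is one plausible reading of "enforcing sharp relational gradients".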
It is a high-fidelity reconstruction engine that treats audio like a geometric landscape.
4. Usage (Inference Script)
This script demonstrates the reconstruction capabilities of the 1D-CNN architecture on raw 44.1 kHz audio. It is written for Google Colab (the upload/download steps use `google.colab.files`).

```python
import torch
import torch.nn as nn
import librosa
import numpy as np
import soundfile as sf
import os
import requests
from google.colab import files
# --- CONFIGURATION ---
# Replace this URL with the direct download link for your model.pt
MODEL_URL = "https://huggingface.co/Parallax-labs-1/parallax_AUDIO-geometrydash/resolve/main/model.pt"
SAMPLE_RATE = 44100
WINDOW_SIZE = 4096
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# --- 1. MODEL ARCHITECTURE ---
class RawAudioAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder strides 4 * 4 * 3 * 2 = 96, giving the 96:1 temporal bottleneck
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=4, padding=7),
            nn.LeakyReLU(0.2),
            nn.Conv1d(32, 64, kernel_size=7, stride=4, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=7, stride=3, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2)
        )
        # Mirrored decoder; Tanh keeps output samples in the [-1, 1] range
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=0),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(128, 64, kernel_size=7, stride=3, padding=3, output_padding=0),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(64, 32, kernel_size=7, stride=4, padding=3, output_padding=3),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(32, 1, kernel_size=15, stride=4, padding=7, output_padding=3),
            nn.Tanh()
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
# --- 2. DOWNLOAD & LOAD MODEL ---
def setup_model():
    if not os.path.exists('model.pt'):
        print("Downloading model.pt from Hugging Face...")
        response = requests.get(MODEL_URL, stream=True)
        response.raise_for_status()  # fail loudly on a bad download
        with open('model.pt', 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    model = RawAudioAE().to(DEVICE)
    model.load_state_dict(torch.load('model.pt', map_location=DEVICE))
    model.eval()
    print("Model loaded and ready.")
    return model
# --- 3. PROCESSING ---
def process_audio(model, input_path):
    print(f"Loading {input_path}...")
    y, _ = librosa.load(input_path, sr=SAMPLE_RATE, mono=True)
    # Process the file in non-overlapping 4096-sample windows
    num_chunks = len(y) // WINDOW_SIZE
    output_audio = []
    print(f"Reconstructing {num_chunks} segments...")
    with torch.no_grad():
        for i in range(num_chunks):
            start = i * WINDOW_SIZE
            end = start + WINDOW_SIZE
            chunk = torch.from_numpy(y[start:end]).float().view(1, 1, -1).to(DEVICE)
            # Note: with these decoder settings each reconstructed chunk comes
            # back slightly shorter than WINDOW_SIZE (4048 vs. 4096 samples),
            # so the concatenated output is marginally shorter than the input.
            reconstructed = model(chunk)
            output_audio.append(reconstructed.squeeze().cpu().numpy())
    # Flatten and save as WAV; soundfile writes WAV everywhere, whereas MP3
    # support depends on the installed libsndfile version.
    final_wave = np.concatenate(output_audio)
    output_filename = "reconstruction_output.wav"
    sf.write(output_filename, final_wave, SAMPLE_RATE)
    print(f"Processing complete! Downloading {output_filename}...")
    files.download(output_filename)
# --- 4. EXECUTION ---
if __name__ == "__main__":
    # Download weights automatically
    net = setup_model()
    # User uploads audio via the Colab file picker
    print("\nPlease upload the audio file you wish to process:")
    uploaded = files.upload()
    if uploaded:
        audio_file = list(uploaded.keys())[0]
        process_audio(net, audio_file)
```
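Outside Colab, the `files.upload()` / `files.download()` calls can be replaced with plain file paths (pass a local path to `process_audio` and read the WAV from disk); that swap is an adaptation, not part of the published script.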
5. License
Licensed under Apache 2.0.