earlab/EAR_VAE

@@ -17,6 +17,8 @@ tags:
 This repository contains the official inference code for εar-VAE, a 44.1 kHz music signal reconstruction model that rethinks and optimizes VAE training for audio. It targets two common weaknesses in existing open-source VAEs, phase accuracy and stereophonic spatial representation, by aligning objectives with auditory perception and introducing phase-aware training. Experiments show substantial improvements across diverse metrics, with particular strength in high-frequency harmonics and spatial characteristics.
+> ⭐2025-12-10 Update⭐: new model weights are available that operate at a 48 kHz sample rate, with comparable vocal performance and better stereophonic energy reconstruction.
 Why εar-VAE:
 - 🎧 Perceptual alignment: A K-weighting perceptual filter is applied before loss computation to better match human hearing.
 - 🔁 Phase-aware objectives: Two novel phase losses