Got SongGeneration-v2-large running on 16GB VRAM and 32GB System RAM

#11
by Siriusquirrel - opened

Hi everyone,

I got the SongGeneration v2 Large model running on consumer hardware (tested on an RX9070 with 16GB VRAM and 32GB system RAM) and generated full-length tracks of up to 280 seconds. The inference pipeline optimizations that made this possible:

  • Sequential Loading: Conditioner -> Transformer -> Diffusor/VAE to save memory.
  • FP16 Weights: Lossless conversion (13 GB -> 9.5 GB on disk) with zero impact on SNR.
  • µ-law KV-Cache: Uses int8 µ-law encoding with layer-wise scaling for the 36+12 transformer layers (~1% non-cumulative error, indistinguishable after diffusion).
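For anyone curious what the µ-law idea looks like in isolation, here is a minimal pure-Python sketch of int8 µ-law quantization with one scale per layer. It is not the code from the repo: the function names, the µ = 255 constant, the per-layer max-abs scaling, and the toy "layers" are all assumptions for illustration, and a real KV-cache would do this with vectorized tensor ops on the GPU.

```python
import math

MU = 255.0  # assumed companding constant (the classic telephony value)


def mu_law_encode(values, mu=MU):
    """Compress floats to int8 codes in [-127, 127], plus one scale for the whole list."""
    # Hypothetical scaling choice: the max absolute value of the layer.
    scale = max((abs(v) for v in values), default=0.0) or 1.0
    codes = []
    for v in values:
        x = v / scale  # normalized to [-1, 1]
        # Logarithmic companding: small magnitudes get finer resolution.
        y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        codes.append(round(y * 127))
    return codes, scale


def mu_law_decode(codes, scale, mu=MU):
    """Invert mu_law_encode: expand the int8 codes back to approximate floats."""
    out = []
    for q in codes:
        y = q / 127
        x = math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
        out.append(x * scale)
    return out


def snr_db(reference, approx):
    """Signal-to-noise ratio of the reconstruction, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


def quantize_cache(cache):
    """Quantize each layer separately so every layer gets its own scale."""
    return {name: mu_law_encode(values) for name, values in cache.items()}


if __name__ == "__main__":
    # Toy stand-in for cached keys/values of two layers with different ranges.
    cache = {
        "layer_0": [math.sin(0.37 * i) * 0.5 for i in range(1000)],
        "layer_1": [math.cos(0.11 * i) * 40.0 for i in range(1000)],
    }
    for name, (codes, scale) in quantize_cache(cache).items():
        restored = mu_law_decode(codes, scale)
        print(name, "scale:", scale, "SNR (dB):", round(snr_db(cache[name], restored), 1))
```

Scaling each layer on its own keeps a layer with a small value range from being crushed by one with a large range, and the logarithmic curve spends more of the 255 available codes on small magnitudes. The `snr_db` helper is one simple way to compare a quantized run against a full-precision one when reporting SNR impressions.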

Check out the code here: https://github.com/Siriusquirrel/SongGeneration
I revamped requirements.txt and included conversion scripts for all the models.

Looking for feedback on different GPUs and SNR impressions!
