Got SongGeneration-v2-large running on 16GB VRAM and 32GB System RAM
#11
by Siriusquirrel - opened
Hi everyone,
I got the SongGeneration v2 Large model running on consumer hardware (tested on an RX 9070 with 16 GB VRAM and 32 GB system RAM) and generated full-length tracks of up to 280 seconds. These are the inference pipeline optimizations that made it fit:
- Sequential Loading: Conditioner -> Transformer -> Diffusor/VAE to save memory.
- FP16 Weights: Lossless conversion (13 GB -> 9.5 GB) with zero impact on SNR.
- µ-law KV-Cache: Uses int8 µ-law encoding with layer-wise scaling for the 36+12 transformer layers (~1% non-cumulative error, indistinguishable after diffusion).
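
For anyone wondering what the µ-law step means in practice, here is a minimal sketch of int8 µ-law encoding with a single scale per layer. It is not the code from the repo: the function names, the choice of µ = 255, and the use of plain Python lists instead of GPU tensors are all assumptions made for illustration.

```python
import math

MU = 255.0  # assumed companding constant (the classic telephony value)


def mulaw_encode(values, mu=MU):
    """Compress floats to int8-range codes using one scale for the whole layer."""
    # Layer-wise scale: the largest magnitude maps to +/-1 before companding.
    scale = max((abs(v) for v in values), default=0.0) or 1.0
    log1p_mu = math.log1p(mu)
    codes = []
    for v in values:
        x = v / scale
        # mu-law companding: fine resolution near zero, coarse near +/-1.
        y = math.copysign(math.log1p(mu * abs(x)) / log1p_mu, x)
        codes.append(int(round(y * 127)))  # stays within [-127, 127]
    return codes, scale


def mulaw_decode(codes, scale, mu=MU):
    """Invert mulaw_encode back to approximate floats."""
    log1p_mu = math.log1p(mu)
    restored = []
    for q in codes:
        y = q / 127
        # Inverse companding: ((1 + mu)^|y| - 1) / mu, with the sign restored.
        x = math.copysign(math.expm1(abs(y) * log1p_mu) / mu, y)
        restored.append(x * scale)
    return restored


if __name__ == "__main__":
    layer_values = [0.0, 0.5, -2.0, 0.01, 1.25]  # stand-in for one layer's K or V entries
    codes, scale = mulaw_encode(layer_values)
    print(codes, scale)
    print(mulaw_decode(codes, scale))
```

Each cached value takes one byte instead of two, and the quantization error comes from a single encode/decode round trip per value rather than building up across steps.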
Check out the code here: https://github.com/Siriusquirrel/SongGeneration
I revamped requirements.txt and included conversion scripts for all the models.
Looking for feedback on different GPUs and SNR impressions!