Piano Source Separation Model

This repository contains a 17 MB piano separation model and an inference script for running it.

The model takes an audio track as input and outputs the isolated piano.

Examples

Supported input formats: wav, flac, mp3
Supported output formats: wav, flac (--output_format wav / --output_format flac)
--input_dir can point to either a single file or a directory containing multiple files
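
To check in advance which files a given --input_dir value would cover, the rules above can be sketched in a few lines of Python. This is a minimal sketch, not the logic of inference.py: the helper name collect_inputs is hypothetical, and it assumes a directory is scanned at the top level only, which the README does not state.

```python
from pathlib import Path

# Extensions listed as supported inputs in this README.
SUPPORTED_INPUTS = {".wav", ".flac", ".mp3"}

def collect_inputs(input_dir):
    """Hypothetical helper: list the audio files an --input_dir value would cover."""
    path = Path(input_dir)
    if path.is_file():
        # A single file counts only when its extension is supported.
        return [path] if path.suffix.lower() in SUPPORTED_INPUTS else []
    # Assumption: only the top level of the directory is scanned (no recursion).
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_INPUTS
    )
```

Calling collect_inputs on a path that does not exist raises FileNotFoundError, which is a convenient early failure before starting a long run.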

pip install torch einops rotary-embedding-torch numpy soundfile safetensors

Download the inference.py file, set --input_dir, and run the command below. The model and config are downloaded automatically.

python inference.py --input_dir 'Insert path to file or directory containing file(s) here'

--output_dir sets where the outputs are saved; the default is the same as --input_dir. Output filenames end with _piano.
--checkpoint_path sets where the model is located; if it is not found there, the script downloads it automatically.
--config_path sets where config.json is located; if it is not found there, the script downloads it automatically.
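
For batch jobs it can help to build the command line from Python and to predict the output filename. The sketch below uses only the flags documented above. The helper names build_command and piano_output_path are hypothetical, and it makes three assumptions: "wav" as the default format is this sketch's choice, an output with no --output_dir lands next to the input file, and the output extension follows --output_format.

```python
import sys
from pathlib import Path

def build_command(input_dir, output_dir=None, output_format="wav",
                  checkpoint_path=None, config_path=None, script="inference.py"):
    """Hypothetical helper: assemble the argv list for one inference.py run."""
    if output_format not in ("wav", "flac"):
        raise ValueError("output_format must be 'wav' or 'flac'")
    # sys.executable reuses the interpreter the dependencies were installed into.
    cmd = [sys.executable, script, "--input_dir", str(input_dir),
           "--output_format", output_format]
    # Optional flags are passed only when set, so the script's defaults apply otherwise.
    for flag, value in (("--output_dir", output_dir),
                        ("--checkpoint_path", checkpoint_path),
                        ("--config_path", config_path)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

def piano_output_path(input_file, output_dir=None, output_format="wav"):
    """Hypothetical helper: guess where the separated file will be written."""
    input_file = Path(input_file)
    # Assumption: without --output_dir, the output is written next to the input file.
    target_dir = Path(output_dir) if output_dir is not None else input_file.parent
    # Assumption: "_piano" is appended to the stem and the extension follows --output_format.
    return target_dir / f"{input_file.stem}_piano.{output_format}"
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the folder that contains inference.py.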

This model is trained on the standard piano only; it will not work on variants such as the electric piano.
The GPU is used automatically if available (3 GB VRAM required); otherwise the CPU is used.
The model is trained on 44.1 kHz audio.
Processing takes ~1 second per minute of audio on a Google Colab T4.
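
The notes above can be turned into a small pre-flight check before a long run. This is a sketch under stated assumptions, not part of inference.py: all three function names are hypothetical, the VRAM test compares total rather than free memory, and the time estimate simply scales the T4 figure, so other hardware will differ. The README does not say whether the script resamples audio recorded at other rates.

```python
import wave

TRAINING_RATE_HZ = 44100  # 44.1 kHz, per the note above
MIN_VRAM_GB = 3.0         # VRAM figure quoted above

def pick_device(min_vram_gb=MIN_VRAM_GB):
    """Return "cuda" when PyTorch sees a large-enough GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to probe
    if not torch.cuda.is_available():
        return "cpu"
    # Assumption: total (not free) memory of GPU 0 is compared with the 3 GB figure.
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return "cuda" if total_bytes >= min_vram_gb * 1024**3 else "cpu"

def wav_sample_rate(path):
    """Read the sample rate of a PCM WAV file using only the standard library."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()

def estimated_seconds(audio_seconds, seconds_per_audio_minute=1.0):
    """Scale the ~1 s per audio minute T4 figure to a given duration."""
    return audio_seconds / 60.0 * seconds_per_audio_minute
```

The wave module reads PCM WAV files only; for flac or mp3 inputs, soundfile.info(path).samplerate from the soundfile dependency listed above is one way to get the same number.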

Please cite this repository if you use this model in research or a project.

Wei-Tsung Lu, Ju-Chiang Wang, Qiuqiang Kong, Yun-Ning Hung - https://arxiv.org/abs/2309.02612
lucidrains - https://github.com/lucidrains/BS-RoFormer

Figure: training loss