---
license: apache-2.0
tags:
- music
- audio
- popularity-prediction
- aesthetic-quality
- multi-task-learning
- mert
- ai-generated-music
- suno
- udio
language:
- en
library_name: transformers
---

# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX is the first large-scale multi-task learning framework for jointly predicting **popularity** and **aesthetic quality** of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) audio embeddings.

---
## What does APEX predict?

Given any audio file, APEX predicts seven scores: two popularity scores and five aesthetic-quality scores.

**Popularity:**

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |

**Aesthetic Quality (from [SongEval](https://github.com/ASLP-lab/SongEval)):**

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
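
The popularity scores and the aesthetic scores use different scales (0–100 and 1–5), so putting them on a common scale can make them easier to compare side by side. The sketch below rescales each score to [0, 1] using the ranges documented in the tables. `SCORE_RANGES` and `normalize_scores` are hypothetical names and not part of APEX. Whether raw predictions always stay inside the documented ranges is an assumption, so the values are clamped.

```python
# Documented ranges from the tables above: (minimum, maximum) per score key.
SCORE_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def normalize_scores(results):
    """Rescale each known score to [0, 1]; hypothetical helper, not an APEX API."""
    normalized = {}
    for key, (low, high) in SCORE_RANGES.items():
        if key not in results:
            continue  # skip scores that are absent from this result
        fraction = (float(results[key]) - low) / (high - low)
        # Clamp in case a raw prediction falls slightly outside its range.
        normalized[key] = min(1.0, max(0.0, fraction))
    return normalized


# Made-up values purely for illustration, not real model output.
example = {"score_streams": 50.0, "coherence": 3.0, "clarity": 5.0}
print(normalize_scores(example))
```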

---

## Architecture



---

## Usage

### Installation

```bash
pip uninstall -y torch torchvision torchaudio transformers -q
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
```

### Inference

```python
import torch
from transformers import AutoModel

# Load APEX from the Hugging Face Hub. The repository ships custom model
# code, so trust_remote_code must be enabled.
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Run on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Score a single audio file; save_json names the file the scores are saved to.
results = model.predict("/path/to/your/mp3/file", save_json="results.json")

print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score   : {results['score_likes']:.2f}")
print(f"Coherence     : {results['coherence']:.2f}")
print(f"Musicality    : {results['musicality']:.2f}")
print(f"Memorability  : {results['memorability']:.2f}")
print(f"Clarity       : {results['clarity']:.2f}")
print(f"Naturalness   : {results['naturalness']:.2f}")
```
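
To score several files, the single-file call above can be wrapped in a loop. This is a minimal sketch, assuming `model.predict` accepts a path plus `save_json` exactly as in the example and returns a dict of scores. `score_folder` and the `*.mp3` pattern are hypothetical and not part of APEX.

```python
from pathlib import Path


def score_folder(model, folder, pattern="*.mp3"):
    """Call model.predict on every matching file; hypothetical helper, not an APEX API."""
    all_results = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        # Write each file's scores next to the audio, e.g. song.mp3 -> song.json.
        json_path = audio_path.with_suffix(".json")
        all_results[audio_path.name] = model.predict(
            str(audio_path), save_json=str(json_path)
        )
    return all_results


# Usage, with `model` loaded as in the inference example:
# scores = score_folder(model, "/path/to/folder")
```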

---

## Citation

```bibtex
@misc{husain2026apexlargescalemultitaskaestheticinformed,
  title={APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music},
  author={Jaavid Aktar Husain and Dorien Herremans},
  year={2026},
  eprint={2605.03395},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2605.03395},
}
```