---
license: apache-2.0
tags:
  - music
  - audio
  - popularity-prediction
  - aesthetic-quality
  - multi-task-learning
  - mert
  - ai-generated-music
  - suno
  - udio
language:
  - en
library_name: transformers
---

# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX is the first large-scale multi-task learning framework for jointly predicting the **popularity** and **aesthetic quality** of AI-generated music from audio alone. It is trained on more than 211k AI-generated songs (about 10k hours of audio) from Suno and Udio, and builds on [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) audio embeddings.

---

## What does APEX predict?

Given an audio file, APEX predicts seven scores: two for popularity and five for aesthetic quality.

**Popularity:**

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |

**Aesthetic Quality (from [SongEval](https://github.com/ASLP-lab/SongEval)):**

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
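
The documented ranges make a convenient sanity check on model output. The sketch below assumes the prediction arrives as a plain dict keyed by the score names above. `SCORE_RANGES` and `out_of_range` are hypothetical helpers for illustration, not part of APEX.

```python
# Hypothetical helper, not part of the APEX package: checks a result dict
# against the score ranges documented in the tables above.
SCORE_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def out_of_range(results):
    """Return the names of scores that are missing or outside their documented range."""
    problems = []
    for name, (low, high) in SCORE_RANGES.items():
        value = results.get(name)
        if value is None or not (low <= float(value) <= high):
            problems.append(name)
    return problems
```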

---

## Architecture

![APEX Architecture](architecture.png)

---

## Usage

### Installation

```bash
pip uninstall -y torch torchvision torchaudio transformers -q
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
```

### Inference

```python
import torch
from transformers import AutoModel

# Load APEX from the Hugging Face Hub. trust_remote_code=True lets
# transformers run the custom model code shipped in the repository.
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Score one audio file and also write the results to a JSON file.
results = model.predict("/path/to/your/mp3/file", save_json="results.json")

print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score   : {results['score_likes']:.2f}")
print(f"Coherence     : {results['coherence']:.2f}")
print(f"Musicality    : {results['musicality']:.2f}")
print(f"Memorability  : {results['memorability']:.2f}")
print(f"Clarity       : {results['clarity']:.2f}")
print(f"Naturalness   : {results['naturalness']:.2f}")
```
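
To score a whole folder of tracks, the single-file call can be wrapped in a loop. The sketch below assumes `model.predict` returns a dict of the seven scores, as the example above suggests. `score_folder` is a hypothetical helper, not part of APEX; it takes the prediction function as an argument, so it can be tried with a stub before the model is loaded.

```python
import csv
from pathlib import Path


def score_folder(folder, predict_fn, out_csv="scores.csv", pattern="*.mp3"):
    """Run predict_fn on every matching file and write one CSV row per track.

    predict_fn is any callable that maps a file path (str) to a dict of scores.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        scores = predict_fn(str(path))
        rows.append({"file": path.name, **scores})
    if rows:
        # Column order follows the keys of the first result.
        with open(out_csv, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows


# Usage with the model loaded as above (also writes one JSON file per track):
# rows = score_folder("songs/", lambda p: model.predict(p, save_json=p + ".json"))
```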

---

## Citation

```bibtex
@misc{husain2026apexlargescalemultitaskaestheticinformed,
      title={APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music}, 
      author={Jaavid Aktar Husain and Dorien Herremans},
      year={2026},
      eprint={2605.03395},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2605.03395}, 
}
```