---
license: apache-2.0
tags:
- music
- audio
- popularity-prediction
- aesthetic-quality
- multi-task-learning
- mert
- ai-generated-music
- suno
- udio
language:
- en
library_name: transformers
---

# APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX is the first large-scale multi-task learning framework for jointly predicting **popularity** and **aesthetic quality** of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, using [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) audio embeddings as input features.

---

## What does APEX predict?

Given an audio file, APEX predicts seven scores: two popularity scores and five aesthetic-quality scores.

**Popularity:**

| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |

**Aesthetic Quality (from [SongEval](https://github.com/ASLP-lab/SongEval)):**

| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |

---

## Architecture

![APEX Architecture](architecture.png)

---

## Usage

### Installation

```bash
# Remove any existing installs so the pinned versions below are used
pip uninstall -y torch torchvision torchaudio transformers -q

# Install the pinned PyTorch stack, then the remaining dependencies
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
```

### Inference

```python
from transformers import AutoModel
import torch

# trust_remote_code=True lets transformers run the custom model code
# shipped in the amaai-lab/apex repository.
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Score one audio file; the results are also written to results.json.
results = model.predict("/path/to/your/mp3/file", save_json="results.json")

print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score : {results['score_likes']:.2f}")
print(f"Coherence : {results['coherence']:.2f}")
print(f"Musicality : {results['musicality']:.2f}")
print(f"Memorability : {results['memorability']:.2f}")
print(f"Clarity : {results['clarity']:.2f}")
print(f"Naturalness : {results['naturalness']:.2f}")
```

---

## Citation

```bibtex
@misc{husain2026apexlargescalemultitaskaestheticinformed,
  title={APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music},
  author={Jaavid Aktar Husain and Dorien Herremans},
  year={2026},
  eprint={2605.03395},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2605.03395},
}
```
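
---

## Checking the predicted scores

The score tables in this card list seven keys and a numeric range for each. The sketch below shows one way to confirm that a result dictionary contains all seven keys and that each value falls inside its documented range. `EXPECTED_RANGES` and `find_score_problems` are hypothetical names introduced here, not part of the APEX API, and the example values are made up. This card does not say whether the model clamps its outputs, so a flagged value is a prompt to inspect the input, not necessarily an error.

```python
# Illustrative helper only; not part of the APEX API.
# The ranges are copied from the score tables in this model card.
EXPECTED_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def find_score_problems(results):
    """Return a list of problems; the list is empty if all seven scores look fine."""
    problems = []
    for key, (low, high) in EXPECTED_RANGES.items():
        if key not in results:
            problems.append(f"{key}: missing")
            continue
        value = float(results[key])
        # NaN fails this comparison too, so it is reported as out of range.
        if not low <= value <= high:
            problems.append(f"{key}: {value:.2f} outside [{low:g}, {high:g}]")
    return problems


# Made-up values for illustration; real values would come from model.predict(...).
example = {
    "score_streams": 41.7,
    "score_likes": 38.2,
    "coherence": 3.4,
    "musicality": 3.1,
    "memorability": 2.9,
    "clarity": 3.6,
    "naturalness": 3.2,
}

for problem in find_score_problems(example):
    print(problem)
```

In practice, `example` would be replaced by the dictionary returned from `model.predict(...)` in the inference snippet.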