TheArtist Music Transformer: F1 (Pop 10K Mix, pop-leaning)
Jazz-adapted chord model with a 10,000-sequence pop rehearsal buffer. This is the pop-leaning endpoint of the mix-ratio sweep. Pop accuracy improves on the pre-fine-tune baseline (+0.36 points), and jazz accuracy gains 8.17 points.
One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the paper for full context, and PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline for the collection overview.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 6 |
Training data
All 1,513 jazz training sequences (Jazz Harmony Treebank, JazzStandards, Weimar Jazz Database, JAAH) plus 10,000 pop rehearsal sequences sub-sampled with seed 42 from the Phase 0 pop training split. Pop:jazz ≈ 6.6:1 in the mix.
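The mix described above can be sketched as follows. This is a minimal illustration, not the paper's data pipeline: `build_mix` is a hypothetical helper, the tuples stand in for tokenized chord sequences, the pop pool size is an arbitrary placeholder, and drawing the sub-sample with `random.Random(42).sample` is an assumption about how the seeded selection might be done.

```python
import random

def build_mix(pop_pool, jazz_train, n_pop=10_000, seed=42):
    """Return n_pop pop sequences (drawn without replacement) plus all jazz sequences."""
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    pop_rehearsal = rng.sample(pop_pool, n_pop)
    return pop_rehearsal + list(jazz_train)

# Placeholder data: (genre, index) tuples stand in for real sequences.
pop_pool = [("pop", i) for i in range(50_000)]    # pool size is an assumption
jazz_train = [("jazz", i) for i in range(1_513)]  # 1,513 jazz training sequences

mix = build_mix(pop_pool, jazz_train)
n_pop = sum(1 for genre, _ in mix if genre == "pop")
n_jazz = len(mix) - n_pop
print(f"pop:jazz = {n_pop / n_jazz:.1f}:1")  # pop:jazz = 6.6:1
```

The ratio follows directly from the counts: 10,000 / 1,513 ≈ 6.61.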
Fine-tune hyperparameters: peak learning rate 2 × 10⁻⁵, two-epoch warmup, ten epochs maximum with patience 5.
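To make these hyperparameters concrete, here is a rough sketch of how such a schedule might behave. The card does not state the warmup shape or the exact stopping rule, so the linear per-epoch warmup, the constant rate afterwards, and the patience semantics below are all assumptions; `lr_at_epoch` and `run_with_patience` are hypothetical names, and the loss values are made up.

```python
def lr_at_epoch(epoch, peak_lr=2e-5, warmup_epochs=2):
    """Learning rate for a 1-indexed epoch: linear warmup, then constant (assumed)."""
    if epoch <= warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    return peak_lr

def run_with_patience(val_losses, max_epochs=10, patience=5):
    """Return (best_epoch, last_epoch_run) for a list of per-epoch validation losses."""
    best_loss, best_epoch, last_epoch = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, last_epoch

# Made-up losses whose minimum falls at epoch 6 (illustrative only).
fake_losses = [1.00, 0.90, 0.85, 0.82, 0.81, 0.80, 0.81, 0.82, 0.83, 0.84]
print([lr_at_epoch(e) for e in (1, 2, 3)])
print(run_with_patience(fake_losses))
```

Under this reading of patience, a best epoch of 6 cannot trigger early stopping, because the ten-epoch cap is reached before five further epochs have passed.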
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.60% | 81.03% |
| Top-5 accuracy | 96.96% | 92.41% |
| Perplexity | 1.78 | 2.31 |
| Δ vs. Phase 0 baseline | +0.36 | +8.17 |
This is the only run in the sweep whose pop top-1 exceeds the Phase 0 baseline. It is also the run with the most stable pop curve over training. Choose F1 when pop fluency is a hard constraint and jazz coloration is welcome but not the primary target. Generations stay rooted in commercial pop and rock harmony, with jazz substitutions appearing selectively (an occasional secondary dominant or ii-V detour inside an otherwise diatonic loop).
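For readers unfamiliar with the metrics in the table above, the following sketch shows the standard definitions of top-k accuracy and perplexity on a toy example. It is not the paper's evaluation script, which may differ in details such as padding masks; the function names and the probability values are invented for illustration.

```python
import math

def top_k_accuracy(prob_rows, targets, k):
    """Fraction of positions whose target is among the k most probable tokens."""
    hits = 0
    for probs, target in zip(prob_rows, targets):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)

def perplexity(prob_rows, targets):
    """exp of the mean negative log-probability assigned to the targets."""
    nll = -sum(math.log(probs[t]) for probs, t in zip(prob_rows, targets))
    return math.exp(nll / len(targets))

# Toy example: 3 positions over a 4-token vocabulary (not real model output).
prob_rows = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
]
targets = [0, 1, 2]
print(top_k_accuracy(prob_rows, targets, k=1))  # 2 of 3 targets are the argmax
print(top_k_accuracy(prob_rows, targets, k=2))  # all 3 targets are in the top 2
print(perplexity(prob_rows, targets))
```

A perplexity of 1.0 would mean the model assigns probability 1 to every correct next token, so lower values indicate more confident correct predictions.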
Intended use and limitations
Recommended for chord-composition workflows targeting pop, rock, CCM, K-pop, J-pop, and modern country with optional jazz coloration. F4 (ft-pop29) is the symmetric jazz-leaning endpoint; F3 (ft-pop50) is the balanced middle.
Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.
Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Local modules shipped with the project repository.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the checkpoint from the Hugging Face Hub.
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)

tokenizer = ChordTokenizer()
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary table.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "pop",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
# Drop the last token (assumed to be the end-of-sequence marker)
# so that generation continues the prompt.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])

# Sample up to 32 new tokens at temperature 0.8, stopping at EOS.
with torch.no_grad():
    for _ in range(32):
        logits = model(ids)
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break

print(tokenizer.decode(ids[0].tolist()))
```
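The division by 0.8 in the sampling loop above is temperature scaling: dividing logits by a temperature below 1 sharpens the softmax distribution, so likely chords are sampled more often. The sketch below illustrates the effect with toy logits in plain Python; `softmax_with_temperature` is a hypothetical helper and the values are not model output.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # toy values
p_plain = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.8)
print([round(p, 3) for p in p_plain])
print([round(p, 3) for p in p_sharp])
```

Raising the temperature toward or above 1.0 should give more varied progressions, while lowering it pushes sampling toward the single most likely chord.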
Training-data licenses
| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Preprint: arXiv:2605.04998.
```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```