TheArtist Music Transformer: F3 (Pop 2.5K Mix), balanced sweet spot
Jazz-adapted chord model with a 2,500-sequence pop rehearsal buffer (≈1.65× the jazz training volume). Pop top-1 accuracy stays within 0.04 points of the Phase 0 baseline, while jazz top-1 accuracy improves by 8.13 points. This is the paper's recommended balanced default.
One of six checkpoints released alongside the paper *Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation* (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 9 |
Training data
All 1,513 jazz training sequences plus 2,500 pop rehearsal sequences (seed 42). Pop:jazz ≈ 1.65:1. The paper identifies this ratio as the minimum rehearsal volume that fully preserves pop fluency while delivering essentially the full jazz gain (paper §7.1).
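The mix described above can be sketched as follows. This is a minimal illustration, not the paper's code: `build_rehearsal_mix` and the placeholder sequence lists are hypothetical, and uniform sampling without replacement is an assumption about how the seed-42 pop subset was drawn.

```python
import random


def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=2500, seed=42):
    """Return all jazz sequences plus a seeded random subset of pop sequences.

    Hypothetical helper: uniform sampling without replacement is an assumption.
    """
    rng = random.Random(seed)
    pop_subset = rng.sample(pop_seqs, n_pop)
    mixed = list(jazz_seqs) + pop_subset
    rng.shuffle(mixed)  # interleave genres so the data is not genre-sorted
    return mixed


# Placeholder data standing in for tokenized chord sequences.
jazz = [("jazz", i) for i in range(1513)]
pop = [("pop", i) for i in range(10000)]  # pool size is arbitrary here

mix = build_rehearsal_mix(jazz, pop)
n_pop = sum(1 for genre, _ in mix if genre == "pop")
n_jazz = len(mix) - n_pop

# Total size, pop:jazz ratio, and pop share in percent.
print(len(mix), round(n_pop / n_jazz, 2), round(100 * n_pop / len(mix), 1))
# prints: 4013 1.65 62.3
```

With these counts the fine-tuning set holds 4,013 sequences, of which roughly 62% are pop.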
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.20% | 80.99% |
| Top-5 accuracy | 96.87% | 92.63% |
| Perplexity | 1.82 | 2.29 |
| Δ top-1 vs. Phase 0 baseline (points) | −0.04 | +8.13 |
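For readers who want to reproduce metrics of this kind, the sketch below shows one common way to compute top-1 accuracy, top-k accuracy, and perplexity from next-token scores. It is an illustration under assumptions: the `evaluate` helper and the toy logits are hypothetical, and the paper's exact protocol (padding handling, averaging per token or per sequence) is not specified here.

```python
import math


def evaluate(logits, targets, k=5):
    """Compute top-1 accuracy, top-k accuracy, and perplexity.

    logits: list of per-position score lists (one score per vocabulary token).
    targets: list of correct token ids, one per position.
    """
    top1 = topk = 0
    total_nll = 0.0
    for scores, target in zip(logits, targets):
        # Token ids ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top1 += ranked[0] == target
        topk += target in ranked[:k]
        # Negative log-likelihood via a numerically stable log-softmax.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total_nll += log_z - scores[target]
    n = len(targets)
    # Perplexity is the exponential of the mean negative log-likelihood.
    return top1 / n, topk / n, math.exp(total_nll / n)


# Toy example: 6-token vocabulary, three positions (illustrative values only).
logits = [
    [4.0, 1.0, 0.5, 0.2, 0.1, 0.0],  # target 0 is ranked first
    [1.0, 3.0, 2.5, 0.3, 0.2, 0.1],  # target 2 is ranked second
    [0.1, 0.2, 0.3, 0.4, 0.5, 5.0],  # target 0 is ranked last
]
targets = [0, 2, 0]
top1, top5, ppl = evaluate(logits, targets)
print(f"top-1 {top1:.2%}  top-5 {top5:.2%}  perplexity {ppl:.2f}")
```

Assuming perplexity is the exponential of the mean per-token cross-entropy in nats, the reported values of 1.82 and 2.29 correspond to losses of about 0.60 and 0.83 nats per token.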
Qualitative samples from F3 introduce secondary dominants, chromatic passing diminished chords, and other jazz voice-leading vocabulary that the Phase 0 baseline does not produce. See paper §6.4 for representative continuations.
Intended use
Recommended default for chord-composition workflows that need fluency in both pop and jazz registers. F1 (ft-pop80) and F4 (ft-pop29) are the stylistic endpoints when a more committed pop-leaning or jazz-leaning identity is desired.
Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.
Usage
```python
import torch
from huggingface_hub import hf_hub_download

# MusicTransformer and ChordTokenizer are project-local modules,
# so the project's source files must be importable.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the checkpoint from the Hugging Face Hub.
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop50",
    filename="best.pt",
)

tokenizer = ChordTokenizer()

# weights_only=False unpickles arbitrary objects; only load checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary table.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = jazz ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}

# Drop the final token of the encoded prompt so the model continues the sequence.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])

# Sample up to 32 new tokens at temperature 0.8, stopping early at end-of-sequence.
with torch.no_grad():
    for _ in range(32):
        logits = model(ids)
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break

print(tokenizer.decode(ids[0].tolist()))
```
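The sampling loop above divides the last-position logits by 0.8 before the softmax. The standalone sketch below, using made-up logits and a hypothetical helper name, illustrates what that temperature does: values below 1.0 concentrate probability on the highest-scoring chords, while 1.0 leaves the distribution unchanged. It is independent of the model and only mirrors the arithmetic in the loop.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for four candidate next chords (illustrative only).
logits = [2.0, 1.0, 0.5, -1.0]

for t in (1.0, 0.8, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature makes continuations more predictable; raising it above 1.0 would flatten the distribution and produce more varied, riskier chord choices.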
Training-data licenses
| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Preprint: arXiv:2605.04998.
@misc{lee2026chordmix,
title = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
author = {Lee, Jinju},
year = {2026},
eprint = {2605.04998},
archivePrefix = {arXiv}
}