TheArtist Music Transformer - F4 (Pop 1K Mix, jazz-leaning)
Jazz-adapted chord model with a 1,000-sequence pop rehearsal buffer. The jazz-leaning endpoint of the mix-ratio sweep. Highest jazz top-1 accuracy in the collection (81.50%), at the cost of 1.22 points of pop top-1 accuracy.
One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.
Model summary
| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 6 |
Training data
All 1,513 jazz training sequences plus 1,000 pop rehearsal sequences (seed 42). Pop:jazz ≈ 0.66:1, that is, less pop than jazz in the mix.
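The mix arithmetic above can be sketched as follows. This is a minimal illustration, not the paper's data pipeline: only the counts (1,513 jazz, 1,000 pop) and the seed (42) come from this card, while the helper `build_mix`, the placeholder sequences, and the size of the pop pool are assumptions made for the example.

```python
import random

def build_mix(jazz_seqs, pop_seqs, n_pop=1000, seed=42):
    """Return all jazz sequences plus a seeded random sample of n_pop pop sequences.

    Hypothetical helper: the card states the counts and the seed, not the
    exact sampling procedure.
    """
    rng = random.Random(seed)
    rehearsal = rng.sample(pop_seqs, n_pop)  # pop rehearsal buffer
    mix = list(jazz_seqs) + rehearsal
    rng.shuffle(mix)                         # interleave genres for training
    return mix, rehearsal

# Placeholder data: only the two counts are taken from the card.
jazz = [f"jazz_{i}" for i in range(1513)]
pop_pool = [f"pop_{i}" for i in range(5000)]  # pool size is an arbitrary stand-in

mix, rehearsal = build_mix(jazz, pop_pool)
pop_to_jazz = len(rehearsal) / len(jazz)  # 1000 / 1513
pop_share = len(rehearsal) / len(mix)     # 1000 / 2513
print(f"pop:jazz = {pop_to_jazz:.2f}:1, pop share of mix = {pop_share:.1%}")
```

Fixing the seed makes the rehearsal buffer reproducible, so repeated runs draw the same 1,000 pop sequences from the same pool.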
Evaluation (held-out per-genre test sets)
| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 83.02% | 81.50% |
| Top-5 accuracy | 96.93% | 92.59% |
| Perplexity | 1.81 | 2.26 |
| Δ vs. Phase 0 baseline | −1.22 | +8.64 |
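For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute top-1 accuracy, top-k accuracy, and perplexity from per-step next-token logits. The function names and the toy logits are assumptions for illustration; the paper's exact evaluation protocol (for example, how padding positions are handled) is not stated on this card.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def evaluate(logits_per_step, targets, k=5):
    """Top-1 accuracy, top-k accuracy, and perplexity over prediction steps."""
    top1 = topk = 0
    nll = 0.0
    for logits, target in zip(logits_per_step, targets):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top1 += ranked[0] == target          # best guess is the target
        topk += target in ranked[:k]         # target is among the k best guesses
        nll -= log_softmax(logits)[target]   # negative log-likelihood of target
    n = len(targets)
    return {"top1": top1 / n, "topk": topk / n, "perplexity": math.exp(nll / n)}

# Toy example with a 6-token vocabulary (not real model output).
toy_logits = [
    [2.0, 1.0, 0.0, -1.0, -2.0, -3.0],   # target 0 is ranked first
    [0.0, 3.0, 1.0, 0.5, -1.0, -2.0],    # target 2 is ranked second
    [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0],   # target 0 is ranked last
]
toy_targets = [0, 2, 0]
metrics = evaluate(toy_logits, toy_targets, k=5)
print(metrics)
```

Perplexity here is the exponential of the mean negative log-likelihood, so a model that spreads probability uniformly over V tokens scores a perplexity of V.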
F4 is the jazz-leaning endpoint of the mix-ratio sweep. It produces the most jazz-flavoured continuations among the released checkpoints, with secondary dominants, tritone substitutions, modal interchange, and II-V chains across distant keys. The cost is 1.22 points of pop top-1 accuracy relative to the Phase 0 baseline. Qualitative samples (paper §6.4) on a minor ii-V prompt show the bebop-style harmonic motion that this checkpoint commits to more strongly than F3.
Intended use
Recommended for jazz-flavoured chord composition where the user is willing to trade some pop fluency for stronger jazz identity. F3 (ft-pop50) is the balanced alternative; F1 (ft-pop80) is the symmetric pop-leaning endpoint.
Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.
Usage
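```python
import torch
from huggingface_hub import hf_hub_download

# `model` and `tokenizer` are local modules, assumed to come from the project's source code.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the checkpoint file from the Hugging Face Hub (cached after the first call).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop29",
    filename="best.pt",
)

tokenizer = ChordTokenizer()

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary table above.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # inference mode
```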
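Once the model is loaded, a continuation is typically produced by repeatedly sampling the next chord token. The sketch below shows a generic top-k sampling loop in plain Python. It is not the project's generation code: `sample_top_k`, `generate`, and `next_logits_fn` are hypothetical names, and `dummy_next_logits` is a stand-in so the example runs without the checkpoint. With the real model, `next_logits_fn` would wrap a forward pass and return the logits for the last position; the exact forward signature of `MusicTransformer` is not documented here, so that wiring is left as an assumption.

```python
import math
import random

def sample_top_k(logits, k=5, temperature=1.0, rng=random):
    """Sample one token id from the k highest-scoring logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalised softmax weights
    return rng.choices(top, weights=weights, k=1)[0]

def generate(next_logits_fn, prompt_ids, n_new, max_seq_len=256, **sampling):
    """Extend prompt_ids by n_new tokens, one sampled token at a time."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        context = ids[-max_seq_len:]  # respect the model's context window
        ids.append(sample_top_k(next_logits_fn(context), **sampling))
    return ids

# Stand-in for the real model: always favours "previous token id + 1".
def dummy_next_logits(context, vocab_size=351):
    favoured = (context[-1] + 1) % vocab_size
    return [1.0 if t == favoured else 0.0 for t in range(vocab_size)]

# k=1 reduces top-k sampling to greedy decoding, which is deterministic.
out = generate(dummy_next_logits, [10, 11], n_new=8, k=1, rng=random.Random(0))
print(out)
```

Lower temperatures and smaller `k` make continuations more conservative; higher values give more varied harmonic choices at the risk of less coherent progressions.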
Training-data licenses
| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |
Citation
Preprint: arXiv:2605.04998.
```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```