TheArtist Music Transformer - F4 (Pop 1K Mix, jazz-leaning)

A jazz-adapted chord model trained with a 1,000-sequence pop rehearsal buffer. It is the jazz-leaning endpoint of the mix-ratio sweep and has the highest jazz top-1 accuracy in the collection (81.50%), at the cost of 1.22 points of pop top-1 accuracy relative to the Phase 0 baseline.

One of six checkpoints released alongside the paper "Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation" (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

Field                            Value
Architecture                     Music Transformer with relative positional attention
Parameters                       25,661,440
Vocabulary size                  351 tokens
Max sequence length              256
d_model / heads / FFN / layers   512 / 8 / 2048 / 8
Fine-tune resumed from           Phase 0 pop baseline
Best epoch                       6

Training data

All 1,513 jazz training sequences plus 1,000 pop rehearsal sequences (seed 42). Pop:jazz ≈ 0.66:1 (1,000 / 1,513), so the mix contains less pop than jazz.
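
The mix described above can be sketched as follows. This is a minimal illustration rather than the paper's data pipeline: `build_rehearsal_mix` and the placeholder sequences are hypothetical, and uniform sampling without replacement from a larger pop pool is an assumption. Only the counts (1,513 jazz, 1,000 pop) and the seed (42) come from this card.

```python
import random

def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=1000, seed=42):
    """Return all jazz sequences plus a fixed-size random pop rehearsal sample.

    Hypothetical helper: uniform sampling without replacement is an assumption.
    """
    rng = random.Random(seed)              # local RNG so the draw is reproducible
    pop_buffer = rng.sample(pop_seqs, n_pop)
    mixed = list(jazz_seqs) + pop_buffer
    rng.shuffle(mixed)                     # interleave the two genres
    return mixed, pop_buffer

# Placeholder data standing in for tokenized chord sequences.
jazz = [("jazz", i) for i in range(1513)]
pop = [("pop", i) for i in range(5000)]    # pool size is arbitrary here

mixed, pop_buffer = build_rehearsal_mix(jazz, pop)
pop_to_jazz = len(pop_buffer) / len(jazz)
print(len(mixed), round(pop_to_jazz, 2))
```

Using a dedicated `random.Random(seed)` instance keeps the draw independent of any other random state in the training script.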

Evaluation (held-out per-genre test sets)

Metric                    Pop test   Jazz test
Top-1 accuracy            83.02%     81.50%
Top-5 accuracy            96.93%     92.59%
Perplexity                1.81       2.26
Δ vs. Phase 0 baseline    −1.22      +8.64
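
For readers who want to reproduce metrics of this kind, the sketch below shows one common way to compute top-1 accuracy, top-k accuracy, and perplexity from next-token scores. It assumes perplexity is the exponential of the mean per-token negative log-likelihood and that padding positions have already been removed; the card does not state the paper's exact evaluation protocol. The `evaluate` function and the toy scores are illustrative only.

```python
import math

def evaluate(logits_per_step, targets, k=5):
    """Top-1 accuracy, top-k accuracy, and perplexity over next-token predictions.

    logits_per_step: list of lists of unnormalized scores, one list per position.
    targets: list of correct token ids, same length as logits_per_step.
    """
    top1 = topk = 0
    total_nll = 0.0
    for logits, target in zip(logits_per_step, targets):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top1 += ranked[0] == target
        topk += target in ranked[:k]
        # Log-softmax, with the max subtracted for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total_nll += log_z - logits[target]
    n = len(targets)
    return top1 / n, topk / n, math.exp(total_nll / n)

# Toy example with a 6-token vocabulary (values are made up).
logits = [
    [4.0, 1.0, 0.5, 0.0, -1.0, -2.0],   # target 0: top-1 hit
    [0.0, 3.0, 2.5, 1.0, 0.5, -3.0],    # target 2: inside the top 5 only
    [2.0, 1.5, 1.0, 0.5, 0.0, -5.0],    # target 5: ranked last, a miss
]
targets = [0, 2, 5]
acc1, acc5, ppl = evaluate(logits, targets)
print(acc1, acc5, ppl)
```

A perplexity of 1.0 would mean the model assigns probability 1 to every correct token, so values such as 1.81 and 2.26 can be read as the effective number of equally likely candidates per step.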

F4 is the jazz-leaning endpoint of the mix-ratio sweep. It produces the most jazz-flavoured continuations among the released checkpoints, with secondary dominants, tritone substitutions, modal interchange, and II-V chains across distant keys. The cost is roughly one point of pop top-1 accuracy. Qualitative samples (paper §6.4) on a minor ii-V prompt show the bebop-style harmonic motion that this checkpoint commits to more strongly than F3.

Intended use

Recommended for jazz-flavoured chord composition where the user is willing to trade some pop fluency for stronger jazz identity. F3 (ft-pop50) is the balanced alternative; F1 (ft-pop80) is the symmetric pop-leaning endpoint.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

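import torch
from huggingface_hub import hf_hub_download

# MusicTransformer and ChordTokenizer are project modules, not installable packages;
# this assumes the project's source code is on the Python path.
from model import MusicTransformer
from tokenizer import ChordTokenizer

# Download the F4 checkpoint from the Hugging Face Hub (cached after the first call).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop29",
    filename="best.pt",
)
tokenizer = ChordTokenizer()
# weights_only=False runs the full unpickler; load only checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary above.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # inference mode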
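
Once a model is loaded, chord continuations are typically produced by sampling one token at a time. The sketch below illustrates top-k sampling under stated assumptions: `sample_continuation` and `toy_logits` are hypothetical, and the model is replaced by a stand-in scoring function because this card does not document the forward-pass signature of `MusicTransformer` or the encode/decode methods of `ChordTokenizer`. Only the 256-token limit and the 351-token vocabulary come from the model summary.

```python
import math
import random

def sample_continuation(next_logits, prompt_ids, n_new, max_seq_len=256,
                        top_k=5, temperature=1.0, seed=0):
    """Autoregressively extend prompt_ids with top-k sampling.

    next_logits: callable mapping a list of token ids to a list of scores for
    the next token. With the real model this would wrap a forward pass; that
    wrapper is not shown because the call signature is not documented here.
    """
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(n_new):
        if len(ids) >= max_seq_len:        # respect the model's context limit
            break
        logits = next_logits(ids)
        # Keep the k highest-scoring candidates.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
        m = max(logits[i] for i in top)
        weights = [math.exp((logits[i] - m) / temperature) for i in top]
        ids.append(rng.choices(top, weights=weights, k=1)[0])
    return ids

# Stand-in scorer: favours the id following the last token (351-token vocabulary).
def toy_logits(ids, vocab_size=351):
    scores = [0.0] * vocab_size
    scores[(ids[-1] + 1) % vocab_size] = 5.0
    return scores

out = sample_continuation(toy_logits, [10, 11], n_new=8)
print(out)
```

Lower temperatures and smaller `top_k` values make the output more conservative; higher values allow less likely chords to appear.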

Training-data licenses

Dataset                      License
Chordonomicon                Public (user-generated)
McGill Billboard             CC0
Jazz Harmony Treebank        Public
JazzStandards (iReal Pro)    Community redistribution
Weimar Jazz Database         ODbL
JAAH                         Research-use public

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}