TheArtist Music Transformer — F3 (Pop 2.5K Mix) — balanced sweet spot

Jazz-adapted chord model with a 2,500-sequence pop rehearsal buffer (≈1.65× the jazz training volume). Pop top-1 accuracy stays within 0.04 points of the Phase 0 baseline, while jazz top-1 accuracy improves by 8.13 points. This is the paper's recommended balanced default.

One of six checkpoints released alongside the paper "Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation" (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

Field	Value
Architecture	Music Transformer with relative positional attention
Parameters	25,661,440
Vocabulary size	351 tokens
Max sequence length	256
d_model / heads / FFN / layers	512 / 8 / 2048 / 8
Fine-tune resumed from	Phase 0 pop baseline
Best epoch	9

Training data

All 1,513 jazz training sequences plus 2,500 pop rehearsal sequences (seed 42), giving pop:jazz ≈ 1.65:1. The paper identifies this mix as the minimum rehearsal volume that fully preserves pop fluency while delivering essentially the full jazz gain (paper §7.1).
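
The mix described above can be sketched as follows. This is a minimal illustration, not the paper's data pipeline: the helper `build_rehearsal_mix`, the placeholder sequence lists, the pool size of 10,000, and the choice of `random.Random(seed).sample` are all assumptions. Only the counts (1,513 jazz, 2,500 pop) and the seed value 42 come from this card.

```python
import random

def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=2500, seed=42):
    """Return all jazz sequences plus a seeded random subset of pop sequences."""
    rng = random.Random(seed)
    pop_subset = rng.sample(pop_seqs, n_pop)  # sample without replacement
    mix = list(jazz_seqs) + pop_subset
    rng.shuffle(mix)  # interleave the two genres
    return mix

# Placeholder data standing in for tokenized chord sequences.
jazz = [("jazz", i) for i in range(1513)]
pop = [("pop", i) for i in range(10000)]  # hypothetical pop pool size

mix = build_rehearsal_mix(jazz, pop)
n_pop = sum(1 for genre, _ in mix if genre == "pop")
n_jazz = len(mix) - n_pop
print(len(mix), round(n_pop / n_jazz, 2))  # 4013 1.65
```

Fixing the seed makes the pop subset reproducible, so repeated runs train on the same 4,013 sequences.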

Evaluation (held-out per-genre test sets)

Metric	Pop test	Jazz test
Top-1 accuracy	84.20%	80.99%
Top-5 accuracy	96.87%	92.63%
Perplexity	1.82	2.29
Δ top-1 vs. Phase 0 baseline (points)	−0.04	+8.13
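
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute top-1 accuracy, top-k accuracy, and perplexity from next-token probabilities. It is an illustration under assumptions: the function `evaluate` and the toy numbers are hypothetical, and this card does not state how the paper aggregates positions (for example, per token versus per sequence).

```python
import math

def evaluate(prob_rows, targets, k=5):
    """Compute top-1 accuracy, top-k accuracy, and perplexity.

    prob_rows: one probability distribution (list of floats) per position.
    targets:   index of the correct next token at each position.
    """
    top1 = topk = 0
    nll = 0.0  # summed negative log-likelihood of the correct tokens
    for probs, target in zip(prob_rows, targets):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        top1 += ranked[0] == target
        topk += target in ranked[:k]
        nll -= math.log(probs[target])
    n = len(targets)
    # Perplexity is the exponential of the mean negative log-likelihood.
    return top1 / n, topk / n, math.exp(nll / n)

# Toy example with a 3-token vocabulary (made-up numbers, not model output).
rows = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.3, 0.2]]
gold = [0, 1, 1]
top1, top2, ppl = evaluate(rows, gold, k=2)
print(top1, top2, ppl)
```

The Δ row is then a difference in top-1 percentage points between this checkpoint and the Phase 0 baseline on the same test set.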

Qualitative samples from F3 introduce secondary dominants, chromatic passing diminished chords, and other jazz voice-leading vocabulary that the Phase 0 baseline does not produce. See paper §6.4 for representative continuations.

Intended use

Recommended default for chord-composition workflows that need fluency in both pop and jazz registers. F1 (ft-pop80) and F4 (ft-pop29) are the stylistic endpoints when a more committed pop-leaning or jazz-leaning identity is desired.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

import torch
from huggingface_hub import hf_hub_download
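# Project-local modules (assumed to come from the paper's source code);
# they must be on the import path.
from model import MusicTransformer
from tokenizer import ChordTokenizer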

ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop50",
    filename="best.pt",
)
tokenizer = ChordTokenizer()
# weights_only=False unpickles arbitrary Python objects; load only checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = jazz ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
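# Encode the prompt and drop the final token (presumably EOS) so the model continues it.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])  # shape: (1, prompt_len)
with torch.no_grad():
    for _ in range(32):  # generate at most 32 new tokens
        logits = model(ids)
        # Sample from the last position's distribution at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break  # stop once the model emits end-of-sequence
print(tokenizer.decode(ids[0].tolist()))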
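
The sampling step in the loop above divides the last-position logits by 0.8 before the softmax. The dependency-free sketch below illustrates what that temperature does; the helper names and the example logits are hypothetical and are not part of the released code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=0.8):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits, temperature=0.8, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # made-up values, not model output
p_plain = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.8)
print(p_plain[0] < p_sharp[0])  # True: a temperature below 1 favors the top token
```

Lower temperatures make continuations more conservative, while values above 1 flatten the distribution and produce more varied chord choices.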

Training-data licenses

Dataset	License
Chordonomicon	Public (user-generated)
McGill Billboard	CC0
Jazz Harmony Treebank	Public
JazzStandards (iReal Pro)	Community redistribution
Weimar Jazz Database	ODbL
JAAH	Research-use public

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
