TheArtist Music Transformer - F3 (Pop 2.5K Mix) - balanced sweet spot

Jazz-adapted chord model with a 2,500-sequence pop rehearsal buffer (≈1.65× the jazz training volume). Pop top-1 preserved within 0.04 points of the Phase 0 baseline. Jazz top-1 +8.13 points. The paper's recommended balanced default.

One of six checkpoints released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). See the collection overview at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tune resumed from | Phase 0 pop baseline |
| Best epoch | 9 |

Training data

All 1,513 jazz training sequences plus 2,500 pop rehearsal sequences (seed 42). Pop:jazz ≈ 1.65:1. The paper identifies this ratio as the minimum rehearsal volume that fully preserves pop fluency while delivering essentially the full jazz gain (paper §7.1).
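
The mix described above can be sketched as follows. This is a minimal illustration, not the paper's data pipeline: the helper name `build_rehearsal_mix`, the placeholder sequence lists, the pop pool size, and the use of Python's `random.Random` are all assumptions. Only the counts (1,513 jazz, 2,500 pop) and the seed (42) come from this card.

```python
import random

def build_rehearsal_mix(jazz_seqs, pop_seqs, n_pop=2500, seed=42):
    """Return all jazz sequences plus a seeded random sample of pop sequences.

    Hypothetical helper: the sampling procedure actually used for F3 may differ.
    """
    rng = random.Random(seed)
    pop_sample = rng.sample(pop_seqs, n_pop)  # sample without replacement
    mixed = list(jazz_seqs) + pop_sample
    rng.shuffle(mixed)  # interleave genres so the order is not genre-sorted
    return mixed

# Placeholder data standing in for tokenized chord sequences.
jazz = [f"jazz_{i}" for i in range(1513)]
pop = [f"pop_{i}" for i in range(10000)]  # pool size is illustrative only

mixed = build_rehearsal_mix(jazz, pop)
n_pop = sum(s.startswith("pop_") for s in mixed)
n_jazz = len(mixed) - n_pop
print(f"pop:jazz = {n_pop}:{n_jazz} = {n_pop / n_jazz:.2f}:1")
# prints: pop:jazz = 2500:1513 = 1.65:1
```

Fixing the seed makes the rehearsal buffer reproducible, so repeated calls with the same inputs return the same mix.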

Evaluation (held-out per-genre test sets)

| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.20% | 80.99% |
| Top-5 accuracy | 96.87% | 92.63% |
| Perplexity | 1.82 | 2.29 |
| Δ top-1 vs. Phase 0 baseline (points) | −0.04 | +8.13 |
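
For readers unfamiliar with these metrics, the sketch below shows the standard definitions of top-k accuracy and perplexity on a toy example. It is an assumption that the paper uses exactly these definitions; this card does not say, for instance, whether padding or structural tokens are excluded from the averages. The function names and the toy probabilities are illustrative only.

```python
import math

def top_k_accuracy(probs, targets, k):
    """Fraction of positions whose target is among the k most probable tokens."""
    hits = 0
    for dist, target in zip(probs, targets):
        top_k = sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)

def perplexity(probs, targets):
    """Exponential of the mean negative log-likelihood of the target tokens."""
    nll = [-math.log(dist[t]) for dist, t in zip(probs, targets)]
    return math.exp(sum(nll) / len(nll))

# Toy example: 3 positions over a 4-token vocabulary (values are made up).
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]
targets = [0, 1, 2]

top1 = top_k_accuracy(probs, targets, k=1)  # third target is only second-best
top2 = top_k_accuracy(probs, targets, k=2)  # all targets are within the top 2
ppl = perplexity(probs, targets)
print(top1, top2, ppl)
```

A perplexity near 1 means the model places almost all probability on the correct next token, so lower is better.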

Qualitative samples from F3 introduce secondary dominants, chromatic passing diminished chords, and other jazz voice-leading vocabulary that the Phase 0 baseline does not produce. See paper §6.4 for representative continuations.

Intended use

Recommended default for chord-composition workflows that need fluency in both pop and jazz registers. F1 (ft-pop80) and F4 (ft-pop29) are the stylistic endpoints when a more committed pop-leaning or jazz-leaning identity is desired.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

import torch
from huggingface_hub import hf_hub_download
from model import MusicTransformer
from tokenizer import ChordTokenizer

ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop50",
    filename="best.pt",
)
tokenizer = ChordTokenizer()
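# weights_only=False unpickles arbitrary Python objects, so only load checkpoints from a source you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)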

model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = jazz ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "jazz",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
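# Drop the last encoded token (presumably the end-of-sequence marker) so the model continues the prompt.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])  # shape: (1, prompt_len)
with torch.no_grad():
    for _ in range(32):  # generate at most 32 new tokens
        logits = model(ids)
        # Sample the next token from the last position at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))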
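
The sampling step in the loop above divides the logits by 0.8 before the softmax. The dependency-free sketch below illustrates what that temperature does, under the assumption that plain temperature sampling is all that is intended. The function names and the example logits are hypothetical and not part of the released code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=0.8):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=0.8, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]  # made-up scores for a 3-token vocabulary
p_sharp = softmax_with_temperature(logits, temperature=0.8)
p_plain = softmax_with_temperature(logits, temperature=1.0)
next_id = sample_token(logits, rng=random.Random(0))
print([round(p, 3) for p in p_sharp], [round(p, 3) for p in p_plain])
```

A temperature below 1 sharpens the distribution toward the most likely chord, which trades some variety for more conventional continuations. Raising it toward 1 or above gives more adventurous output.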

Training-data licenses

| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}