TheArtist Music Transformer — F1 (Pop 10K Mix, pop-leaning)

Jazz-adapted chord model fine-tuned with a 10,000-sequence pop rehearsal buffer. This is the pop-leaning endpoint of the mix-ratio sweep. Pop top-1 accuracy improves on the pre-fine-tune (Phase 0) baseline by 0.36 points, and jazz top-1 accuracy improves by 8.17 points.

One of six checkpoints released alongside the paper "Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation" (Lee, 2026). See the paper for full context; the collection overview is at PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline.

Model summary

Field	Value
Architecture	Music Transformer with relative positional attention
Parameters	25,661,440
Vocabulary size	351 tokens
Max sequence length	256
d_model / heads / FFN / layers	512 / 8 / 2048 / 8
Fine-tune resumed from	Phase 0 pop baseline
Best epoch	6

Training data

All 1,513 jazz training sequences (Jazz Harmony Treebank, JazzStandards, Weimar Jazz Database, JAAH) plus 10,000 pop rehearsal sequences sub-sampled with seed 42 from the Phase 0 pop training split. Pop:jazz ≈ 6.6:1 in the mix.
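
The mix ratio follows directly from the two counts. A minimal sketch, assuming the rehearsal subset is drawn uniformly without replacement from the pop training split; the pool size and the `subsample_rehearsal` helper are hypothetical, and the paper's actual sampling code may differ:

```python
import random

N_JAZZ = 1_513            # jazz training sequences (from this card)
N_POP_REHEARSAL = 10_000  # pop rehearsal sequences (from this card)
SEED = 42                 # sub-sampling seed (from this card)


def subsample_rehearsal(pop_pool, k=N_POP_REHEARSAL, seed=SEED):
    """Draw k items uniformly without replacement using a dedicated RNG.

    Hypothetical helper: the real sampling routine is not shown in this card.
    """
    rng = random.Random(seed)
    return rng.sample(pop_pool, k)


# Stand-in pool of sequence indices; the real split size is an assumption.
pop_pool = list(range(50_000))
rehearsal = subsample_rehearsal(pop_pool)

pop_to_jazz = len(rehearsal) / N_JAZZ
print(f"pop:jazz = {pop_to_jazz:.1f}:1")
```

Using a dedicated `random.Random(seed)` instance keeps the draw reproducible regardless of any other random calls made elsewhere in a training script.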

Fine-tune hyperparameters: peak learning rate 2 × 10⁻⁵, two-epoch warmup, at most ten epochs, early-stopping patience of 5 epochs.
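
One way to read these settings, sketched under assumptions: the learning rate ramps linearly to its peak over the first two epochs and is then held constant (the card does not state the post-warmup shape), and training stops once validation loss has failed to improve for five consecutive epochs. The function names and the sample loss values are illustrative only.

```python
PEAK_LR = 2e-5
WARMUP_EPOCHS = 2
MAX_EPOCHS = 10
PATIENCE = 5


def lr_at(epoch_progress):
    """Learning rate at a fractional epoch (0.0 = start of training).

    Assumes linear warmup followed by a constant rate; the schedule
    after warmup is not specified in this card.
    """
    if epoch_progress < WARMUP_EPOCHS:
        return PEAK_LR * epoch_progress / WARMUP_EPOCHS
    return PEAK_LR


def best_epoch_with_early_stopping(val_losses):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Stops after PATIENCE consecutive epochs without a new best
    validation loss, or after MAX_EPOCHS, whichever comes first.
    """
    best_loss, best_epoch, stale = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= PATIENCE:
                return best_epoch, epoch
    return best_epoch, min(len(val_losses), MAX_EPOCHS)


# Made-up validation losses, for illustration only.
losses = [0.70, 0.64, 0.61, 0.60, 0.59, 0.58, 0.59, 0.59, 0.60, 0.60]
print(best_epoch_with_early_stopping(losses))
```

Under this reading (1-indexed epochs, patience counted in consecutive non-improving epochs), a best epoch of 6 leaves only four later epochs before the ten-epoch cap, so the cap rather than the patience rule would end such a run.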

Evaluation (held-out per-genre test sets)

Metric	Pop test	Jazz test
Top-1 accuracy	84.60%	81.03%
Top-5 accuracy	96.96%	92.41%
Perplexity	1.78	2.31
Δ top-1 vs. Phase 0 baseline (points)	+0.36	+8.17
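
For readers reproducing the table, here is a sketch of how such metrics are conventionally computed: top-k accuracy counts a position as correct when the reference token is among the k highest-scoring candidates, and perplexity is the exponential of the mean per-token negative log-likelihood. These are the standard definitions, assumed rather than confirmed against the paper's evaluation code; the toy distributions are invented.

```python
import math


def top_k_accuracy(prob_rows, targets, k):
    """Fraction of positions whose target is among the k most probable tokens."""
    hits = 0
    for probs, target in zip(prob_rows, targets):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        hits += target in ranked[:k]
    return hits / len(targets)


def perplexity(prob_rows, targets):
    """exp of the mean negative log-likelihood of the target tokens."""
    nll = [-math.log(probs[t]) for probs, t in zip(prob_rows, targets)]
    return math.exp(sum(nll) / len(nll))


# Toy next-token distributions over a 4-token vocabulary (invented values).
prob_rows = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.40, 0.10],
]
targets = [0, 2, 2]

print(top_k_accuracy(prob_rows, targets, k=1))
print(top_k_accuracy(prob_rows, targets, k=2))
print(perplexity(prob_rows, targets))
```

Under this definition, the reported pop perplexity of 1.78 corresponds to a mean loss of ln 1.78 ≈ 0.58 nats per token, and the jazz perplexity of 2.31 to ln 2.31 ≈ 0.84 nats per token.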

This is the only run in the sweep whose pop top-1 exceeds the Phase 0 baseline. It is also the run with the most stable pop curve over training. Choose F1 when pop fluency is a hard constraint and jazz coloration is welcome but not the primary target. Generations stay rooted in commercial pop and rock harmony, with jazz substitutions appearing selectively (an occasional secondary dominant or ii-V detour inside an otherwise diatonic loop).

Intended use and limitations

Recommended for chord-composition workflows targeting pop, rock, CCM, K-pop, J-pop, and modern country with optional jazz coloration. F4 (ft-pop29) is the symmetric jazz-leaning endpoint; F3 (ft-pop50) is the balanced middle.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

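import torch
from huggingface_hub import hf_hub_download
from model import MusicTransformer      # local project module (assumed to ship with the paper's code)
from tokenizer import ChordTokenizer    # local project module (assumed to ship with the paper's code)

# Download the checkpoint from the Hugging Face Hub (cached after the first call).
ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
tokenizer = ChordTokenizer()
# weights_only=False unpickles arbitrary Python objects; load only checkpoints you trust.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Hyperparameters match the model summary table above.
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "pop",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
# Drop the final token (presumably the end-of-sequence marker) so the model continues the prompt.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])  # shape: (1, prompt_length)

temperature = 0.8    # values below 1 sharpen the next-token distribution
max_new_tokens = 32  # keeps prompt + continuation well under max_seq_len=256

with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(ids)
        # Sample the next token from the temperature-scaled distribution at the last position.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / temperature, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))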
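
The loop above divides the final-position logits by 0.8 before the softmax. As a standalone illustration of that step (plain Python, independent of the repository's `model` and `tokenizer` modules; the logit values are invented), a temperature below 1 sharpens the distribution toward the highest-scoring token:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # invented scores for four candidate chords
plain = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.8)

rng = random.Random(0)
print([round(p, 3) for p in plain])
print([round(p, 3) for p in sharp])
print(sample_index(sharp, rng))
```

Raising the temperature toward 1 or above gives lower-ranked chords more probability mass, which may surface more of the jazz substitutions at the cost of less predictable progressions.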

Training-data licenses

Dataset	License
Chordonomicon	Public (user-generated)
McGill Billboard	CC0
Jazz Harmony Treebank	Public
JazzStandards (iReal Pro)	Community redistribution
Weimar Jazz Database	ODbL
JAAH	Research-use public

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
