TheArtist Music Transformer: F1 (Pop 10K Mix, pop-leaning)

Jazz-adapted chord model fine-tuned with a 10,000-sequence pop rehearsal buffer. It is the pop-leaning endpoint of the mix-ratio sweep. Pop top-1 accuracy slightly improves on the pre-fine-tune (Phase 0) baseline (+0.36 points), while jazz top-1 accuracy gains +8.17 points.

One of six checkpoints released alongside the paper "Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation" (Lee, 2026). See the paper for full context, and PearlLeeStudio/TheArtist-MusicTransformer-pop-baseline for the collection overview.

Model summary

| Field | Value |
|---|---|
| Architecture | Music Transformer with relative positional attention |
| Parameters | 25,661,440 |
| Vocabulary size | 351 tokens |
| Max sequence length | 256 tokens |
| d_model / heads / FFN / layers | 512 / 8 / 2048 / 8 |
| Fine-tuned from | Phase 0 pop baseline |
| Best epoch | 6 |
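Relative positional attention, as introduced for the Music Transformer by Huang et al. (2018), adds an attention term that depends only on the distance between query and key positions. That paper computes it with a "skewing" step that avoids materializing a separate embedding for every (query, key) pair. The NumPy sketch below illustrates skewing for a single causal head. It assumes this checkpoint follows the paper's formulation; the helper names `skew` and `relative_attention_logits` are hypothetical and are not the project's `model.py` API.

```python
import numpy as np


def skew(qe: np.ndarray) -> np.ndarray:
    """Convert Q @ Er.T into S_rel with S_rel[i, j] = q_i . e_(j - i) for j <= i.

    Columns of qe index relative distances -(L-1)..0 (column r <-> distance r - (L - 1)).
    """
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))  # prepend one zero column -> (L, L + 1)
    reshaped = padded.reshape(L + 1, L)     # reinterpret the same values as (L + 1, L)
    return reshaped[1:]                     # drop the first row -> (L, L)


def relative_attention_logits(q: np.ndarray, k: np.ndarray, er: np.ndarray) -> np.ndarray:
    """Causal single-head attention logits with a relative-position term.

    q, k: (L, d) queries and keys; er: (L, d), where er[r] embeds distance r - (L - 1).
    """
    L, d = q.shape
    logits = (q @ k.T + skew(q @ er.T)) / np.sqrt(d)
    future = np.triu(np.ones((L, L), dtype=bool), k=1)  # keys after the query position
    return np.where(future, -np.inf, logits)
```

Only the lower triangle of the skewed matrix is meaningful; the causal mask hides the upper triangle, which is why the trick can leave the shifted padding values there without extra bookkeeping.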

Training data

All 1,513 jazz training sequences (Jazz Harmony Treebank, JazzStandards, Weimar Jazz Database, JAAH) plus 10,000 pop rehearsal sequences subsampled with seed 42 from the Phase 0 pop training split. The resulting pop:jazz ratio is about 6.6:1 (10,000 / 1,513).
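The mix described above can be sketched as "keep every jazz sequence, add a seeded pop subsample". This is a minimal illustration, not the project's data pipeline: `build_rehearsal_mix` is a hypothetical helper, sequences are stand-in strings, and Python's `random` module will not reproduce the paper's exact subset unless the original code used the same generator in the same way.

```python
import random


def build_rehearsal_mix(jazz, pop_pool, n_pop=10_000, seed=42):
    """Hypothetical helper: all jazz sequences plus a seeded pop subsample.

    Returns the combined list and the resulting pop:jazz ratio.
    """
    rng = random.Random(seed)                 # seeded so the subsample is reproducible
    pop_subset = rng.sample(pop_pool, n_pop)  # sampling without replacement
    mix = list(jazz) + pop_subset
    return mix, len(pop_subset) / len(jazz)


# Stand-in data: 1,513 jazz sequences and a placeholder pop pool of arbitrary size.
jazz = [f"jazz_{i}" for i in range(1513)]
pop_pool = [f"pop_{i}" for i in range(50_000)]
mix, ratio = build_rehearsal_mix(jazz, pop_pool)
print(len(mix), round(ratio, 1))  # 11513 6.6
```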

Fine-tuning hyperparameters: peak learning rate 2 × 10⁻⁵, two-epoch warmup, and a maximum of ten epochs with early-stopping patience of 5.
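The schedule above can be sketched as a linear warmup plus a patience counter. Assumptions, not stated in the card: warmup is linear from zero, the learning rate is simply held at its peak afterwards (the actual post-warmup decay is unspecified), and patience counts epochs without improvement in a monitored validation loss. Under these assumptions, a best epoch of 6 leaves only four non-improving epochs (7 to 10) before the 10-epoch cap, so patience would not have triggered.

```python
PEAK_LR = 2e-5
WARMUP_EPOCHS = 2
MAX_EPOCHS = 10
PATIENCE = 5


def lr_at(epoch_progress: float) -> float:
    """Linear warmup to PEAK_LR over WARMUP_EPOCHS (epoch_progress starts at 0).

    The post-warmup schedule is not given in the card; this sketch holds it flat.
    """
    if epoch_progress < WARMUP_EPOCHS:
        return PEAK_LR * epoch_progress / WARMUP_EPOCHS
    return PEAK_LR


def run_early_stopping(val_losses):
    """Return (best_epoch, epochs_run) with 1-indexed epochs.

    Stops after PATIENCE epochs without improvement or at MAX_EPOCHS, whichever is first.
    """
    best_loss, best_epoch, stale, epochs_run = float("inf"), 0, 0, 0
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= PATIENCE:
                break
    return best_epoch, epochs_run


# Toy losses (made up for illustration, not the run's logs) with a minimum at epoch 6.
print(run_early_stopping([3.0, 2.5, 2.2, 2.0, 1.9, 1.85, 1.9, 1.95, 2.0, 2.1]))  # (6, 10)
```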

Evaluation (held-out per-genre test sets)

| Metric | Pop test | Jazz test |
|---|---|---|
| Top-1 accuracy | 84.60% | 81.03% |
| Top-5 accuracy | 96.96% | 92.41% |
| Perplexity | 1.78 | 2.31 |
| Δ top-1 vs. Phase 0 baseline (points) | +0.36 | +8.17 |
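For readers reproducing the table, the metrics can be sketched with their standard next-token definitions: top-k accuracy is the share of positions whose gold token is among the k highest-scoring tokens, and perplexity is the exponential of the mean per-token negative log-likelihood. This assumes the paper uses these standard definitions; how padding or special tokens are excluded is not stated, and `chord_metrics` is a hypothetical helper.

```python
import math


def chord_metrics(logits, targets):
    """Top-1 accuracy, top-5 accuracy, and perplexity for next-token prediction.

    logits: one list of vocabulary scores per position; targets: gold token ids.
    """
    top1 = top5 = 0
    nll = 0.0
    for scores, gold in zip(logits, targets):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top1 += ranked[0] == gold
        top5 += gold in ranked[:5]
        m = max(scores)  # log-sum-exp with a max shift for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        nll += log_z - scores[gold]  # negative log-probability of the gold token
    n = len(targets)
    return top1 / n, top5 / n, math.exp(nll / n)
```

As a sanity check, uniform scores over the 351-token vocabulary give a perplexity of exactly 351, so values near 2 indicate the model usually concentrates probability on very few chords.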

This is the only run in the sweep whose pop top-1 accuracy exceeds the Phase 0 baseline, and it has the most stable pop curve during training. Choose F1 when pop fluency is a hard constraint and jazz coloration is welcome but not the primary goal. Generations stay rooted in commercial pop and rock harmony, with jazz substitutions appearing selectively (an occasional secondary dominant or ii-V detour inside an otherwise diatonic loop).
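The Δ row also lets you back out the Phase 0 baseline scores. This arithmetic assumes the Δ values are top-1 accuracy differences in percentage points, and the results carry the table's rounding (about ±0.01).

```python
# Values copied from the evaluation table (percent and percentage points).
f1_top1 = {"pop": 84.60, "jazz": 81.03}
delta_vs_phase0 = {"pop": 0.36, "jazz": 8.17}

# Implied Phase 0 top-1 accuracy = F1 top-1 minus the reported improvement.
implied_phase0 = {g: round(f1_top1[g] - delta_vs_phase0[g], 2) for g in f1_top1}
print(implied_phase0)  # {'pop': 84.24, 'jazz': 72.86}
```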

Intended use and limitations

Recommended for chord-composition workflows targeting pop, rock, CCM, K-pop, J-pop, and modern country with optional jazz coloration. F4 (ft-pop29) is the symmetric jazz-leaning endpoint; F3 (ft-pop50) is the balanced middle.

Out of scope: melody or audio generation; genres outside pop, rock, and jazz; real-time low-latency settings.

Usage

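import torch
from huggingface_hub import hf_hub_download
# MusicTransformer and ChordTokenizer are defined in the project's model.py and
# tokenizer.py; make sure those files are on the Python path.
from model import MusicTransformer
from tokenizer import ChordTokenizer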

ckpt_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
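tokenizer = ChordTokenizer()
# weights_only=False lets torch.load unpickle arbitrary objects; only use it with a trusted checkpoint.
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)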

model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Prompt = ii-V-I in C major
song = {
    "key": "Cmaj",
    "time_signature": "4/4",
    "genre": "pop",
    "bars": [["Dm7", "G7"], ["Cmaj7"]],
}
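# Drop the last token (the end-of-sequence marker) so the model continues the progression.
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])  # shape (1, prompt_len)
with torch.no_grad():
    for _ in range(32):  # generate at most 32 new tokens
        logits = model(ids)  # shape (1, seq_len, vocab_size)
        # Sample the next token from the last position at temperature 0.8.
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:  # stop once the model closes the song
            break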
print(tokenizer.decode(ids[0].tolist()))
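The usage example divides the logits by 0.8 before the softmax. The pure-Python sketch below (a hypothetical helper, independent of the project code) shows what that does: a temperature below 1 sharpens the distribution toward the most likely chord, while a temperature above 1 flattens it and makes rarer substitutions more likely.

```python
import math


def softmax_with_temperature(logits, temperature=0.8):
    """Divide logits by the temperature, then normalize with a max shift for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three candidate chords: the top chord gains probability at T = 0.8.
print(softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0))
print(softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.8))
```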

Training-data licenses

| Dataset | License |
|---|---|
| Chordonomicon | Public (user-generated) |
| McGill Billboard | CC0 |
| Jazz Harmony Treebank | Public |
| JazzStandards (iReal Pro) | Community redistribution |
| Weimar Jazz Database | ODbL |
| JAAH | Research-use public |

Citation

Preprint: arXiv:2605.04998.

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}