# JAZZMUS — OMR for Jazz Lead Sheets

An end-to-end Optical Music Recognition (OMR) model that transcribes handwritten jazz lead sheet images (melody + chords) into Humdrum `**kern` / `**mxhm` notation.
Paper: Optical Music Recognition of Jazz Lead Sheets — ISMIR 2025
## Overview

This model addresses the challenge of recognising handwritten jazz lead sheets, a score type that combines a melody with chord symbols. Chord symbols are a component that existing OMR systems do not handle, and handwritten lead sheet images exhibit high variability and quality issues.
The model is a Sheet Music Transformer (SMT) — a ConvNeXt encoder paired with a Transformer decoder — pretrained on polyphonic piano scores (antoniorv6/smt-grandstaff) and fine-tuned on the JAZZMUS dataset (handwritten + synthetic lead sheets) using medium-level tokenisation.
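At the medium tokenisation level, each output symbol is split into a few sub-tokens rather than emitted whole or character by character. A purely illustrative sketch of such a split — the function name and the exact sub-token boundaries are hypothetical; the real vocabulary is defined by the JAZZMUS training code:

```python
import re

def split_kern_token(tok: str):
    """Hypothetical medium-level split of a **kern note token into a
    duration part and a pitch/accidental part. Non-note tokens
    (clefs, barlines, ...) fall back to being emitted whole."""
    m = re.match(r"(?P<dur>\d+\.*)(?P<pitch>[A-Ga-gr]+[#\-n]*)$", tok)
    if not m:
        return [tok]                      # emit the symbol as-is
    return [m.group("dur"), m.group("pitch")]

print(split_kern_token("4.cc#"))          # dotted quarter, c#5
print(split_kern_token("8r"))             # eighth rest
print(split_kern_token("*clefG2"))        # non-note token, kept whole
```

This sits between a symbol-level vocabulary (one token per `4.cc#`) and a character-level one (`4`, `.`, `c`, `c`, `#`), trading vocabulary size against sequence length.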
A YOLOv11 staff detector is bundled in this repository to enable full-page inference.
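Conceptually, full-page inference detects staff regions, crops them in reading order, and transcribes each crop independently. A minimal sketch of that orchestration, with `detect_staves` and `transcribe_staff` as hypothetical stand-ins for the YOLO detector and the SMT model that `predict.py` actually wraps:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixel coordinates

def transcribe_page(
    page,                                        # full-page image, row-major
    detect_staves: Callable[[object], List[Box]],
    transcribe_staff: Callable[[object], str],
) -> str:
    """Detect staff regions, then transcribe each crop top-to-bottom."""
    boxes = sorted(detect_staves(page), key=lambda b: b[1])  # reading order
    lines = []
    for (x0, y0, x1, y1) in boxes:
        crop = [row[x0:x1] for row in page[y0:y1]]           # one staff region
        lines.append(transcribe_staff(crop))
    return "\n".join(lines)
```

The real script additionally handles model loading and batching; this sketch only shows the detect–crop–transcribe flow.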
## Quick Start

### Command-line inference
Clone the repository and install dependencies:

```shell
git clone https://github.com/JuanCarlosMartinezSevilla/ISMIR-Jazzmus.git
cd ISMIR-Jazzmus
pip install -e ".[predict]"
```
Transcribe a full-page lead sheet image:

```shell
# Print to stdout
python predict.py page.jpg

# Save to a .krn file
python predict.py page.jpg -o output.krn

# Print and save
python predict.py page.jpg -o output.krn -p

# Process multiple pages
python predict.py page1.jpg page2.jpg -o results/
```
The script automatically downloads the SMT model and YOLO staff detector weights from this repository.
### Programmatic usage
```python
from safetensors.torch import load_file
from torch.nn import Conv1d
from huggingface_hub import hf_hub_download

from jazzmus.model.smt.configuration_smt import SMTConfig
from jazzmus.model.smt.modeling_smt import SMTModelForCausalLM

REPO_ID = "JuanCarlosMartinezSevilla/jazzmus-model"

# Load config and weights
config = SMTConfig.from_pretrained(REPO_ID)
weights_path = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
sd = load_file(weights_path)

# Build the model, handling a size mismatch between the pretrained
# embedding and the fine-tuned output layer
embed_size = sd["decoder.embedding.weight"].shape[0]
out_size = sd["decoder.out_layer.weight"].shape[0]
config.out_categories = embed_size
model = SMTModelForCausalLM(config)
if embed_size != out_size:
    model.decoder.out_layer = Conv1d(config.d_model, out_size, kernel_size=1)
model.load_state_dict(sd, strict=True)
config.out_categories = out_size
model.eval()
```
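The head swap above works because a `kernel_size=1` `Conv1d` is simply a per-timestep linear projection from `d_model` to the vocabulary size, so replacing it only changes the number of output categories while every other weight still loads. A small self-contained check of that equivalence, using toy sizes rather than the model's real dimensions:

```python
import torch
from torch.nn import Conv1d, Linear

# Toy sizes, not the model's real d_model / vocabulary
d_model, vocab, seq = 8, 5, 3
head = Conv1d(d_model, vocab, kernel_size=1)

x = torch.randn(2, d_model, seq)
y = head(x)                                  # (2, vocab, seq): logits per step

# The same projection expressed as a Linear layer with shared weights
lin = Linear(d_model, vocab)
lin.weight.data = head.weight.data.squeeze(-1)   # (vocab, d_model, 1) -> (vocab, d_model)
lin.bias.data = head.bias.data
y_lin = lin(x.transpose(1, 2)).transpose(1, 2)

assert torch.allclose(y, y_lin, atol=1e-5)
```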
## Datasets
This model was trained on:
| Dataset | Description |
|---|---|
| PRAIG/JAZZMUS | 293 handwritten jazz lead sheet page images with ground truth |
| PRAIG/JAZZMUS_staffLevel | 2021 region-level staff crops used for training |
| PRAIG/JAZZMUS_Synthetic | 326 synthetic lead sheet images (Classical + MuseJazz fonts) |
## Limitations
- The model operates at staff-region level; full-page inference depends on the bundled YOLO staff detector.
- Chord recognition is harder than melody recognition — most errors involve chord symbols (see qualitative analysis in the paper).
- Implicit accidentals from key signatures are sometimes missed.
- Performance may degrade on score styles not represented in the training set (e.g., printed commercial lead sheets, non-jazz genres).
## Citation
```bibtex
@inproceedings{juan_carlos_martinez_sevilla_2025_17811464,
  author    = {Juan Carlos Martinez-Sevilla and Francesco Foscarin and
               Patricia Garcia-Iasci and David Rizo and
               Jorge Calvo-Zaragoza and Gerhard Widmer},
  title     = {Optical Music Recognition of Jazz Lead Sheets},
  booktitle = {Proceedings of the 26th International Society for
               Music Information Retrieval Conference},
  year      = {2025},
  pages     = {710--716},
  publisher = {ISMIR},
  month     = sep,
  venue     = {Daejeon, South Korea and Online},
  doi       = {10.5281/zenodo.17811464},
  url       = {https://doi.org/10.5281/zenodo.17811464},
}
```
## Links

- Paper: https://doi.org/10.5281/zenodo.17811464
- arXiv: https://arxiv.org/abs/2509.05329
- Code: https://github.com/JuanCarlosMartinezSevilla/ISMIR-Jazzmus
- Project page: https://grfia.dlsi.ua.es/jazz-omr/
- Pretrained base model: antoniorv6/smt-grandstaff