
Music Detection using SSL features

This model is trained to detect and segment music in a given audio file. It uses SSL features from the ina-foss/ssl-audio-1k-only_speech model.

You can find the full list of other music detection models using SSL features here. Their global results on Mirex, OpenBMAT and Seyerlehner are the following (see the paper for more details):

| Link to model | Global F1 Score |
|---|---|
| ssl-music-detection-music2vec | 91.2 |
| ssl-music-detection-base | 89.4 |
| ssl-music-detection-no_music | 87.1 |
| ssl-music-detection-only_speech | 87.5 |
| ssl-music-detection-only_fr | 87.7 |
| ssl-music-detection-gender | 88.3 |

Voice Activity Detection (VAD) models using SSL features can be found here.

Architecture

The model first extracts features from the CNN and the first transformer layer of the ina-foss/ssl-audio-1k-only_speech SSL encoder. These features are then fed to a downstream MLP, trained to make a binary music / no-music prediction for each frame. During inference, the frame-wise predictions are decoded with a Viterbi decoder (from librosa).
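To illustrate the decoding step, here is a minimal NumPy sketch of a two-state (music / no music) Viterbi smoother applied to frame-wise music probabilities. The `p_stay` self-transition probability and the uniform initial distribution are assumptions chosen for illustration, not the model's actual decoding parameters (the model uses librosa's Viterbi implementation).

```python
import numpy as np

def viterbi_binary(p_music, p_stay=0.9):
    """Most likely music / no-music state sequence for per-frame music
    probabilities, via a simple 2-state Viterbi decoder.
    p_stay: assumed probability of staying in the same state."""
    p_music = np.asarray(p_music, dtype=float)
    n = len(p_music)
    # log emission probabilities: row 0 = no music, row 1 = music
    emit = np.log(np.stack([1.0 - p_music, p_music]) + 1e-12)
    # log transition matrix: trans[prev, next]
    trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                             [1.0 - p_stay, p_stay]]))
    score = np.full((2, n), -np.inf)
    back = np.zeros((2, n), dtype=int)
    score[:, 0] = np.log(0.5) + emit[:, 0]  # uniform initial distribution
    for t in range(1, n):
        for s in (0, 1):
            cand = score[:, t - 1] + trans[:, s]
            back[s, t] = np.argmax(cand)
            score[s, t] = cand[back[s, t]] + emit[s, t]
    # backtrack the best path
    states = np.zeros(n, dtype=int)
    states[-1] = np.argmax(score[:, -1])
    for t in range(n - 2, -1, -1):
        states[t] = back[states[t + 1], t + 1]
    return states
```

With a high `p_stay`, a short dip in the music probability (e.g. `[0.9, 0.8, 0.2, 0.9, 0.95]`) is smoothed over rather than emitted as a spurious one-frame segment, which is the point of Viterbi decoding here.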

Data and training

The model has been trained on the training subsets of the following datasets:

For detailed information about training and results associated with this model, please refer to our publication. The training hyperparameters, original checkpoint and Tensorboard event files are available in the training directory.

Usage

To use this model, you need the packages listed in the requirements.txt file. Then:

```python
import librosa
from transformers import AutoModel

# load the audio file (the model expects 16 kHz audio)
audio, sr = librosa.load('/path/to/your/audio/file.wav', sr=16000)

# load the music detection model
model = AutoModel.from_pretrained(
    'ina-foss/ssl-music-detection-only_speech',
    trust_remote_code=True
)

# run the inference
output = model(
    audio=audio,
    sampling_rate=sr
)

print(output)
```

The output is a list of segments, each with start and stop times in seconds and a boolean music label:

```python
[{'start': 0.0, 'stop': 56.58943157192866, 'label': False},
 {'start': 56.58943157192866, 'stop': 60.45007501250208, 'label': True},
 {'start': 60.45007501250208, 'stop': 62.870478413068845, 'label': False},
 [...]
 {'start': 117.03950658443074, 'stop': 119.21986997832973, 'label': True},
 {'start': 119.21986997832973, 'stop': 119.97999666611102, 'label': False}]
```
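Since each segment carries its boundaries in seconds, the output is easy to post-process. As an example, here is a small helper (not part of the model's API) that sums the total duration of music in a file from such a segment list:

```python
def music_duration(segments):
    """Total duration (in seconds) of segments labelled as music."""
    return sum(s['stop'] - s['start'] for s in segments if s['label'])

# shortened segment list in the same format as the model output
segments = [
    {'start': 0.0, 'stop': 56.6, 'label': False},
    {'start': 56.6, 'stop': 60.5, 'label': True},
    {'start': 60.5, 'stop': 62.9, 'label': False},
]
print(music_duration(segments))
```

The same pattern works for filtering out short segments or exporting the boundaries to an annotation format.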

License and citation

The model is distributed under the pantagruel-research-license.

If you use this model or find it useful in your research, publications, or applications, please cite the following work:

@inproceedings{pelloin2026lrec,
  author    = {Pelloin, Valentin and Bekkali, Lina and Dehak, Reda and Doukhan, David},
  year      = {2026},
  title     = {Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts},
  booktitle = {Fifteenth International Conference on Language Resources and Evaluation (LREC 2026)},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association},
}