WavLM Transformer MOS Predictor (English)

This repository provides a pre-trained model for Mean Opinion Score (MOS) prediction. Given a directory of English audio files, the model predicts the perceived quality of each file on a scale of 1 (Poor) to 5 (Excellent).

The architecture uses a WavLM (Base) feature extractor followed by a custom Transformer-based regression head.
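
As a rough illustration of this layout, the sketch below shows one way a Transformer regression head could sit on top of WavLM Base frame features. It is not the code in model.py: the class name MosRegressionHead, the layer count, the head count, and the mean pooling are all assumptions, and random tensors stand in for real WavLM output (WavLM Base produces 768-dimensional frame features).

```python
import torch
from torch import nn


class MosRegressionHead(nn.Module):
    """Hypothetical Transformer head: frame features -> one score per utterance."""

    def __init__(self, feature_dim: int = 768, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        # Layer and head counts are illustrative guesses, not the trained configuration.
        layer = nn.TransformerEncoderLayer(
            d_model=feature_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.regressor = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, frames, feature_dim), e.g. WavLM Base hidden states
        encoded = self.encoder(features)
        pooled = encoded.mean(dim=1)  # average over time frames (an assumption)
        return self.regressor(pooled).squeeze(-1)  # shape: (batch,)


head = MosRegressionHead()
head.eval()  # disable dropout for a deterministic forward pass
fake_features = torch.randn(2, 50, 768)  # stand-in for WavLM output
with torch.no_grad():
    scores = head(fake_features)
print(scores.shape)
```

Note that this sketch does not constrain its output to the 1 to 5 range, and an untrained head produces meaningless values; it only shows how the tensor shapes flow from frame features to a single score per file.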

💻 System Requirements

  • Python: 3.12 (Recommended)
  • RAM: 8 GB minimum
  • GPU: Optional but recommended (NVIDIA RTX series with 8 GB+ VRAM preferred)
  • Disk Space: ~1.5 GB for model weights

🛠️ Installation

First, clone the repository and install the necessary dependencies to set up your local environment:

# Clone the repository
git clone https://huggingface.co/mustafa-ozan-duman/wavlm-transformer-mos-english
cd wavlm-transformer-mos-english

# Install requirements
pip install -r requirements.txt

🚀 How to Generate MOS Labels
The dedicated script predict_folder.py handles batch prediction. It scans a directory for .wav files, automatically resamples them to the required 16 kHz, and saves the predictions to a CSV file.
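
The flow described above can be sketched with the standard library alone. This is an illustration, not the contents of predict_folder.py: the function names and the dummy_predict placeholder are hypothetical, and the resampling and model inference steps are deliberately left out. The column names follow the Output description under Technical Details.

```python
import csv
from pathlib import Path
from typing import Callable


def predict_folder(audio_dir: str, out_csv: str, predict: Callable[[Path], float]) -> int:
    """Score every .wav file in audio_dir and write one CSV row per file."""
    wav_paths = sorted(Path(audio_dir).glob("*.wav"))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wav_filename", "predicted_mos"])
        for path in wav_paths:
            writer.writerow([path.name, f"{predict(path):.3f}"])
    return len(wav_paths)  # number of files scored


def dummy_predict(path: Path) -> float:
    # Placeholder for the real model call; always returns the scale midpoint.
    return 3.0
```

Calling predict_folder("path/to/your/audio_folder", "my_predictions.csv", dummy_predict) would write a header row followed by one row per .wav file, each with the placeholder score.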


Basic Usage:
Run the following command in your terminal:

python predict_folder.py --dir path/to/your/audio_folder --out my_predictions.csv

Script Arguments:
--dir: (Required) Path to the directory containing your audio files.

--out: (Optional) Name of the output CSV file (Default: mos_predictions.csv).
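
For reference, an argument parser matching the two options above could look like the following sketch. It assumes the script uses argparse, which the repository may or may not do; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Predict MOS for a folder of audio files.")
    # --dir is mandatory; argparse exits with an error message if it is missing.
    parser.add_argument("--dir", required=True, help="Directory containing the audio files.")
    # --out falls back to the documented default when omitted.
    parser.add_argument("--out", default="mos_predictions.csv", help="Name of the output CSV file.")
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(["--dir", "path/to/your/audio_folder"])
print(args.dir, args.out)  # prints: path/to/your/audio_folder mos_predictions.csv
```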




📂 Repository Structure
best_model.pth: The trained model weights (1.33 GB).

model.py: The WavLM + Transformer architecture definition.

dataset.py: Audio preprocessing and loading logic.

predict_folder.py: The main script for generating MOS labels.

requirements.txt: Exact library versions used during development.

Technical Details
Sampling Rate: The model operates at 16,000 Hz. Audio files with other sampling rates are automatically resampled by the prediction script.

Input Format: Mono PCM WAV.

Output: A CSV file with two columns: wav_filename and predicted_mos.
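
To see in advance whether a file already matches the sampling rate and input format listed above, its header can be inspected with Python's built-in wave module. This is an optional convenience sketch and not part of the repository; describe_wav and needs_conversion are hypothetical helper names, and the wave module only reads uncompressed PCM files.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the header of a PCM WAV file without loading its samples."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def needs_conversion(info: dict, target_rate: int = 16000) -> bool:
    # True when the file is not already mono at the target sampling rate.
    return info["sample_rate"] != target_rate or info["channels"] != 1
```

For example, needs_conversion(describe_wav("sample.wav")) returns False for a mono 16 kHz file and True for a 44.1 kHz stereo recording.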

Author: Mustafa Ozan Duman

Affiliation: Research Assistant, Bursa Uludag University

Contact: mustafaduman@uludag.edu.tr