WavLM Transformer MOS Predictor (English)
This repository provides a pre-trained model for Mean Opinion Score (MOS) prediction. Given a directory of English audio files, the model predicts the perceived quality of each file on a scale of 1 (Poor) to 5 (Excellent).
The architecture uses a WavLM (Base) feature extractor followed by a custom Transformer-based regression head.
System Requirements
- Python: 3.12 (Recommended)
- RAM: 8 GB minimum
- GPU: Optional but recommended (NVIDIA RTX series with 8 GB+ VRAM preferred)
- Disk Space: ~1.5 GB for model weights
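As a quick sanity check against the list above, the following is a minimal sketch that warns when the interpreter is not Python 3.12 or when less than about 1.5 GB of disk space is free. The function name `check_environment` and its return format are illustrative assumptions and are not part of this repository; RAM and GPU checks are left out because the standard library has no portable way to query them.

```python
import shutil
import sys

# Thresholds taken from the requirements list; everything else here is
# a hypothetical helper, not code shipped with the repository.
RECOMMENDED_PYTHON = (3, 12)
REQUIRED_FREE_GB = 1.5


def check_environment(path=".", python_version=None, free_bytes=None):
    """Return a list of warning strings (empty when both checks pass)."""
    if python_version is None:
        python_version = sys.version_info[:2]
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    warnings = []
    if tuple(python_version) != RECOMMENDED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} detected; "
            "3.12 is recommended."
        )
    free_gb = free_bytes / 1e9
    if free_gb < REQUIRED_FREE_GB:
        warnings.append(
            f"Only {free_gb:.2f} GB free; about {REQUIRED_FREE_GB} GB is needed."
        )
    return warnings


for message in check_environment():
    print("WARNING:", message)
```

Other Python versions may still work; the check only mirrors the recommendation stated here.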
Installation
First, clone the repository and install the necessary dependencies to set up your local environment:
# Clone the repository
git clone https://huggingface.co/mustafa-ozan-duman/wavlm-transformer-mos-english
cd wavlm-transformer-mos-english
# Install requirements
pip install -r requirements.txt
How to Generate MOS Labels
A dedicated script, predict_folder.py, handles batch prediction. It scans a directory for .wav files, converts them to the required 16 kHz sampling rate automatically, and saves the predictions to a CSV file.
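To see in advance which files the script would have to convert, a folder can be inspected with Python's standard `wave` module. The sketch below is not taken from predict_folder.py; the helper name `inspect_wav_folder` is hypothetical, and it only flags files that are not already 16 kHz mono. Whether the script itself downmixes multi-channel audio is not stated here.

```python
import wave
from pathlib import Path

TARGET_RATE = 16_000  # sampling rate the model expects, per this README


def inspect_wav_folder(folder):
    """Return (filename, sample_rate, channels, needs_conversion) per .wav file."""
    report = []
    for path in sorted(Path(folder).glob("*.wav")):
        # The stdlib wave module reads uncompressed PCM WAV files only.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
        needs_conversion = rate != TARGET_RATE or channels != 1
        report.append((path.name, rate, channels, needs_conversion))
    return report
```

Calling `inspect_wav_folder("path/to/your/audio_folder")` returns one tuple per file, sorted by filename, so files that deviate from the expected format are easy to spot before a long batch run.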
Basic Usage:
Run the following command in your terminal:
python predict_folder.py --dir path/to/your/audio_folder --out my_predictions.csv
Script Arguments:
--dir: (Required) Path to the directory containing your audio files.
--out: (Optional) Name of the output CSV file (Default: mos_predictions.csv).
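For readers who want to wrap or imitate this command-line interface, the two documented flags could be declared with `argparse` roughly as follows. This is a sketch under the assumption that the script uses `argparse`; the actual predict_folder.py may define its arguments differently or accept more options, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    # Mirrors the two flags documented in this README; not copied from
    # the real script.
    parser = argparse.ArgumentParser(
        description="Predict MOS for every .wav file in a directory."
    )
    parser.add_argument(
        "--dir", required=True,
        help="Path to the directory containing the audio files.",
    )
    parser.add_argument(
        "--out", default="mos_predictions.csv",
        help="Name of the output CSV file.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--dir", "samples"])
print(args.dir, args.out)  # prints: samples mos_predictions.csv
```

Omitting `--out` falls back to the documented default, while omitting `--dir` makes `argparse` exit with a usage error.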
Repository Structure
best_model.pth: The trained model weights (1.33 GB).
model.py: The WavLM + Transformer architecture definition.
dataset.py: Audio preprocessing and loading logic.
predict_folder.py: The main script for generating MOS labels.
requirements.txt: Exact library versions used during development.
Technical Details
Sampling Rate: The model operates at 16,000 Hz. Audio files with other sampling rates are automatically resampled by the prediction script.
Input Format: Mono PCM WAV.
Output: A CSV file with two columns: wav_filename and predicted_mos.
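Because the output is a plain two-column CSV, it can be post-processed with the standard library alone. The sketch below assumes the file has a header row with exactly the column names `wav_filename` and `predicted_mos`; the helper `summarize_predictions` and the sample rows are made up for illustration and do not represent real model output.

```python
import csv
import io
import statistics


def summarize_predictions(csv_file):
    """Return (count, mean MOS, lowest-scoring filename) from an open CSV file."""
    rows = [
        (row["wav_filename"], float(row["predicted_mos"]))
        for row in csv.DictReader(csv_file)
    ]
    if not rows:
        raise ValueError("no predictions found")
    scores = [score for _, score in rows]
    worst_file = min(rows, key=lambda item: item[1])[0]
    return len(rows), statistics.mean(scores), worst_file


# Made-up rows for illustration only; real scores come from the model.
example = io.StringIO(
    "wav_filename,predicted_mos\n"
    "clip_a.wav,4.0\n"
    "clip_b.wav,2.5\n"
)
count, mean_mos, worst = summarize_predictions(example)
print(count, mean_mos, worst)  # prints: 2 3.25 clip_b.wav
```

For a real run, pass a file handle instead, for example `open("mos_predictions.csv", newline="")`.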
Author: Mustafa Ozan Duman
Affiliation: Research Assistant, Bursa Uludag University
Contact: mustafaduman@uludag.edu.tr