license: apache-2.0
language:
  - zh
  - en
tags:
  - tts
  - speech-evaluation
  - continuation-score
  - role-play
  - reward-model
pipeline_tag: text-to-speech

MCLP-Score: Continuation Score Model for MCLP Metric

Yong Ren¹,²*, Jingbei Li¹*, Haiyang Sun¹, Yujie Chen³, Cheng Yi¹, Yechang Huang¹, Hao Gu², Ye Bai², Xuerui Yang¹

¹StepFun ²University of Chinese Academy of Sciences ³Beihang University

*Equal contribution

📑 Paper | 💻 Code | 📊 Dataset | 🗣️ MCLP-RPTTS Model

Model Description

MCLP-Score is the Continuation Score model used to compute the MCLP (Mean Continuation Log-Probability) metric. Given a ground-truth audio prefix, the model evaluates how well a generated audio segment continues the stylistic pattern of the ground truth, and produces a log-probability score that measures expressive consistency.

The MCLP metric serves as both:

An evaluation metric for role-play TTS quality (correlation with human MOS: Spearman ρ = 0.94)
A reward signal for GRPO-based reinforcement learning to improve TTS expressiveness
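As a rough illustration of the second use, the sketch below turns per-sample MCLP scores into group-relative advantages. It assumes the commonly used GRPO normalization (subtract the group mean, divide by the group standard deviation); this card does not specify the reward shaping used in training, and the function name, the `eps` constant, and the reward values are hypothetical.

```python
import statistics
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Assumption: standard GRPO-style normalization, (r - mean) / (std + eps).
    The population standard deviation is used here; implementations differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical MCLP scores for four generations of the same utterance.
mclp_rewards = [-4.9, -4.6, -4.3, -5.2]
advantages = group_relative_advantages(mclp_rewards)
# The generation with the highest MCLP receives the largest advantage.
```

Because MCLP is a log-probability, higher (less negative) values are better, so it can be used as a reward without a sign flip.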

This model is presented in:

Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability
Yong Ren*, Jingbei Li*, Haiyang Sun, Yujie Chen, Cheng Yi, Yechang Huang, Hao Gu, Ye Bai, Xuerui Yang
ICML 2026

How MCLP Works

The MCLP metric computes the mean log-probability of audio tokens in the generated segment, conditioned on a ground-truth audio prefix:

MCLP = (1/N) * Σ_{i=1..N} log P(token_i | gt_prefix, token_1, ..., token_{i-1})

Here N is the number of audio tokens in the generated segment.

Higher MCLP scores indicate better stylistic consistency with the ground-truth speaking style.
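The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: it assumes the per-position logits have already been produced by the model (conditioned on the ground-truth prefix and the preceding tokens), and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(row: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def mean_continuation_log_prob(
    logits: Sequence[Sequence[float]], tokens: Sequence[int]
) -> float:
    """Average log P(token_i | gt_prefix, token_1..token_{i-1}) over N tokens.

    logits[i] is assumed to hold the model's scores for position i of the
    generated segment; tokens[i] is the audio token id at that position.
    """
    if len(logits) != len(tokens) or len(tokens) == 0:
        raise ValueError("logits and tokens must be non-empty and equally long")
    total = 0.0
    for row, tok in zip(logits, tokens):
        total += log_softmax(row)[tok]
    return total / len(tokens)


# Toy input: 3 positions, vocabulary of 4, uniform logits.
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
toy_tokens = [0, 2, 1]
score = mean_continuation_log_prob(toy_logits, toy_tokens)  # equals log(1/4)
```

With uniform logits every token has probability 1/4, so the mean log-probability is log(1/4) regardless of which tokens were generated; a model that assigns more mass to the observed tokens yields a higher score.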

Usage

# Clone the inference code
git clone https://github.com/y-ren16/MCLP.git
cd MCLP

# Compute MCLP scores
python compute_contination_score.py \
    --model-path /path/to/MCLP-Score \
    --audio-dir ./outputs/roleplay_tts \
    --gt-jsonl /path/to/WenetSpeech-RP/eval/eval_w_history.jsonl \
    --gt-dir /path/to/WenetSpeech-RP/eval/audio \
    --save-json mclp_results.json

Output:

MCLP (Mean avg_log_prob): -4.636xxx
Mean avg_prob: 0.xxxxx
Mean avg_rank: xx.xx
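If the summary needs to be consumed by another script, the printed lines can be parsed as `name: value` pairs. The sketch below assumes the output keeps exactly the three-line format shown above; the helper name is hypothetical, and the numbers in the example string are made-up placeholders rather than real results.

```python
import re

# Matches lines such as "Mean avg_prob: 0.0123" (name, colon, signed decimal).
SUMMARY_PATTERN = re.compile(r"^(?P<name>[^:]+):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_summary(text: str) -> dict[str, float]:
    """Collect every 'name: number' line of the printed summary into a dict."""
    results: dict[str, float] = {}
    for line in text.splitlines():
        match = SUMMARY_PATTERN.match(line.strip())
        if match:
            results[match.group("name").strip()] = float(match.group("value"))
    return results


# Placeholder values for illustration only.
example_output = """MCLP (Mean avg_log_prob): -4.636
Mean avg_prob: 0.0123
Mean avg_rank: 35.5
"""
summary = parse_summary(example_output)
mclp = summary["MCLP (Mean avg_log_prob)"]
```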

For detailed usage instructions, please refer to the code repository.

Requirements

Python >= 3.10
PyTorch >= 2.3 with CUDA
GPU: at least one A100/H100 (80 GB) for inference

pip install transformers==4.49.0 torchaudio librosa onnxruntime s3tokenizer diffusers hyperpyyaml numpy

Related Resources

Resource	Link
📑 Paper	arXiv:2601.22661
💻 Inference Code	github.com/y-ren16/MCLP
📊 WenetSpeech-RP Dataset	huggingface.co/datasets/y-ren16/WenetSpeech-RP
🗣️ MCLP-RPTTS Model	huggingface.co/y-ren16/MCLP-RPTTS

Citation

@inproceedings{ren2026mclp,
  title={Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability},
  author={Ren, Yong and Li, Jingbei and Sun, Haiyang and Chen, Yujie and Yi, Cheng and Huang, Yechang and Gu, Hao and Bai, Ye and Yang, Xuerui},
  booktitle={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year={2026}
}

License

This model is released under the Apache 2.0 License.

Acknowledgements

This project builds upon: