---
license: cc-by-nc-4.0
language:
- zh
- en
library_name: transformers
tags:
- speech
- audio
- speech-evaluation
- expressive-speech
- mandarin
- chain-of-thought
- ceaeval
pipeline_tag: audio-text-to-text
---
# CEAEval-Model (CEAEval-M)

**CEAEval-M** is the speech-LLM *judge* released together with our ACL paper
*"Evaluating the Expressive Appropriateness of Speech in Rich Contexts"*.
Given a Mandarin speech segment together with an *ideal expressive plan*
inferred from its surrounding narrative context, CEAEval-M produces

```
<think> step-by-step comparison of ideal vs. actual expression,
        with <focus_audio>…</focus_audio> spans pointing to
        audio-grounded cues (emotion / rhythm / intonation /
        recording condition / paralinguistic events) </think>
<score>X.X</score>   # overall expressive appropriateness ∈ [0.0, 5.0]
```
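The tagged output above is easy to post-process. A minimal parsing sketch (the helper name, regexes, and sample string are illustrative, not part of the release):

```python
import re

def parse_judgment(text: str):
    """Extract the CoT, the audio-grounded focus spans, and the overall
    score from a CEAEval-M style output string."""
    think = re.search(r"<think>(.*?)</think>", text, re.S)
    focuses = re.findall(r"<focus_audio>(.*?)</focus_audio>", text, re.S)
    score = re.search(r"<score>\s*([0-9]+(?:\.[0-9]+)?)\s*</score>", text)
    return (
        think.group(1).strip() if think else None,
        [f.strip() for f in focuses],
        float(score.group(1)) if score else None,
    )

sample = (
    "<think>The plan calls for a calm, low-pitch delivery; "
    "<focus_audio>rising intonation on the final syllable</focus_audio> "
    "conflicts with it.</think>\n<score>3.5</score>"
)
think, focuses, score = parse_judgment(sample)
```

Scores outside `[0.0, 5.0]` would indicate a malformed generation and can be rejected by the caller.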
This is the *judge* half of the planner–judge decoupled pipeline defined
in the paper. It is designed to work with a frozen text-only planner
([Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)) that first summarizes
long narrative context into a four-tuple
`{emotion, rhythm, intonation, recording_condition}` via multi-context
voting.
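Multi-context voting can be pictured as a per-field majority vote over plans predicted from several context windows. A toy sketch (the field values and the tie-breaking rule are illustrative; see the paper and code for the actual procedure):

```python
from collections import Counter

FIELDS = ("emotion", "rhythm", "intonation", "recording_condition")

def vote_plan(per_context_plans):
    """Majority-vote each field across plans predicted from different
    context windows; ties resolve to the earliest-seen value."""
    return {
        f: Counter(p[f] for p in per_context_plans).most_common(1)[0][0]
        for f in FIELDS
    }

plans = [
    {"emotion": "sad", "rhythm": "slow", "intonation": "falling", "recording_condition": "clean"},
    {"emotion": "sad", "rhythm": "slow", "intonation": "flat", "recording_condition": "clean"},
    {"emotion": "calm", "rhythm": "slow", "intonation": "falling", "recording_condition": "clean"},
]
ideal_plan = vote_plan(plans)
```

The voted four-tuple is what the judge compares against the actual rendition.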
## What's released here

- Model weights in `safetensors` (4 shards) plus `config.json`,
  `generation_config.json`, tokenizer, preprocessor, and chat template.
- **Six extra special tokens** the judge uses during training and
  inference (`<think>`, `</think>`, `<score>`, `</score>`,
  `<focus_audio>`, `</focus_audio>`) — already merged into the tokenizer
  and embedding matrix.
- A patched `modeling_*` path that implements the **adaptive audio
  attention bias** mechanism described in Sec. 3.3.4 and Appendix F of
  the paper (region-wise bias over system-prompt / audio / CoT regions).
- `test_datas/` with **anonymised** sanity samples (audio + JSON) so
  you can verify the pipeline end-to-end without touching the main
  dataset.
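A region-wise bias can be pictured as an additive term on the attention logits that depends on which region each key position falls in. A toy, pure-Python sketch (the region boundaries and bias values below are made up; the released model learns its biases adaptively, as described in the paper):

```python
def region_bias_row(regions, biases):
    """Build one row of additive attention bias: each key position gets
    the scalar bias of its region (system-prompt / audio / CoT).

    regions: list of (start, end, name) half-open spans covering the sequence
    biases:  dict mapping region name -> scalar added to attention logits
    """
    length = max(end for _, end, _ in regions)
    row = [0.0] * length
    for start, end, name in regions:
        for k in range(start, end):
            row[k] = biases[name]
    return row

# Illustrative layout: 4 system-prompt tokens, 6 audio tokens, 5 CoT tokens.
regions = [(0, 4, "system"), (4, 10, "audio"), (10, 15, "cot")]
biases = {"system": 0.0, "audio": 1.5, "cot": -0.5}  # e.g. up-weight audio keys
row = region_bias_row(regions, biases)
# `row` would be added to the attention logits before the softmax.
```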
The full inference pipeline (planner + judge, audio pre-processing,
batch driver, sanity examples) lives in the code repository — see
[Related resources](#related-resources).
## Intended use and limitations

- Intended as a **research benchmark and diagnostic tool** for
  expressive-speech generation / selection, not as a standalone
  decision-making system. Expressive appropriateness is inherently
  subjective; predictions should be interpreted with appropriate human
  oversight.
- Trained and evaluated on **Mandarin audiobook speech**. Applying the
  model to other languages, styles, or domains (short commands,
  non-narrative dialogue, etc.) may produce unreliable scores.
## Related resources

This model is one of three companion releases for the paper. **Please
use them together:**

| Resource | Link |
| --- | --- |
| 📄 Paper | *Evaluating the Expressive Appropriateness of Speech in Rich Contexts* (ACL) |
| 💻 Code | <https://github.com/wangtianrui/CEAEval> |
| 🤖 Model (this repo) | <https://huggingface.co/TianRW/CEAEval-Model> |
| 📚 Dataset (CEAEval-D) | <https://huggingface.co/datasets/TianRW/CEAEval-Data> |
| 🌐 Project page / demo | <https://wangtianrui.github.io/ceaeval/> |
## License

Released under **CC BY-NC 4.0** — non-commercial academic research use
only. The released weights do not contain or expose raw audio,
transcripts, or any personally identifiable information.