Xiaofei Wang

xiaofei-wang

https://www.microsoft.com/en-us/research/people/xiaofewa/

AI & ML interests

far-field/robust conversational speech recognition, speech separation and enhancement, audio generation

Organizations

None yet

authored 8 papers about 6 hours ago

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

Paper • 2406.15752 • Published Jun 22, 2024

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Paper • 2506.04392 • Published Jun 4, 2025

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching

Paper • 2506.00885 • Published Jun 1, 2025 • 1

EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing

Paper • 2509.13399 • Published Sep 16, 2025 • 5

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 35

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

Paper • 2505.19931 • Published May 26, 2025

FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates

Paper • 2510.00981 • Published Oct 1, 2025 • 1

ProImage-Bench: Rubric-Based Evaluation for Professional Image Generation

Paper • 2512.12220 • Published Dec 13, 2025

authored a paper 9 months ago

STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models

Paper • 2507.15375 • Published Jul 21, 2025 • 30

authored a paper almost 2 years ago

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Paper • 2406.18009 • Published Jun 26, 2024 • 22

authored 3 papers about 2 years ago

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Paper • 2401.08887 • Published Jan 16, 2024

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

Paper • 2401.07333 • Published Jan 14, 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12, 2024 • 15

authored a paper over 2 years ago

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Paper • 2308.06873 • Published Aug 14, 2023 • 27
