TTS tokenizer for Qwen3-Omni

#3
by anferico - opened

Hi, is this TTS tokenizer the same as the one used for training Qwen3-Omni? Can I use this TTS tokenizer to generate ground-truth audio codes for fine-tuning the code predictor module of Qwen3-Omni?

I believe the difference lies in the implementation of the Decoder class. The two implementations look very similar, but the part that converts codes to hidden states differs, so it seems the model weights are not compatible.

https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L2312
https://github.com/QwenLM/Qwen3-TTS/blob/main/qwen_tts/core/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py#L824
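
One rough way to check the compatibility point above is to compare the parameter names and shapes of the two decoders. The sketch below is only an illustration: `diff_state_dicts` is a helper I made up, and the parameter names and shapes in the toy dictionaries are hypothetical, not taken from either checkpoint.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(a: Mapping[str, Shape], b: Mapping[str, Shape]) -> Dict[str, List[str]]:
    """Compare two {parameter name: shape} mappings and report the differences."""
    shared = set(a) & set(b)
    return {
        # Parameters that exist in only one of the two models.
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        # Parameters with the same name but a different shape.
        "shape_mismatch": sorted(k for k in shared if tuple(a[k]) != tuple(b[k])),
    }


# Toy inputs with hypothetical names and shapes, NOT the real checkpoints.
omni = {"decoder.code_embedding.weight": (2048, 1024), "decoder.out.weight": (1, 1024)}
tts = {"decoder.quantizer.codebook": (2048, 512), "decoder.out.weight": (1, 1024)}

report = diff_state_dicts(omni, tts)
print(report)
```

With the real models loaded in PyTorch, the inputs could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. Matching names and shapes would be necessary for loading one set of weights into the other, but it would not by itself show that the two tokenizers produce the same codes.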
