glm-4-voice-decoder-emo-ft
Built with glm-4.
Fine-tuned GLM-4-Voice decoder weights for emotion-preserving Chinese ↔ English speech-to-speech translation, used together with the Kimi-Audio Emotion-Aware S2ST training / inference pipeline.
Files
| File | Size | Role |
|---|---|---|
| epoch500_emoft.pt | ~425 MB | Fine-tuned flow checkpoint (emotion-preserving) |
| hift.pt | ~79 MB | HiFT vocoder checkpoint |
Usage
git clone https://github.com/<YOUR_GH_USER>/kimi-audio-release
cd kimi-audio-release
./scripts/download_weights.sh
# the two files will be placed under glm_4_voice_decoder/
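
After the download script finishes, it can help to confirm that both checkpoints actually landed where the pipeline expects them. The following is a minimal sketch, not part of the release: the helper name `check_weights` is hypothetical, and it assumes the files sit directly under `glm_4_voice_decoder/` relative to the current directory, as the comment above states. It only checks presence and size on disk; it does not load or validate the checkpoints.

```python
from pathlib import Path

# File names taken from the Files table; the directory comes from the
# usage comment. Everything else here is illustrative.
EXPECTED_FILES = ("epoch500_emoft.pt", "hift.pt")


def check_weights(weights_dir="glm_4_voice_decoder"):
    """Return {file name: size in MB, or None if the file is missing}."""
    root = Path(weights_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = root / name
        # st_size is in bytes; divide by 1e6 to get decimal megabytes.
        report[name] = path.stat().st_size / 1e6 if path.is_file() else None
    return report


if __name__ == "__main__":
    for name, size_mb in check_weights().items():
        status = f"{size_mb:.0f} MB" if size_mb is not None else "MISSING"
        print(f"{name}: {status}")
```

Run from the repository root, the reported sizes should be roughly in line with the table above (about 425 MB and 79 MB); a much smaller file usually points to an interrupted download.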