Qwen3-TTS LoRA Adapter - Yuna

A LoRA adapter fine-tuned from Qwen/Qwen3-TTS-12Hz-1.7B-Base.

Training Details

  • Base Model: Qwen3-TTS-12Hz-1.7B-Base
  • Method: LoRA (r=16, alpha=32)
  • Data: 1 hour of Korean audiobook audio (Albert Camus, The Stranger)
  • Hardware: Apple M4 (MPS acceleration)
  • Trainable params: 19.2M / 1.94B (0.99%)
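The LoRA hyperparameters above can be written as a peft LoraConfig. This is a hedged reconstruction, not the authoritative config: the real values (including the actual target modules) live in adapter_config.json, and the target_modules shown here are an assumption.

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the training details above.
# r and lora_alpha come from this model card; target_modules is an
# ASSUMPTION -- check adapter_config.json for the real list.
lora_config = LoraConfig(
    r=16,                # LoRA rank, as listed above
    lora_alpha=32,       # LoRA alpha, as listed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.0,
)
```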

Usage

import torch
from peft import PeftModel
from qwen_tts import Qwen3TTSModel
from safetensors.torch import load_file

# Load base model
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    dtype=torch.bfloat16,  # or float32 for MPS
    attn_implementation="flash_attention_2",  # or "eager" for MPS
    device_map="cuda",  # or "mps"
)

# Load LoRA adapter
model.model.talker = PeftModel.from_pretrained(
    model.model.talker,
    "tonymustbegreat/qwen3-tts-yuna-lora",
)

# Load speaker embedding (load_file needs a local path, so fetch it from the Hub first)
from huggingface_hub import hf_hub_download

spk_path = hf_hub_download(
    "tonymustbegreat/qwen3-tts-yuna-lora",
    "speaker_embedding.safetensors",
)
spk = load_file(spk_path)
# Inject into codec embedding at position 3000
model.model.talker.model.model.codec_embedding.weight.data[3000] = spk["speaker_embedding"]

# Generate speech
wavs, sr = model.generate_custom_voice(
    text="μ•ˆλ…•ν•˜μ„Έμš”, 이것은 νŒŒμΈνŠœλ‹λœ μŒμ„±μž…λ‹ˆλ‹€.",
    speaker="yuna",
)
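To save the result without extra audio dependencies, the waveform can be written as 16-bit PCM with the stdlib wave module. This is a minimal sketch that assumes wavs[0] is a sequence of floats in [-1, 1]; the sine tone below is only a stand-in for the model output.

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16-bit
        f.setframerate(sample_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)

# Stand-in signal; in practice pass wavs[0] and sr from generate_custom_voice.
sr = 24000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
write_wav("output.wav", tone, sr)
```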

Files

  • adapter_config.json - LoRA configuration
  • adapter_model.safetensors - LoRA weights (77MB)
  • speaker_embedding.safetensors - Speaker embedding vector
  • tts_config.json - Custom voice TTS config