---
language:
  - sr
tags:
  - text-to-speech
  - tts
  - f5-tts
  - serbian
license: mit
base_model:
  - SWivid/F5-TTS
pipeline_tag: text-to-speech
---

# F5-TTS Serbian

A Serbian TTS model based on F5-TTS, trained from scratch on a Serbian speech dataset. This model is not production ready: it still hallucinates and should be treated as an experimental test run.

## Model Details

| Property        | Value                          |
|-----------------|--------------------------------|
| Architecture    | F5TTS_v1_Base                  |
| Tokenizer       | char                           |
| Training        | from scratch (not finetuned)   |
| Mixed precision | bf16                           |
| Dataset         | 60,948 samples / 132.05 hours  |
| Steps           | 430,000                        |
| Epochs          | 434                            |
| GPU             | NVIDIA A40 (46 GB)             |
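For context, a couple of quantities can be derived from the numbers above by simple arithmetic (these are not separately reported figures):

```python
# Quantities derived from the Model Details table above.
samples = 60_948   # dataset size
hours = 132.05     # total audio duration
updates = 430_000  # training steps
epochs = 434

avg_clip_seconds = hours * 3600 / samples  # average sample length, ~7.8 s
updates_per_epoch = updates / epochs       # ~991 optimizer updates per epoch
print(round(avg_clip_seconds, 1), round(updates_per_epoch))  # → 7.8 991
```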

## Training Config

```yaml
exp_name: F5TTS_v1_Base
tokenizer: char
mixed_precision: bf16
learning_rate: 7.5e-05
batch_size_per_gpu: 20189
batch_size_type: frame
max_samples: 64
grad_accumulation_steps: 1
max_grad_norm: 1
epochs: 434
num_warmup_updates: 3779
save_per_updates: 5000
keep_last_n_checkpoints: 1
last_per_updates: 10000
logger: tensorboard
```
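The `batch_size_type: frame` / `batch_size_per_gpu: 20189` pair means batches are sized by total mel-frame count rather than by a fixed number of samples, capped at `max_samples: 64` clips per batch. A minimal sketch of that packing logic (a hypothetical helper, not the upstream implementation):

```python
def pack_batches(frame_lengths, frame_budget=20_189, max_samples=64):
    """Greedily pack clips into batches bounded by a total frame budget."""
    batches, cur, cur_frames = [], [], 0
    for n in frame_lengths:
        # Start a new batch if adding this clip would exceed either limit.
        if cur and (cur_frames + n > frame_budget or len(cur) >= max_samples):
            batches.append(cur)
            cur, cur_frames = [], 0
        cur.append(n)
        cur_frames += n
    if cur:
        batches.append(cur)
    return batches

batches = pack_batches([800, 700, 600, 20_000])
print(batches)  # → [[800, 700, 600], [20000]]
```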

## Training Curves

Loss and learning-rate curves from training are included as images in the repository.

## Checkpoint

The checkpoint contains only the EMA model weights (ema_model_state_dict), stripped of optimizer and scheduler states for minimal file size.
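A stripped checkpoint like this can be produced by keeping only the EMA entry from a full training checkpoint. A minimal sketch using plain dicts (the real files are `torch.save`/`torch.load` payloads; all key names other than `ema_model_state_dict` are illustrative):

```python
# Illustrative full training checkpoint. Only "ema_model_state_dict" is
# confirmed by this model card; the other keys are assumed for the sketch.
full_ckpt = {
    "ema_model_state_dict": {"transformer.norm.weight": [1.0]},
    "model_state_dict": {"transformer.norm.weight": [0.9]},
    "optimizer_state_dict": {"state": {}},
    "scheduler_state_dict": {"last_epoch": 434},
}

# Keep only the EMA weights for a minimal-size release checkpoint.
slim_ckpt = {"ema_model_state_dict": full_ckpt["ema_model_state_dict"]}
print(sorted(slim_ckpt))  # → ['ema_model_state_dict']
```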

## Usage

Load with F5-TTS:

```python
import torch
from f5_tts.model import DiT
from f5_tts.infer.utils_infer import load_checkpoint

# The checkpoint stores only the EMA weights (no optimizer/scheduler state).
ckpt = torch.load("model_430000.pt", map_location="cpu")
model_state = ckpt["ema_model_state_dict"]
```
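Note that EMA state dicts saved by the upstream F5-TTS trainer typically prefix each weight key with `ema_model.` and include EMA bookkeeping entries (`initted`, `step`), which inference code strips before loading into the model. A hedged sketch of that cleanup, using illustrative key names:

```python
# Illustrative EMA state dict; weight key names are made up for the sketch.
ema_state = {
    "initted": True,
    "step": 430_000,
    "ema_model.transformer.blocks.0.attn.weight": [0.0],
    "ema_model.transformer.blocks.0.mlp.weight": [0.0],
}

# Drop EMA bookkeeping and strip the "ema_model." prefix.
model_state = {
    k.removeprefix("ema_model."): v
    for k, v in ema_state.items()
    if k not in ("initted", "step")
}
print(sorted(model_state))
```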