HiFi-GAN Vocoder: Russian (RUSLAN corpus)
HiFi-GAN V1 vocoder trained on the RUSLAN Russian speech corpus for high-quality mel-to-audio conversion.
Training Details
- Architecture: HiFi-GAN V1 (14M parameters, 512 initial channels)
- Training code: jik876/hifi-gan
- Dataset: RUSLAN corpus (single male speaker, studio quality, 13,865 training files, ~16 hours)
- Steps: 160,000 (~185 epochs)
- Hardware: NVIDIA RTX 4090 (24 GB)
- Batch size: 16
- Final mel-spectrogram error (L1, as logged by the training script): ~0.29
- Stopping reason: No further perceptible improvement in audio quality beyond 160k steps
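As a quick consistency check, the epoch count above follows from the file count and batch size. This sketch assumes one optimizer step per batch and that the final partial batch of each epoch is dropped. Keeping that partial batch gives 867 steps per epoch, which still rounds to ~185 epochs.

```python
# Sanity check: relate total training steps to epochs for the run described above.
# Assumes one optimizer step per batch and that the final partial batch is dropped.
num_files = 13_865
batch_size = 16
total_steps = 160_000

steps_per_epoch = num_files // batch_size   # number of full batches per epoch
epochs = total_steps / steps_per_epoch      # fractional number of passes over the data

print(f"{steps_per_epoch} steps/epoch, ~{epochs:.0f} epochs")
```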
Mel Spectrogram Parameters
These must match exactly when computing input mel spectrograms:
| Parameter | Value | Unit |
|---|---|---|
| sample_rate | 22050 | Hz |
| n_fft | 1024 | samples |
| hop_size | 256 | samples |
| win_size | 1024 | samples |
| num_mels | 80 | mel bins |
| fmin | 0 | Hz |
| fmax | 8000 | Hz |
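The hop size and the generator's upsampling stack are tied together: each mel frame must expand into exactly `hop_size` output samples. The sketch below assumes `config.json` follows the layout of jik876/hifi-gan's `config_v1.json` (for example `upsample_rates` of `[8, 8, 2, 2]`, and the key `sampling_rate` rather than `sample_rate`); the inline config is an assumed excerpt, so check the shipped file before relying on these keys.

```python
import json
import math

# Assumed excerpt of config.json, modeled on jik876/hifi-gan's config_v1.json layout.
config_text = """
{
  "upsample_rates": [8, 8, 2, 2],
  "upsample_initial_channel": 512,
  "sampling_rate": 22050,
  "n_fft": 1024,
  "hop_size": 256,
  "win_size": 1024,
  "num_mels": 80,
  "fmin": 0,
  "fmax": 8000
}
"""
h = json.loads(config_text)

# The upsampling stages must multiply out to the STFT hop size,
# otherwise generated audio would not line up with the mel frames.
total_upsampling = math.prod(h["upsample_rates"])
assert total_upsampling == h["hop_size"], "generator upsampling != hop_size"

# Each mel frame therefore covers hop_size / sampling_rate seconds of audio.
frame_ms = 1000 * h["hop_size"] / h["sampling_rate"]
frames_per_second = h["sampling_rate"] / h["hop_size"]
print(f"upsampling x{total_upsampling}, {frame_ms:.2f} ms/frame, {frames_per_second:.2f} frames/s")
```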
Important: the mel log-compression must use HiFi-GAN's standard `dynamic_range_compression` (a natural log after clamping at 1e-5):
```python
# CORRECT: matches the training format
mel = torch.log(torch.clamp(mel_linear, min=1e-5))
# range: approximately [-11.5, 0.9]

# WRONG: will produce artifacts
mel = torch.log(mel_linear + 1e-9)
# range: approximately [-20, 8]; the vocoder was NOT trained on this
```
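To see why the two formulas are not interchangeable, this plain-Python sketch applies both to a few scalar mel magnitudes. The helper names are illustrative, not part of HiFi-GAN. Loud bins agree, but silent bins bottom out at log(1e-5) ≈ -11.51 with the clamp versus log(1e-9) ≈ -20.72 with the epsilon, so every quiet region of the input lands far outside the range the vocoder saw during training.

```python
import math

def hifigan_log_compress(x: float, clip_val: float = 1e-5) -> float:
    """Illustrative scalar version of log(clamp(x, min=clip_val))."""
    return math.log(max(x, clip_val))

def epsilon_log_compress(x: float, eps: float = 1e-9) -> float:
    """Illustrative scalar version of the incorrect log(x + eps)."""
    return math.log(x + eps)

# Silent or near-silent mel bins are where the two schemes diverge most.
for value in (0.0, 1e-7, 1e-3, 1.0):
    print(f"{value:>8g}: clamp={hifigan_log_compress(value):8.3f}  "
          f"epsilon={epsilon_log_compress(value):8.3f}")
```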
Usage
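```python
import json

import torch
import torch.nn.functional as F
import torchaudio

# HiFi-GAN modules from the jik876/hifi-gan repo (run from the repo root or add it to PYTHONPATH)
from env import AttrDict
from models import Generator

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load model; Generator reads hyperparameters as attributes (h.upsample_rates), hence AttrDict
with open('config.json') as f:
    h = AttrDict(json.load(f))
generator = Generator(h).to(device)
ckpt = torch.load('generator.pth', map_location=device)
generator.load_state_dict(ckpt['generator'])
generator.eval()
generator.remove_weight_norm()

# Load audio as mono at 22050 Hz
waveform, sr = torchaudio.load('audio.wav')
waveform = waveform.mean(dim=0, keepdim=True)
if sr != 22050:
    waveform = torchaudio.functional.resample(waveform, sr, 22050)

# Compute mel to match jik876's meldataset.mel_spectrogram:
# magnitude spectrogram (power=1), Slaney mel scale and norm, center=False with reflect padding
pad = (1024 - 256) // 2
waveform = F.pad(waveform.unsqueeze(1), (pad, pad), mode='reflect').squeeze(1)
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, n_mels=80,
    hop_length=256, win_length=1024, f_min=0.0, f_max=8000.0,
    power=1.0, normalized=False, center=False,
    norm='slaney', mel_scale='slaney',
)
mel_linear = mel_transform(waveform)  # shape (1, 80, T)
mel = torch.log(torch.clamp(mel_linear, min=1e-5))  # Standard HiFi-GAN compression: clamp, not epsilon addition

# Generate audio
with torch.no_grad():
    audio = generator(mel.to(device)).squeeze().cpu()
torchaudio.save('output.wav', audio.unsqueeze(0), 22050)
```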
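The output length follows directly from the framing. With jik876/hifi-gan's `mel_spectrogram` framing (reflect-padding by `(n_fft - hop_size) / 2` on each side, `center=False`), an input of N samples yields floor(N / 256) mel frames, and the generator emits 256 samples per frame, so the output is the input trimmed down to a multiple of 256. A small plain-Python sketch of that arithmetic (the helper names are illustrative):

```python
def num_mel_frames(num_samples: int, n_fft: int = 1024, hop_size: int = 256) -> int:
    """Frames from an STFT with reflect padding of (n_fft - hop_size) // 2
    on each side and center=False, as in jik876/hifi-gan's mel_spectrogram."""
    pad = (n_fft - hop_size) // 2
    padded = num_samples + 2 * pad
    return (padded - n_fft) // hop_size + 1

def expected_output_samples(num_samples: int, hop_size: int = 256) -> int:
    """Generator output length: one hop_size block of audio per mel frame."""
    return num_mel_frames(num_samples, hop_size=hop_size) * hop_size

# One second of audio at 22050 Hz
n = 22050
print(num_mel_frames(n), expected_output_samples(n))
```

In practice this means `len(audio)` from the usage example can be up to 255 samples shorter than the input, which matters when aligning output with the source recording.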
License
MIT (same as original HiFi-GAN)