HiFi-GAN Vocoder — Russian (RUSLAN corpus)

HiFi-GAN V1 vocoder trained on the RUSLAN Russian speech corpus for high-quality mel-to-audio conversion.

Training Details

Architecture: HiFi-GAN V1 (14M parameters, 512 initial channels)
Training code: jik876/hifi-gan
Dataset: RUSLAN corpus — single male speaker, studio quality, 13,865 training files (~16 hours)
Steps: 160,000 (~185 epochs)
Hardware: NVIDIA RTX 4090 (24 GB)
Batch size: 16
Final mel-spec error: ~0.29
Stopping reason: No further perceptible improvement in audio quality beyond 160k steps
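
The step and epoch figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: every input is taken from the list above, and it assumes the data loader drops the last incomplete batch of each epoch (the variable names are illustrative).

```python
# Cross-check of the training figures quoted in this card.
TRAIN_FILES = 13_865
BATCH_SIZE = 16
TOTAL_STEPS = 160_000
CORPUS_HOURS = 16  # approximate figure from the card

# Assumption: the last incomplete batch of each epoch is dropped.
steps_per_epoch = TRAIN_FILES // BATCH_SIZE
epochs = TOTAL_STEPS / steps_per_epoch
mean_clip_seconds = CORPUS_HOURS * 3600 / TRAIN_FILES

print(f"steps per epoch:  {steps_per_epoch}")
print(f"epochs:           {epochs:.1f}")
print(f"mean clip length: {mean_clip_seconds:.2f} s")
```

The result lands close to the ~185 epochs quoted above, and the average clip works out to roughly four seconds.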

Mel Spectrogram Parameters

Input mel spectrograms must be computed with exactly these parameters:

Parameter	Value
sample_rate	22050
n_fft	1024
hop_size	256
win_size	1024
num_mels	80
fmin	0
fmax	8000
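
For orientation, the parameters above imply the following frame timing. The sketch below is plain arithmetic on the table values. The upsample rates are an assumption taken from the reference config_v1.json in jik876/hifi-gan, and should be checked against the config.json shipped with this checkpoint. The helper name expected_samples is hypothetical.

```python
# Timing quantities derived from the mel parameters in the table.
SAMPLE_RATE = 22_050
N_FFT = 1024
HOP_SIZE = 256
WIN_SIZE = 1024
FMAX = 8000

# Assumption: upsample rates of the reference HiFi-GAN V1 config.
UPSAMPLE_RATES = [8, 8, 2, 2]

frames_per_second = SAMPLE_RATE / HOP_SIZE   # mel frames per second of audio
hop_ms = 1000 * HOP_SIZE / SAMPLE_RATE       # time step between frames
window_ms = 1000 * WIN_SIZE / SAMPLE_RATE    # analysis window length
freq_bins = N_FFT // 2 + 1                   # one-sided STFT bins before the mel projection
nyquist = SAMPLE_RATE / 2                    # fmax must not exceed this

# The generator upsamples each mel frame by the product of its upsample rates,
# so that product has to equal the hop size.
total_upsampling = 1
for rate in UPSAMPLE_RATES:
    total_upsampling *= rate


def expected_samples(n_frames: int) -> int:
    """Audio samples produced for n_frames mel frames (hypothetical helper)."""
    return n_frames * total_upsampling


print(f"{frames_per_second:.2f} frames/s, hop {hop_ms:.2f} ms, window {window_ms:.2f} ms")
print(f"{freq_bins} STFT bins, fmax {FMAX} Hz <= Nyquist {nyquist:.0f} Hz")
print(f"upsampling x{total_upsampling}, 100 frames -> {expected_samples(100)} samples")
```

If the product of the upsample rates in your config.json differs from hop_size, the generated audio will not line up with the input frames.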

Important: Mel normalization must use HiFi-GAN's standard dynamic_range_compression:

# CORRECT — matches training format
mel = torch.log(torch.clamp(mel_linear, min=1e-5))
# range: approximately [-11.5, 0.9]

# WRONG — will produce artifacts
mel = torch.log(mel_linear + 1e-9)
# range: approximately [-20, 8] — vocoder was NOT trained on this
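
The two formats differ most visibly at the silence floor. The sketch below uses scalar math from the standard library to show the floors and one way to map a log(x + 1e-9) value back to the clamp format. The function names are illustrative, and the conversion only repairs the floor: it assumes the linear mel itself was computed the same way as in training (same spectrogram type and mel filterbank). On tensors, the same steps can be written with torch.exp, torch.clamp and torch.log.

```python
import math

CLAMP_MIN = 1e-5  # floor used by the clamp-based format
EPSILON = 1e-9    # additive epsilon used by the mismatched format


def log_clamp(x: float) -> float:
    """log(clamp(x, min=CLAMP_MIN)), the format this vocoder expects."""
    return math.log(max(x, CLAMP_MIN))


def log_eps(x: float) -> float:
    """log(x + EPSILON), the mismatched format."""
    return math.log(x + EPSILON)


def eps_to_clamp(log_mel: float) -> float:
    """Map a log(x + EPSILON) value to log(clamp(x, min=CLAMP_MIN))."""
    linear = max(math.exp(log_mel) - EPSILON, 0.0)  # undo the epsilon addition
    return log_clamp(linear)


print(f"clamp floor:   {log_clamp(0.0):.2f}")
print(f"epsilon floor: {log_eps(0.0):.2f}")
for x in (0.0, 1e-7, 1e-3, 1.0):
    print(f"x={x:g}: eps={log_eps(x):.3f} -> clamp={eps_to_clamp(log_eps(x)):.3f}")
```

Values above the clamp floor are essentially unchanged by the conversion. Only the quiet bins that the epsilon format pushes toward -20.7 are lifted back to about -11.5.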

Usage

import torch
import torchaudio
import json

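# Load config. Generator reads hyperparameters as attributes (h.resblock, ...),
# so wrap the dict in the AttrDict helper from the jik876/hifi-gan repo (env.py)
from env import AttrDict
from models import Generator

with open('config.json') as f:
    h = AttrDict(json.load(f))

# Build the HiFi-GAN generator and load the trained weights
generator = Generator(h).to('cuda')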
ckpt = torch.load('generator.pth', map_location='cuda')
generator.load_state_dict(ckpt['generator'])
generator.eval()

# Compute mel (must use clamp, not epsilon addition)
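waveform, sr = torchaudio.load('audio.wav')
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
if sr != 22050:
    waveform = torchaudio.functional.resample(waveform, sr, 22050)
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, n_mels=80,
    hop_length=256, win_length=1024, f_min=0, f_max=8000,  # torchaudio names these f_min / f_max
    power=1.0, normalized=False,  # magnitude spectrogram, as in jik876/hifi-gan meldataset.py
    norm='slaney', mel_scale='slaney',  # match the librosa mel filterbank used by the training code
)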
mel_linear = mel_transform(waveform)
mel = torch.log(torch.clamp(mel_linear, min=1e-5))  # Standard HiFi-GAN normalization

# Generate audio
with torch.no_grad():
    audio = generator(mel.to('cuda')).squeeze().cpu()
torchaudio.save('output.wav', audio.unsqueeze(0), 22050)

License

MIT (same as original HiFi-GAN)
