Whisper Large v3 Turbo — Hungarian LoRA (CTranslate2)

A LoRA fine-tuned Whisper large-v3-turbo model specifically optimized for the Hungarian language. This model corrects common recognition inaccuracies and hallucinations present in the official base model (e.g., special Hungarian characters, native names, and complex grammar).

This repository contains the CTranslate2-accelerated version, well suited for use with the popular faster-whisper Python library for real-time (streaming) or batch transcription with a low VRAM footprint.


Highlights

  • Optimized for Hungarian: Corrects common errors of the OpenAI base model, such as words hallucinated during silent passages.
  • Fast Inference: Powered by the CTranslate2 engine (with optional 8-bit quantization at load time), it transcribes minutes of audio in seconds.
  • Low VRAM: A small memory footprint (~1.6 GB) lets it run comfortably on GPUs with 4–8 GB of VRAM, even alongside models like F5-TTS.
  • Pairs Well with F5-TTS: Recommended for generating automatic reference text (ref_text) for F5-TTS voice cloning, which is highly sensitive to transcription errors.
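The silence-hallucination point above pairs well with faster-whisper's built-in VAD filter, which drops non-speech audio before decoding. A minimal sketch (the `transcribe_hu` helper and the VAD threshold are illustrative, not part of this repository):

```python
def transcribe_hu(model, audio_path):
    """Transcribe Hungarian audio, skipping silence via VAD.

    `model` is a loaded faster_whisper.WhisperModel; the VAD
    parameter below is an illustrative default, not a tuned value.
    """
    segments, info = model.transcribe(
        audio_path,
        language="hu",
        vad_filter=True,  # drop non-speech regions before decoding
        vad_parameters={"min_silence_duration_ms": 500},
    )
    # transcribe() returns a lazy generator; materialize it here
    return list(segments), info
```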

Quick Start (Python)

To use the model from Python, install the faster-whisper package:

pip install faster-whisper

Usage:

from faster_whisper import WhisperModel

# The model downloads automatically from Hugging Face
model_id = "Maxdorger29/whisper-large-v3-turbo-hungarian-lora"

# Enable float16 computation on GPU
model = WhisperModel(model_id, device="cuda", compute_type="float16")

segments, info = model.transcribe("magyar_hang_minta.wav", language="hu")

print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
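Note that transcribe() yields segments lazily, so they can only be iterated once. A small helper to collect them into a single string (a sketch; segments_to_text is our own function, not part of faster-whisper):

```python
def segments_to_text(segments):
    """Join faster-whisper segments into one whitespace-normalized string."""
    return " ".join(seg.text.strip() for seg in segments)
```

Each segment's .text typically carries a leading space, so stripping before joining keeps the output clean.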

Repository Overview

File             Size     Description
model.bin        ~1.6 GB  CTranslate2-compatible FP16 model weights
config.json      <1 KB    CTranslate2 configuration
vocabulary.json  ~3 MB    Tokens and vocabulary

Companion Model (F5-TTS)

We originally fine-tuned this model as the transcription backend for the Maxdorger29/f5-tts-hungarian project, to provide highly accurate reference texts (ref_text). Accurate transcriptions are essential for high-quality voice cloning with F5-TTS, and this model is tuned to deliver them for Hungarian.
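In practice, producing a ref_text can be as simple as transcribing the reference clip and flattening the segments. A hedged sketch (make_ref_text is our own helper, not an F5-TTS or faster-whisper API):

```python
def make_ref_text(model, ref_audio_path):
    """Transcribe a short reference clip into an F5-TTS ref_text.

    `model` is a loaded faster_whisper.WhisperModel; beam_size=5
    matches faster-whisper's default beam-search setting.
    """
    segments, _info = model.transcribe(ref_audio_path, language="hu", beam_size=5)
    # flatten segments into one clean string for ref_text
    return " ".join(seg.text.strip() for seg in segments)
```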

Support the Project

Training and fine-tuning open-source LLM/TTS/STT models requires significant GPU resources. If this Hungarian-optimized Whisper LoRA speeds up your workflow or projects, consider buying me a coffee to directly support the compute costs of future epochs! ☕

Ko-fi

License

This model and its fine-tuned weights are released under the MIT License (covering the optimization scripts and weights), provided your use also complies with the license terms of the original OpenAI Whisper large-v3-turbo model.

For questions, pull requests, or better datasets, feel free to reach out on GitHub!


Fine-tuned for the Hungarian Open-Source AI Community.
