# Whisper Large V3 Turbo - Vietnamese Telephony

A fine-tuned version of `openai/whisper-large-v3-turbo` for Vietnamese call-center (telephony) audio.
## Training Details
- Base model: openai/whisper-large-v3-turbo (809M params)
- Method: LoRA (r=32, alpha=64, 2.95% trainable params)
- Dataset: 954 segments, 5.11 hours Vietnamese telephony audio
- Audio: 8 kHz telephony recordings resampled to 16 kHz (Whisper's expected input rate), with enhanced preprocessing
- Epochs: 10
- Learning rate: 1e-4 with cosine schedule
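The 8 kHz → 16 kHz resampling step above can be sketched as follows. This uses `scipy` as an illustrative choice; `torchaudio` or `librosa` resamplers work equally well:

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 8000, 16000  # telephony rate -> Whisper's expected rate

# One second of dummy 8 kHz audio standing in for a call recording.
audio_8k = np.random.randn(sr_in).astype(np.float32)

# Polyphase resampling by the ratio sr_out/sr_in (here an exact 2x upsample).
audio_16k = resample_poly(audio_8k, sr_out, sr_in)
print(len(audio_16k))  # 16000 samples = 1 second at 16 kHz
```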
## Results
| Metric | Value |
|---|---|
| Test WER | 27.92% |
| Test CER | 19.46% |
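For reference, WER is the word-level Levenshtein (edit) distance divided by the number of reference words; CER is the same computation over characters. A minimal, self-contained sketch (libraries like `jiwer` implement this with more normalization options):

```python
def edit_distance(ref, hyp):
    # Single-row dynamic-programming Levenshtein distance.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # prev holds d[i-1][j-1]; old d[j] is d[i-1][j]; new d[j-1] is d[i][j-1].
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

# One substitution + one word replaced out of four reference words -> 2/4.
print(wer("xin chào quý khách", "xin chào khách hàng"))  # 0.5
```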
## Usage
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

processor = WhisperProcessor.from_pretrained("jason03/whisper-large-v3-turbo-vietnamese-telephony")
model = WhisperForConditionalGeneration.from_pretrained("jason03/whisper-large-v3-turbo-vietnamese-telephony")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device="cuda",
)

result = pipe("audio.wav", generate_kwargs={"language": "vi", "task": "transcribe"})
print(result["text"])
```
### With faster-whisper (CTranslate2)

First convert the model to CTranslate2 format (the converter CLI is `ct2-transformers-converter`, shipped with the `ctranslate2` package):

```shell
ct2-transformers-converter --model jason03/whisper-large-v3-turbo-vietnamese-telephony --output_dir ./model-ct2 --quantization float16
```
```python
from faster_whisper import WhisperModel

model = WhisperModel("./model-ct2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", language="vi")
for seg in segments:
    print(f"[{seg.start:.1f}s-{seg.end:.1f}s] {seg.text}")
```