# 🇮🇹 Italian ASR Fine-Tuning: Whisper v3 Turbo + LoRA
This model is a LoRA fine-tune of OpenAI's Whisper Large v3 Turbo on the Italian subset of the FLEURS dataset. It achieves a Word Error Rate (WER) of 3.95%, outperforming the official baseline (5.14%).
## 📊 Results
| Metric | Value |
|---|---|
| Model | Whisper Large v3 Turbo |
| Dataset | FLEURS (Italian) |
| WER | 3.95% |
| VRAM Usage | ~4GB (Inference) |
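WER is the word-level edit distance (substitutions + insertions + deletions) between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal pure-Python sketch of the metric (evaluations like the one above typically use a library such as `jiwer` or `evaluate` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("il gatto dorme", "il gatto dorme"))  # 0.0
print(wer("il gatto dorme", "il cane dorme"))   # one substitution in three words ≈ 0.333
```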
## 🚀 Usage
```python
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from peft import PeftModel

# Load the base model in 4-bit (requires the bitsandbytes package)
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3-turbo",
    load_in_4bit=True,
    device_map="auto",
)

# Attach the LoRA adapters
model = PeftModel.from_pretrained(base_model, "Corviinuss/whisper-large-v3-turbo-italian-lora")
processor = AutoProcessor.from_pretrained("Corviinuss/whisper-large-v3-turbo-italian-lora")

# Build an ASR pipeline; chunking handles audio longer than 30 s
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
)

# Transcribe an audio file
result = pipe("audio.mp3", generate_kwargs={"language": "italian"})
print(result["text"])
```