---
language: it
license: mit
tags:
- whisper
- automatic-speech-recognition
- italian
- localai
datasets:
- mozilla-foundation/common_voice_25_0
base_model: openai/whisper-medium
pipeline_tag: automatic-speech-recognition
---
# whisper-medium-it

Fine-tuned openai/whisper-medium (769M parameters) for Italian automatic speech recognition (ASR).

Author: Ettore Di Giacinto

Brought to you by the LocalAI team. This model can be used directly with LocalAI.
## Usage with LocalAI

This model is ready to use with LocalAI via the `whisperx` backend.

Save the following as `whisperx-medium-it.yaml` in your LocalAI models directory:
```yaml
name: whisperx-medium-it
backend: whisperx
known_usecases:
- transcript
parameters:
  model: LocalAI-io/whisper-medium-it-ct2-int8
  language: it
```
Then transcribe audio via the OpenAI-compatible endpoint:

```bash
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@audio.mp3" \
  -F model="whisperx-medium-it"
```
## Results

Evaluated on the Common Voice 25.0 Italian test set (15,184 samples):
| Step | WER |
|---|---|
| 1000 | 14.61% |
| 3000 | 13.77% |
| 5000 | 12.42% |
| 7000 | 11.58% |
| 9000 | 10.66% |
| 10000 | 10.47% |
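WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A minimal pure-Python sketch of the computation (illustrative only; the card's numbers come from the evaluation pipeline, and the example sentences below are invented):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-array Levenshtein DP over words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                              # deletion
                       d[j - 1] + 1,                          # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))     # substitution
            prev = cur
    return d[-1] / len(ref)

# One substitution ("gato") and one deletion ("sul") over 5 reference words.
print(wer("il gatto dorme sul divano", "il gato dorme divano"))  # 0.4
```

Production evaluations typically also normalize text (lowercasing, punctuation stripping) before scoring.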
## Training Details
- Base model: openai/whisper-medium (769M parameters)
- Dataset: Common Voice 25.0 Italian (173k train, 15k dev, 15k test)
- Steps: 10,000
- Precision: bf16 on NVIDIA GB10
## Usage

### Transformers
```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="LocalAI-io/whisper-medium-it")
result = pipe("audio.mp3", generate_kwargs={"language": "it", "task": "transcribe"})
print(result["text"])
```
### CTranslate2 / faster-whisper

For optimized CPU inference, use the INT8 CTranslate2 export: LocalAI-io/whisper-medium-it-ct2-int8
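A minimal faster-whisper sketch for the INT8 export (assumes the `faster-whisper` package is installed; the model is downloaded from the Hub on first use):

```python
from faster_whisper import WhisperModel

# INT8-quantized CTranslate2 weights; well suited to CPU inference.
model = WhisperModel(
    "LocalAI-io/whisper-medium-it-ct2-int8",
    device="cpu",
    compute_type="int8",
)

segments, info = model.transcribe("audio.mp3", language="it", task="transcribe")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Note that `transcribe` returns a lazy generator, so transcription only runs as you iterate over `segments`.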
## Links
- Multi-dataset version: LocalAI-io/whisper-medium-it-multi (WER 12.4%)
- CTranslate2 INT8: LocalAI-io/whisper-medium-it-ct2-int8
- Code: github.com/localai-org/italian-whisper
- LocalAI: github.com/mudler/LocalAI