WhisperKit CoreML - Distil Large v3 Italian
CoreML model for Italian speech-to-text on Apple Silicon, compatible with WhisperKit.
Why this model exists
The official distil-whisper/distil-large-v3 available on argmaxinc/whisperkit-coreml is English-only. Despite being flagged as multilingual (it inherits the tokenizer from large-v3), its decoder was distilled exclusively on English data. This means it ignores language tokens like <|it|> and always outputs English text, regardless of DecodingOptions.language settings.
This repo provides a CoreML conversion of bofenghuang/whisper-large-v3-distil-it-v0.2, a model distilled on 6,500+ hours of Italian audio, making it a true Italian-capable distilled Whisper model.
Model composition
| Component | Source | Notes |
|---|---|---|
| AudioEncoder | openai/whisper-large-v3 | Identical to large-v3 (frozen during distillation) |
| MelSpectrogram | openai/whisper-large-v3 | Standard mel-spectrogram preprocessing |
| TextDecoder | bofenghuang/whisper-large-v3-distil-it-v0.2 | 2 decoder layers, trained on Italian data |
| config.json | distil-whisper/distil-large-v3 | Architecture config (2 decoder layers, 32 encoder layers) |
| generation_config.json | distil-whisper/distil-large-v3 | Modified: language set to null (was `<\|en\|>`) |
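To make the composition above concrete, here is a minimal sketch that checks whether a downloaded model folder contains all five pieces. The `.mlmodelc` file names are an assumption based on how WhisperKit model folders are commonly laid out, and `missingItems` is a hypothetical helper, not part of WhisperKit.

```swift
import Foundation

// Assumed file names: compiled CoreML bundles plus the two JSON configs.
// Adjust the list if your model folder is laid out differently.
let expectedItems = [
    "AudioEncoder.mlmodelc",
    "MelSpectrogram.mlmodelc",
    "TextDecoder.mlmodelc",
    "config.json",
    "generation_config.json",
]

/// Hypothetical helper: returns the names from `expected` that do not exist in `folder`.
func missingItems(in folder: URL, expected: [String]) -> [String] {
    let fileManager = FileManager.default
    return expected.filter { name in
        // fileExists(atPath:) is true for both files and directories (.mlmodelc is a directory).
        !fileManager.fileExists(atPath: folder.appendingPathComponent(name).path)
    }
}
```

Calling `missingItems(in: modelURL, expected: expectedItems)` after a download should return an empty array if the folder matches the table.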
How it was built
- TextDecoder was converted from PyTorch to CoreML using whisperkittools:

      whisperkit-generate-model \
        --model-version bofenghuang/whisper-large-v3-distil-it-v0.2 \
        --output-dir ./output

  The AudioEncoder conversion failed due to a `coremltools` compatibility issue, but since the encoder is identical to `large-v3` (frozen during distillation), we reused the encoder from `argmaxinc/whisperkit-coreml`.
- AudioEncoder + MelSpectrogram were copied from the official `openai_whisper-large-v3` CoreML model on argmaxinc/whisperkit-coreml.
- generation_config.json was patched to set `"language": null` instead of `"language": "<|en|>"` to avoid English bias.
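The `generation_config.json` patch can be reproduced with a few lines of Foundation code. This is only a sketch of one way to do it, not necessarily how the file in this repo was edited; `patchLanguageToNull` is a hypothetical helper name.

```swift
import Foundation

/// Hypothetical helper: returns a copy of the generation config with "language" set to JSON null.
func patchLanguageToNull(_ data: Data) throws -> Data {
    guard var config = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
        throw CocoaError(.fileReadCorruptFile)
    }
    // NSNull() is written out as JSON null; assigning Swift nil would remove the key instead.
    config["language"] = NSNull()
    return try JSONSerialization.data(withJSONObject: config, options: [.prettyPrinted, .sortedKeys])
}

// Example use (the path is an assumption; point it at your own copy of the file):
// let path = URL(fileURLWithPath: "generation_config.json")
// let patched = try patchLanguageToNull(Data(contentsOf: path))
// try patched.write(to: path)
```

All other keys are preserved; only their order and whitespace may change because the output is re-serialized.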
Usage with WhisperKit
import WhisperKit

// Download the model
let modelURL = try await WhisperKit.download(
    variant: "bofenghuang_whisper-large-v3-distil-it",
    from: "jmadseeker/whisperkit-coreml-distil-large-v3-it"
)

// Initialize WhisperKit
let config = WhisperKitConfig(modelFolder: modelURL.path, verbose: false, logLevel: .error, load: true)
let whisperKit = try await WhisperKit(config)

// Transcribe in Italian (audioURL is a file URL to your audio, defined elsewhere)
let options = DecodingOptions(
    task: .transcribe,
    language: "it",
    temperature: 0.0,
    temperatureIncrementOnFallback: 0.2,
    temperatureFallbackCount: 2
)
let results = try await whisperKit.transcribe(audioPath: audioURL.path, decodeOptions: options)
print(results.first?.text ?? "")
Performance
- ~6x faster than `large-v3` (2 decoder layers vs 32)
- ~1.5 GB model size
- 98% ANE dispatch on Apple Silicon (Neural Engine accelerated)
- Quality comparable to `large-v3` for Italian transcription
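The speedup is attributed to the smaller decoder, so it can be useful to confirm the layer counts in the bundled `config.json`. The sketch below assumes the file follows the Hugging Face Whisper config schema, where the fields are named `encoder_layers` and `decoder_layers`; `WhisperLayerCounts` and `layerCounts` are hypothetical names introduced for illustration.

```swift
import Foundation

/// Layer counts read from a Hugging Face style Whisper config.json (other fields are ignored).
struct WhisperLayerCounts: Decodable {
    let encoderLayers: Int
    let decoderLayers: Int

    enum CodingKeys: String, CodingKey {
        case encoderLayers = "encoder_layers"
        case decoderLayers = "decoder_layers"
    }
}

/// Hypothetical helper: decodes the two layer counts from raw config.json data.
func layerCounts(from data: Data) throws -> WhisperLayerCounts {
    try JSONDecoder().decode(WhisperLayerCounts.self, from: data)
}
```

For this model, the decoded values would be expected to match the table above: 32 encoder layers and 2 decoder layers.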
Used by
- Pulsecribe: macOS voice dictation app with multi-provider transcription
Credits
- Original Italian distilled model: bofenghuang/whisper-large-v3-distil-it-v0.2
- WhisperKit framework: argmaxinc/WhisperKit
- CoreML conversion tools: argmaxinc/whisperkittools
- Official CoreML models: argmaxinc/whisperkit-coreml