# sherpa-onnx-whisper-distil-large-v3-it
Italian-distilled Whisper large-v3 model exported to ONNX format for sherpa-onnx.
## Description
This is an int8-quantized ONNX export of bofenghuang/whisper-large-v3-distil-it-v0.2, an Italian-specific distillation of OpenAI's Whisper large-v3 model.
The distillation reduces the decoder from 32 layers to 2 layers, making it significantly smaller and faster while maintaining strong Italian transcription quality.
## Model details
| Property | Value |
|---|---|
| Base model | bofenghuang/whisper-large-v3-distil-it-v0.2 |
| Language | Italian |
| Decoder layers | 2 (distilled from 32) |
| n_mels | 128 |
| n_audio_ctx | 1500 |
| n_text_ctx | 448 |
| n_text_state | 1280 |
| Vocab size | 51866 |
| Quantization | int8 (QInt8 on MatMul) |
| License | Apache 2.0 |
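The int8 quantization noted in the table (QInt8 applied to MatMul ops) is the kind of dynamic weight-only quantization that onnxruntime provides. As a sketch of how such a model is typically produced, not necessarily the exact command used for this export, the paths and function below are illustrative:

```python
# Sketch of int8 dynamic quantization with onnxruntime, matching the
# "int8 (QInt8 on MatMul)" row above. Paths are illustrative; the actual
# export script may configure quantization differently.
def quantize_to_int8(fp32_path: str, int8_path: str) -> None:
    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Quantize only MatMul weights to signed 8-bit integers; activations
    # stay in float, so no calibration data is needed.
    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        op_types_to_quantize=["MatMul"],
        weight_type=QuantType.QInt8,
    )
```

Weight-only dynamic quantization roughly halves (or better) the on-disk size relative to fp16/fp32 while keeping the pipeline calibration-free, which is why it is a common choice for speech models of this size.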
## Files
| File | Size |
|---|---|
| distil-large-v3-it-encoder.int8.onnx | ~637 MB |
| distil-large-v3-it-decoder.int8.onnx | ~300 MB |
| distil-large-v3-it-tokens.txt | ~798 KB |
## Usage with sherpa-onnx
This model is designed for use with the sherpa-onnx runtime. Download all three files to the same directory and configure your sherpa-onnx pipeline to use:
- Encoder: distil-large-v3-it-encoder.int8.onnx
- Decoder: distil-large-v3-it-decoder.int8.onnx
- Tokens: distil-large-v3-it-tokens.txt
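A minimal sketch of wiring these three files together with the sherpa-onnx Python API (pip install sherpa-onnx). The `OfflineRecognizer.from_whisper` constructor and its keyword arguments reflect the upstream API as I understand it; verify against your installed version:

```python
# Sketch: load this model with the sherpa-onnx Python API.
# File names match this repo; prepend a directory path as needed.
import os

ENCODER = "distil-large-v3-it-encoder.int8.onnx"
DECODER = "distil-large-v3-it-decoder.int8.onnx"
TOKENS = "distil-large-v3-it-tokens.txt"


def load_recognizer():
    import sherpa_onnx

    # from_whisper wires the Whisper encoder/decoder pair into an
    # offline (non-streaming) recognizer; language/task mirror
    # Whisper's own decoding options.
    return sherpa_onnx.OfflineRecognizer.from_whisper(
        encoder=ENCODER,
        decoder=DECODER,
        tokens=TOKENS,
        language="it",
        task="transcribe",
    )


if all(os.path.exists(f) for f in (ENCODER, DECODER, TOKENS)):
    recognizer = load_recognizer()
else:
    print("model files not found; download all three files first")
```

Note that Whisper models are non-streaming: sherpa-onnx's offline recognizer processes complete utterances rather than a live audio stream.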
## Export
Exported from the original model using a script based on k2-fsa/sherpa-onnx/scripts/whisper/export-onnx.py.
## Acknowledgments
- Original Whisper model by OpenAI
- Italian distillation by bofenghuang
- ONNX inference runtime: sherpa-onnx, by the k2-fsa project