# faster-whisper-large-v3-turbo-int8-fp16
This is an `int8_float16` quantized version of `openai/whisper-large-v3-turbo`, converted to the CTranslate2 format for use with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
## Model Details
- Base Model: openai/whisper-large-v3-turbo
- Quantization: INT8_FLOAT16 (linear-layer weights are stored in INT8 and dequantized to FP16 for computation; activations and non-quantizable layers remain FP16)
- Format: CTranslate2
- Languages: 99 languages (see Whisper documentation)
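As a rough back-of-envelope illustration of what the quantization buys, the sketch below estimates on-disk weight size at each precision. The ~809M parameter count is the commonly cited figure for `large-v3-turbo`; actual file sizes differ slightly because non-linear layers stay FP16 under `int8_float16`.

```python
# Approximate weight storage at different precisions.
# Assumes ~809M parameters for whisper-large-v3-turbo (published figure);
# real checkpoints deviate a little since not every layer is quantized.
params = 809_000_000

fp32_mb = params * 4 / 1e6  # original float32 checkpoint: 4 bytes/param
fp16_mb = params * 2 / 1e6  # float16: 2 bytes/param
int8_mb = params * 1 / 1e6  # int8 weight storage: 1 byte/param

print(f"fp32: {fp32_mb:.0f} MB, fp16: {fp16_mb:.0f} MB, int8: {int8_mb:.0f} MB")
```

In other words, INT8 weight storage roughly halves the FP16 footprint and quarters the FP32 one, while FP16 compute keeps accuracy close to the unquantized model.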
## Usage with faster-whisper
```python
from faster_whisper import WhisperModel

model = WhisperModel(
    "ThomasG/faster-whisper-large-v3-turbo-int8-fp16",
    device="cuda",
    compute_type="int8_float16",
)

# transcribe() returns a lazy generator of segments plus transcription info
segments, info = model.transcribe("audio.mp3")
text = " ".join(segment.text.strip() for segment in segments)
```
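Each segment yielded by `transcribe()` carries `start`, `end`, and `text` attributes, so the output can be turned into subtitles. The helper below is a hypothetical sketch (not part of faster-whisper) that formats such segments as SRT:

```python
def srt_timestamp(seconds: float) -> str:
    # Format a time in seconds as HH:MM:SS,mmm (the SRT convention).
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    # Accepts any iterable of objects with .start, .end, .text,
    # matching the shape of faster-whisper's Segment.
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

For example, `to_srt(model.transcribe("audio.mp3")[0])` would produce a ready-to-save `.srt` string.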
## Conversion details
The OpenAI model was converted with the following command:
```bash
ct2-transformers-converter --model openai/whisper-large-v3-turbo \
    --output_dir faster-whisper-large-v3-turbo-int8-fp16 \
    --quantization int8_float16 \
    --copy_files tokenizer.json preprocessor_config.json
```
## License
This model inherits the MIT license from the original Whisper model.