faster-whisper-large-v3-turbo-int8-fp16

This is an int8_float16 quantized version of openai/whisper-large-v3-turbo, converted to the CTranslate2 format for use with faster-whisper.

Model Details

  • Base Model: openai/whisper-large-v3-turbo
  • Quantization: INT8_FLOAT16 (linear-layer weights are stored in INT8 and dequantized to FP16 for computation; activations and non-quantizable layers run in FP16)
  • Format: CTranslate2
  • Languages: 99 languages (see Whisper documentation)

Usage with faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("ThomasG/faster-whisper-large-v3-turbo-int8-fp16", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("audio.mp3")
text = " ".join(segment.text.strip() for segment in segments)
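The int8_float16 compute type requires FP16 support, which is generally only available on GPU. On CPU, CTranslate2 can load the same weights with a plain int8 compute type instead. A minimal sketch of that fallback, assuming you select the device at runtime (the helper name pick_compute_type is ours, not part of faster-whisper):

```python
def pick_compute_type(device: str) -> str:
    """Map a device to a compute type compatible with these int8_float16 weights.

    int8_float16 needs FP16 hardware support (CUDA); on CPU, CTranslate2
    converts the stored INT8 weights to a pure int8 compute type at load time.
    """
    return "int8_float16" if device == "cuda" else "int8"

# Hypothetical usage:
# device = "cuda" if gpu_available else "cpu"
# model = WhisperModel("ThomasG/faster-whisper-large-v3-turbo-int8-fp16",
#                      device=device, compute_type=pick_compute_type(device))
```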

Conversion details

The OpenAI model was converted with the following command:

ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir faster-whisper-large-v3-turbo-int8-fp16 --quantization int8_float16 --copy_files tokenizer.json preprocessor_config.json
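Before uploading, the output directory can be sanity-checked: a CTranslate2 export normally contains at least model.bin and config.json, plus whatever the --copy_files flag carried over. A small sketch of such a check (the file list reflects this conversion; missing_files is a hypothetical helper, not part of any library):

```python
from pathlib import Path

# Standard CTranslate2 outputs plus the files copied via --copy_files above.
EXPECTED = ["model.bin", "config.json", "tokenizer.json", "preprocessor_config.json"]

def missing_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from a converted model directory."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```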

License

This model inherits the MIT license from the original Whisper model.
