faster-whisper-large-v3-turbo-int8-fp16

This is an int8_float16 quantized version of openai/whisper-large-v3-turbo, converted to the CTranslate2 format for use with faster-whisper.

Model Details

  • Base Model: openai/whisper-large-v3-turbo
  • Quantization: INT8_FLOAT16 (linear-layer weights are stored in INT8 and dequantized to FP16 for computation; activations and non-quantizable layers run in FP16)
  • Format: CTranslate2
  • Languages: 99 languages (see Whisper documentation)

Usage with faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("ThomasG/faster-whisper-large-v3-turbo-int8-fp16", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("audio.mp3")
text = " ".join(segment.text.strip() for segment in segments)
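The int8_float16 compute type requires FP16 support, which is generally only available on GPU. On CPU, CTranslate2 can load the same weights with a plain int8 compute type instead. A minimal sketch of that fallback, assuming you select the device at runtime (the helper name pick_compute_type is ours, not part of faster-whisper):

```python
def pick_compute_type(device: str) -> str:
    """Map a device to a compute type compatible with these int8_float16 weights.

    int8_float16 needs FP16 hardware support (CUDA); on CPU, CTranslate2
    converts the stored INT8 weights to a pure int8 compute type at load time.
    """
    return "int8_float16" if device == "cuda" else "int8"

# Hypothetical usage:
# device = "cuda" if gpu_available else "cpu"
# model = WhisperModel("ThomasG/faster-whisper-large-v3-turbo-int8-fp16",
#                      device=device, compute_type=pick_compute_type(device))
```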

Conversion details

The OpenAI model was converted with the following command:

ct2-transformers-converter --model openai/whisper-large-v3-turbo --output_dir faster-whisper-large-v3-turbo-int8-fp16 --quantization int8_float16 --copy_files tokenizer.json preprocessor_config.json
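Before uploading, the output directory can be sanity-checked: a CTranslate2 export normally contains at least model.bin and config.json, plus whatever the --copy_files flag carried over. A small sketch of such a check (the file list reflects this conversion; missing_files is a hypothetical helper, not part of any library):

```python
from pathlib import Path

# Standard CTranslate2 outputs plus the files copied via --copy_files above.
EXPECTED = ["model.bin", "config.json", "tokenizer.json", "preprocessor_config.json"]

def missing_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from a converted model directory."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```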

License

This model inherits the MIT license from the original Whisper model.
