# LiteWhisper Large V3 Turbo ONNX

Supports word-level timestamps via cross-attention alignment.

## Models

| Encoder | Decoder   | Size                 |
|---------|-----------|----------------------|
| FP32    | FP32      | 2.1 GB               |
| FP16    | quantized | 1.2 GB (recommended) |

## Usage

```js
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "ipsilondev/lite-whisper-large-v3-turbo-onnx",
  { dtype: { encoder_model: "fp16", decoder_model_merged: "quantized" } }
);

// `audio` can be a URL, file path, or Float32Array of PCM samples.
const result = await transcriber(audio, { return_timestamps: "word" });
```
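With `return_timestamps: "word"`, the result contains a `chunks` array of `{ text, timestamp: [start, end] }` entries, one per word. The sketch below formats such a result into readable lines; the sample data is made up for illustration, not actual model output.

```javascript
// Format word-level timestamp chunks as "[start-end] word" lines.
function formatWords(result) {
  return result.chunks.map(
    ({ text, timestamp: [start, end] }) =>
      `[${start.toFixed(2)}-${end.toFixed(2)}] ${text.trim()}`
  );
}

// Hypothetical sample result, shaped like a transcriber output.
const sample = {
  text: " Hello world",
  chunks: [
    { text: " Hello", timestamp: [0.0, 0.42] },
    { text: " world", timestamp: [0.42, 0.9] },
  ],
};

console.log(formatWords(sample).join("\n"));
// [0.00-0.42] Hello
// [0.42-0.90] world
```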