# LiteWhisper Large V3 Turbo ONNX

Supports word-level timestamps via cross-attention alignment.

## Models

| Encoder | Decoder   | Size                 |
|---------|-----------|----------------------|
| FP32    | FP32      | 2.1 GB               |
| FP16    | quantized | 1.2 GB (recommended) |

## Usage

```js
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "ipsilondev/lite-whisper-large-v3-turbo-onnx",
  { dtype: { encoder_model: "fp16", decoder_model_merged: "quantized" } }
);

// `audio` can be a URL, file path, or Float32Array of PCM samples.
const result = await transcriber(audio, { return_timestamps: "word" });
```
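With `return_timestamps: "word"`, the result contains a `chunks` array of `{ text, timestamp: [start, end] }` entries, one per word. The sketch below formats such a result into readable lines; the sample data is made up for illustration, not actual model output.

```javascript
// Format word-level timestamp chunks as "[start-end] word" lines.
function formatWords(result) {
  return result.chunks.map(
    ({ text, timestamp: [start, end] }) =>
      `[${start.toFixed(2)}-${end.toFixed(2)}] ${text.trim()}`
  );
}

// Hypothetical sample result, shaped like a transcriber output.
const sample = {
  text: " Hello world",
  chunks: [
    { text: " Hello", timestamp: [0.0, 0.42] },
    { text: " world", timestamp: [0.42, 0.9] },
  ],
};

console.log(formatWords(sample).join("\n"));
// [0.00-0.42] Hello
// [0.42-0.90] world
```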