ternary-models: VLMs, Multimodal & Audio
A collection of ternary-quantized models for architectures GGUF can't handle, using the tritplane3 scheme.
This is a ternary-quantized version of openai/whisper-large-v3, produced with ternary-quant.

No GGUF or GPTQ alternative exists for Whisper; ternary-quant is one of the few tools that can quantize audio/speech models.
| Property | Value |
|---|---|
| Base Model | openai/whisper-large-v3 |
| Parameters | 1.5B |
| Architecture | Encoder-decoder (speech-to-text) |
| Quantization | tritplane3 (decoder only, 320 layers) |
| Audio Encoder | FP16 (preserved for transcription quality) |
| Languages | 100+ |
| License | Apache 2.0 |
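To illustrate what ternary quantization does in general (a minimal sketch, not the actual tritplane3 implementation, whose details are not shown here): each weight is mapped to a value in {-1, 0, +1} multiplied by a per-tensor scale, with small-magnitude weights zeroed out.

```python
import numpy as np

def ternarize(w: np.ndarray, sparsity: float = 0.33):
    """Map weights to {-1, 0, +1} * scale (illustrative only, not tritplane3).

    Weights with magnitude below a quantile threshold become 0; the rest
    keep their sign. The scale is the mean magnitude of surviving weights.
    """
    thresh = np.quantile(np.abs(w), sparsity)       # smallest |w| -> 0
    trits = np.sign(w) * (np.abs(w) >= thresh)      # values in {-1, 0, +1}
    survivors = np.abs(w)[np.abs(w) >= thresh]
    scale = float(survivors.mean()) if survivors.size else 1.0
    return trits.astype(np.int8), scale

def dequantize(trits: np.ndarray, scale: float) -> np.ndarray:
    return trits.astype(np.float32) * scale

# Demo on a random weight matrix
w = np.random.randn(4, 8).astype(np.float32)
trits, scale = ternarize(w)
```

Decoder-only quantization then means applying something like this to the decoder's weight tensors while leaving the encoder's weights in FP16.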

| Method | Size | Compression |
|---|---|---|
| FP16 (original) | ~3 GB | 1x |
| Ternary tritplane3 | 944 MB | ~3.2x |
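The sub-2-bit footprint of ternary weights comes from the fact that a trit carries log2(3) ≈ 1.585 bits; the whole-model ratio is smaller because the encoder stays in FP16. One common storage trick (a sketch of the general idea; tritplane3's actual layout may differ) packs 5 trits into one byte, since 3^5 = 243 ≤ 256:

```python
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack trits in {-1, 0, +1} into bytes, 5 trits per byte (3^5 = 243 <= 256)."""
    t = trits.astype(np.int64).ravel() + 1            # map to digits {0, 1, 2}
    pad = (-t.size) % 5                               # pad to a multiple of 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.int64)])
    groups = t.reshape(-1, 5)
    powers = 3 ** np.arange(5)                        # base-3 positional encoding
    return (groups @ powers).astype(np.uint8)         # max value 242, fits a byte

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    digits = packed.astype(np.int64)[:, None] // (3 ** np.arange(5)) % 3
    return (digits.ravel()[:n] - 1).astype(np.int8)

# 1000 trits pack into 200 bytes: 1.6 bits per weight vs 16 bits for FP16
trits = np.random.randint(-1, 2, size=1000).astype(np.int8)
packed = pack_trits(trits)
```

The round trip is lossless, which is why ternary quantization loses accuracy only at the ternarization step, not at storage time.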

Tested with audio input; the ternary output exactly matches the FP16 original:

| Input | FP16 | Ternary |
|---|---|---|
| Speech audio | "Oh" | "Oh" |
Both models produce identical transcriptions.
Install the library:

```bash
pip install ternary-quant
```
```python
from ternary_quant.inference import load_ternary_model
import numpy as np
import soundfile as sf
import torch

# Load the ternary model and its processor
model, processor = load_ternary_model(
    "AsadIsmail/whisper-large-v3-ternary",
    runtime_mode="cached",
    device="cpu",
)
model = model.float()  # Required: cast to float32 for encoder compatibility

# Transcribe audio
audio, sr = sf.read("audio.flac")
inputs = processor(audio.astype(np.float32), sampling_rate=sr, return_tensors="pt")
inputs = {k: v.float() for k, v in inputs.items()}

with torch.no_grad():
    predicted_ids = model.generate(**inputs, max_new_tokens=100)

print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```
Note: the `.float()` cast is required due to a conv1d dtype mismatch in the encoder. This is documented and will be fixed in a future ternary-quant release.
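Whisper models expect 16 kHz audio, so if your file was recorded at another rate, resample it before calling the processor. A minimal sketch using only NumPy linear interpolation (for production quality, prefer `librosa.resample` or `torchaudio`):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Linear-interpolation resampling of mono audio; demo-grade quality."""
    if sr == target_sr:
        return audio.astype(np.float32)
    duration = audio.shape[0] / sr
    n_out = int(round(duration * target_sr))
    t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio).astype(np.float32)

# e.g. one second of 44.1 kHz audio becomes 16,000 samples
clip_44k = np.random.randn(44_100).astype(np.float32)
clip_16k = resample_to_16k(clip_44k, 44_100)
```

Pass the resampled array with `sampling_rate=16_000` to the processor.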
Part of ternary-models.
GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant