ternary-models: VLMs, Multimodal & Audio
A collection of ternary-quantized models for architectures GGUF can't handle, using the tritplane3 scheme.
This is a ternary-quantized version of openai/whisper-large-v3, produced with ternary-quant.

No GGUF or GPTQ alternative exists for Whisper; ternary-quant is one of the few tools that can quantize audio/speech models.
| Property | Value |
|---|---|
| Base Model | openai/whisper-large-v3 |
| Parameters | 1.5B |
| Architecture | Encoder-decoder (speech-to-text) |
| Quantization | tritplane3 (decoder only, 320 layers) |
| Audio Encoder | FP16 (preserved for transcription quality) |
| Languages | 100+ |
| License | Apache 2.0 |
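To illustrate what ternary quantization does in general (a minimal sketch, not the actual tritplane3 implementation, whose details are not shown here): each weight is mapped to a value in {-1, 0, +1} multiplied by a per-tensor scale, with small-magnitude weights zeroed out.

```python
import numpy as np

def ternarize(w: np.ndarray, sparsity: float = 0.33):
    """Map weights to {-1, 0, +1} * scale (illustrative only, not tritplane3).

    Weights with magnitude below a quantile threshold become 0; the rest
    keep their sign. The scale is the mean magnitude of surviving weights.
    """
    thresh = np.quantile(np.abs(w), sparsity)       # smallest |w| -> 0
    trits = np.sign(w) * (np.abs(w) >= thresh)      # values in {-1, 0, +1}
    survivors = np.abs(w)[np.abs(w) >= thresh]
    scale = float(survivors.mean()) if survivors.size else 1.0
    return trits.astype(np.int8), scale

def dequantize(trits: np.ndarray, scale: float) -> np.ndarray:
    return trits.astype(np.float32) * scale

# Demo on a random weight matrix
w = np.random.randn(4, 8).astype(np.float32)
trits, scale = ternarize(w)
```

Decoder-only quantization then means applying something like this to the decoder's weight tensors while leaving the encoder's weights in FP16.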

| Method | Size | Compression |
|---|---|---|
| FP16 (original) | ~3 GB | 1x |
| Ternary tritplane3 | 944 MB | ~3.2x |
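The sub-2-bit footprint of ternary weights comes from the fact that a trit carries log2(3) ≈ 1.585 bits; the whole-model ratio is smaller because the encoder stays in FP16. One common storage trick (a sketch of the general idea; tritplane3's actual layout may differ) packs 5 trits into one byte, since 3^5 = 243 ≤ 256:

```python
import numpy as np

def pack_trits(trits: np.ndarray) -> np.ndarray:
    """Pack trits in {-1, 0, +1} into bytes, 5 trits per byte (3^5 = 243 <= 256)."""
    t = trits.astype(np.int64).ravel() + 1            # map to digits {0, 1, 2}
    pad = (-t.size) % 5                               # pad to a multiple of 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.int64)])
    groups = t.reshape(-1, 5)
    powers = 3 ** np.arange(5)                        # base-3 positional encoding
    return (groups @ powers).astype(np.uint8)         # max value 242, fits a byte

def unpack_trits(packed: np.ndarray, n: int) -> np.ndarray:
    digits = packed.astype(np.int64)[:, None] // (3 ** np.arange(5)) % 3
    return (digits.ravel()[:n] - 1).astype(np.int8)

# 1000 trits pack into 200 bytes: 1.6 bits per weight vs 16 bits for FP16
trits = np.random.randint(-1, 2, size=1000).astype(np.int8)
packed = pack_trits(trits)
```

The round trip is lossless, which is why ternary quantization loses accuracy only at the ternarization step, not at storage time.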

Tested with audio input; the ternary output exactly matches the FP16 original:

| Input | FP16 | Ternary |
|---|---|---|
| Speech audio | "Oh" | "Oh" |
Both models produce identical transcriptions.
Install the library:

```bash
pip install ternary-quant
```
```python
from ternary_quant.inference import load_ternary_model
import numpy as np
import soundfile as sf
import torch

# Load the ternary model and its processor
model, processor = load_ternary_model(
    "AsadIsmail/whisper-large-v3-ternary",
    runtime_mode="cached",
    device="cpu",
)
model = model.float()  # Required: cast to float32 for encoder compatibility

# Transcribe audio
audio, sr = sf.read("audio.flac")
inputs = processor(audio.astype(np.float32), sampling_rate=sr, return_tensors="pt")
inputs = {k: v.float() for k, v in inputs.items()}

with torch.no_grad():
    predicted_ids = model.generate(**inputs, max_new_tokens=100)

print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```
Note: the `.float()` cast is required due to a conv1d dtype mismatch in the encoder. This is documented and will be fixed in a future ternary-quant release.
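Whisper models expect 16 kHz audio, so if your file was recorded at another rate, resample it before calling the processor. A minimal sketch using only NumPy linear interpolation (for production quality, prefer `librosa.resample` or `torchaudio`):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Linear-interpolation resampling of mono audio; demo-grade quality."""
    if sr == target_sr:
        return audio.astype(np.float32)
    duration = audio.shape[0] / sr
    n_out = int(round(duration * target_sr))
    t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio).astype(np.float32)

# e.g. one second of 44.1 kHz audio becomes 16,000 samples
clip_44k = np.random.randn(44_100).astype(np.float32)
clip_16k = resample_to_16k(clip_44k, 44_100)
```

Pass the resampled array with `sampling_rate=16_000` to the processor.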
Part of ternary-models.
GitHub: github.com/Asad-Ismail/ternary-models | Library: github.com/Asad-Ismail/ternary-quant