Breeze-ASR-25 CoreML INT8

INT8 weight-quantized CoreML version of MediaTek-Research/Breeze-ASR-25 for on-device speech recognition on Apple Silicon.

Model Files

| Component             | Size    | Description              |
|-----------------------|---------|--------------------------|
| AudioEncoder.mlmodelc | 609 MB  | Encoder (INT8 quantized) |
| TextDecoder.mlmodelc  | 867 MB  | Decoder (INT8 quantized) |
| **Total**             | 1.48 GB |                          |

Compression Details

  • Method: Linear symmetric INT8 weight quantization via coremltools.optimize.coreml.linear_quantize_weights
  • Storage precision: Mixed (Float16, Int8)
  • Original model size: ~2.9 GB (Float16)
  • Compression ratio: ~2x
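Linear symmetric quantization maps each weight tensor to int8 with a single scale and no zero point, which is why FP16 weights (2 bytes each) compress to roughly half their size. A minimal NumPy sketch of the idea (illustrative only; the actual coremltools pass quantizes per-channel and handles packing internally):

```python
import numpy as np

def quantize_symmetric_int8(w: np.ndarray):
    """Linear symmetric INT8: the largest |weight| maps to 127, zero point is 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= scale / 2 + 1e-6
```

Since int8 storage uses 1 byte per weight versus 2 for Float16, the ideal ratio is 2x; the observed ~2x (2.9 GB -> 1.48 GB) matches, with the small gap coming from activations, scales, and layers kept in Float16.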

Model Architecture

  • Base: Whisper-large-v2 fine-tuned for Taiwanese Mandarin + English code-switching
  • Encoder input: logmel_data (1 x 80 x 3000)
  • Encoder output: output (1 x 1500 x 1280)
  • Decoder input: token_data (1 x 1) + audio_data (1 x 1500 x 1280)
  • Decoder output: logits (1 x 1 x 51865)
  • Spec version: 7 (iOS 16+ / macOS 13+)
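The encoder shapes follow from Whisper's fixed 30-second audio window: 16 kHz audio with a 160-sample hop yields 3000 log-mel frames, and the encoder's stride-2 convolution halves that to 1500 positions. A quick sanity check of those numbers:

```python
SAMPLE_RATE = 16_000   # Hz, Whisper's expected input rate
WINDOW_SECONDS = 30    # fixed audio context length
HOP_LENGTH = 160       # samples per mel frame
N_MELS = 80
CONV_STRIDE = 2        # encoder's second conv layer downsamples by 2
D_MODEL = 1280         # large-v2 hidden size

mel_frames = SAMPLE_RATE * WINDOW_SECONDS // HOP_LENGTH
encoder_positions = mel_frames // CONV_STRIDE

print((1, N_MELS, mel_frames))         # logmel_data shape: (1, 80, 3000)
print((1, encoder_positions, D_MODEL)) # encoder output shape: (1, 1500, 1280)
```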

I/O Naming Convention

Both models use the whisper.cpp I/O naming convention:

  • Encoder: logmel_data -> output
  • Decoder: token_data + audio_data -> logits
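Given that I/O contract, transcription is the standard Whisper loop: run the encoder once, then feed the decoder one token at a time until end-of-text. A hedged Python sketch with stubbed models (token IDs 50258/50257 are Whisper's multilingual `<|startoftranscript|>` and `<|endoftext|>`; `encoder`/`decoder` stand in for the real CoreML predictions, and the decoder's internal KV-cache state handling is glossed over):

```python
import numpy as np

SOT, EOT = 50258, 50257  # Whisper multilingual start/end-of-text token IDs
VOCAB = 51865

def greedy_decode(encoder, decoder, logmel, max_tokens=224):
    """Greedy loop over the decoder's (1, 1) token_data input."""
    audio = encoder(logmel)                    # audio_data: (1, 1500, 1280)
    tokens = [SOT]
    for _ in range(max_tokens):
        logits = decoder(np.array([[tokens[-1]]]), audio)  # (1, 1, 51865)
        next_id = int(logits[0, -1].argmax())
        if next_id == EOT:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start-of-transcript token

# Stub models so the sketch runs without the real .mlmodelc files.
def fake_encoder(logmel):
    return np.zeros((1, 1500, 1280), dtype=np.float32)

def fake_decoder(token_data, audio_data):
    # Always "predicts" EOT, so decoding stops immediately.
    logits = np.zeros((1, 1, VOCAB), dtype=np.float32)
    logits[0, 0, EOT] = 1.0
    return logits

assert greedy_decode(fake_encoder, fake_decoder, None) == []
```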

System Requirements

  • macOS 13+ / iOS 16+
  • Apple Silicon (M1/M2/M3/M4 or A-series)

Conversion

Converted using the sheep52031/breeze-asr-25-coreml-ane conversion toolchain with a custom HuggingFace decoder wrapper.

```shell
python convert-whisper-to-coreml_int8_support.py \
  --model MediaTek-Research/Breeze-ASR-25 \
  --hf-model \
  --quantize-int8
```

License

Apache 2.0 (same as the original Breeze-ASR-25 model).
