whisper-yoad-small-he-acft

ACFT (Audio Context Fine-Tuning) applied to yoad/whisper-small for Hebrew speech recognition.

ACFT aligns the encoder's partial-context representations (computed from short or truncated audio) with its full-context ones, improving inference on short utterances (e.g., keyboard dictation).
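A minimal sketch of this alignment objective, using random arrays as stand-ins for encoder outputs (the shapes, frame counts, and variable names are illustrative assumptions, not the actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for Whisper encoder outputs, shape (frames, hidden_dim).
# full_ctx: features from the full padded 30 s window;
# part_ctx: features for the same utterance with only partial context.
n_frames, hidden = 1500, 768
full_ctx = rng.standard_normal((n_frames, hidden))
part_ctx = full_ctx + 0.1 * rng.standard_normal((n_frames, hidden))

# ACFT-style alignment: MSE between the two representations,
# restricted to the frames covering the actual speech
# (Whisper's encoder emits ~50 frames per second, so 3 s -> ~150 frames).
speech_frames = 150
loss = np.mean((part_ctx[:speech_frames] - full_ctx[:speech_frames]) ** 2)
print(f"alignment MSE: {loss:.4f}")
```

Driving this MSE down teaches the encoder to produce full-context-like features even when most of the 30 s window is padding.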

Evaluation

WER on the ivrit-ai/whisper-training test split (2000 samples, no text normalization):

| Model | WER |
| --- | --- |
| yoad/whisper-small (base) | 0.2704 |
| yoad/whisper-small + ACFT (this model) | 0.2540 |
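The WER figures above are word-level edit distance divided by reference length. A self-contained sketch of that metric (not the evaluation script used here, which may differ in tokenization details):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            ))
        prev = cur
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution -> 1/3
```

Since no normalization is applied, Hebrew spelling variants and punctuation differences count as full word errors.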

Training

  • Method: ACFT (encoder MSE alignment)
  • Dataset: google/fleurs he_il
  • Epochs: 8
  • Device: Apple MPS (M4 Pro)
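A toy illustration of the training signal described above: a "student" (here a single linear map standing in for the tunable encoder) is regressed toward frozen full-context features by gradient descent on the MSE. The real training updates the Whisper encoder itself; everything below is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen teacher target: full-context features (random stand-in data).
x = rng.standard_normal((150, 32))          # partial-context encoder input
target = x @ rng.standard_normal((32, 32))  # full-context features to match

# Student: one linear layer as a stand-in for the encoder being tuned.
W = rng.standard_normal((32, 32)) * 0.1
lr = 0.01
losses = []
for epoch in range(8):  # mirrors the 8 training epochs
    pred = x @ W
    losses.append(np.mean((pred - target) ** 2))
    grad = 2 * x.T @ (pred - target) / len(x)  # d(MSE)/dW
    W -= lr * grad
print(f"MSE: {losses[0]:.2f} -> {losses[-1]:.2f}")
```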

Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("amitkot/whisper-yoad-small-he-acft")
processor = WhisperProcessor.from_pretrained("amitkot/whisper-yoad-small-he-acft")

# audio: a 1-D float waveform sampled at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features, language="he", task="transcribe")
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```