whisper-yoad-small-he-acft

ACFT (Audio Context Fine-Tuning) applied to yoad/whisper-small for Hebrew speech recognition.

ACFT aligns the encoder's partial-context representations (computed from short or truncated audio) with its full-context ones, improving inference on short utterances (e.g., keyboard dictation).
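A minimal sketch of this alignment objective, using random arrays as stand-ins for encoder outputs (the shapes, frame counts, and variable names are illustrative assumptions, not the actual training code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for Whisper encoder outputs, shape (frames, hidden_dim).
# full_ctx: features from the full padded 30 s window;
# part_ctx: features for the same utterance with only partial context.
n_frames, hidden = 1500, 768
full_ctx = rng.standard_normal((n_frames, hidden))
part_ctx = full_ctx + 0.1 * rng.standard_normal((n_frames, hidden))

# ACFT-style alignment: MSE between the two representations,
# restricted to the frames covering the actual speech
# (Whisper's encoder emits ~50 frames per second, so 3 s -> ~150 frames).
speech_frames = 150
loss = np.mean((part_ctx[:speech_frames] - full_ctx[:speech_frames]) ** 2)
print(f"alignment MSE: {loss:.4f}")
```

Driving this MSE down teaches the encoder to produce full-context-like features even when most of the 30 s window is padding.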

Evaluation

WER on the ivrit-ai/whisper-training test split (2000 samples, no text normalization):

| Model | WER |
| --- | --- |
| yoad/whisper-small (base) | 0.2704 |
| yoad/whisper-small + ACFT (this model) | 0.2540 |
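The WER figures above are word-level edit distance divided by reference length. A self-contained sketch of that metric (not the evaluation script used here, which may differ in tokenization details):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            ))
        prev = cur
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # one substitution -> 1/3
```

Since no normalization is applied, Hebrew spelling variants and punctuation differences count as full word errors.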

Training

  • Method: ACFT (encoder MSE alignment)
  • Dataset: google/fleurs he_il
  • Epochs: 8
  • Device: Apple MPS (M4 Pro)
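A toy illustration of the training signal described above: a "student" (here a single linear map standing in for the tunable encoder) is regressed toward frozen full-context features by gradient descent on the MSE. The real training updates the Whisper encoder itself; everything below is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen teacher target: full-context features (random stand-in data).
x = rng.standard_normal((150, 32))          # partial-context encoder input
target = x @ rng.standard_normal((32, 32))  # full-context features to match

# Student: one linear layer as a stand-in for the encoder being tuned.
W = rng.standard_normal((32, 32)) * 0.1
lr = 0.01
losses = []
for epoch in range(8):  # mirrors the 8 training epochs
    pred = x @ W
    losses.append(np.mean((pred - target) ** 2))
    grad = 2 * x.T @ (pred - target) / len(x)  # d(MSE)/dW
    W -= lr * grad
print(f"MSE: {losses[0]:.2f} -> {losses[-1]:.2f}")
```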

Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("amitkot/whisper-yoad-small-he-acft")
processor = WhisperProcessor.from_pretrained("amitkot/whisper-yoad-small-he-acft")

# audio: a 1-D float waveform sampled at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features, language="he", task="transcribe")
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```