# whisper-yoad-small-he-acft
ACFT (Audio Context Fine-Tuning) applied to yoad/whisper-small for Hebrew speech recognition.
ACFT aligns partial-context encoder representations with full-context ones, improving short-utterance inference (e.g., keyboard dictation).
## Evaluation
WER (word error rate) on the ivrit-ai/whisper-training test split (2,000 samples, no text normalization):
| Model | WER |
|---|---|
| yoad/whisper-small (base) | 0.2704 |
| yoad/whisper-small + ACFT (this model) | 0.2540 |
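WER as reported above is the word-level edit distance between reference and hypothesis transcripts divided by the reference word count. Libraries such as `jiwer` are typically used; a minimal self-contained sketch (the `wer` function here is our illustration, not the evaluation script used for this card) looks like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that a WER of 0.2540 means roughly one word edit per four reference words; scores are sensitive to whether text normalization is applied, which is why the table states "no normalization".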
## Training
- Method: ACFT (encoder MSE alignment)
- Dataset: google/fleurs he_il
- Epochs: 8
- Device: Apple MPS (M4 Pro)
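The encoder MSE alignment listed above pairs each utterance's encoder output computed with the full (30-second-padded) audio context against the output computed from a shorter, partial context, and penalizes their difference over the frames that actually contain speech. A minimal NumPy sketch of that loss (the function name, shapes, and random inputs are our illustration, not the actual training code):

```python
import numpy as np

def acft_alignment_loss(full_ctx_feats: np.ndarray,
                        partial_ctx_feats: np.ndarray,
                        n_valid: int) -> float:
    """MSE between encoder features from full vs. partial audio context,
    restricted to the first n_valid frames (the speech region)."""
    a = full_ctx_feats[:n_valid]      # reference (full-context) features
    b = partial_ctx_feats[:n_valid]   # partial-context features to align
    return float(np.mean((a - b) ** 2))

# Whisper-small produces 1500 frames of 768-dim features for a 30 s window.
rng = np.random.default_rng(0)
full = rng.standard_normal((1500, 768))
partial = full + 0.1 * rng.standard_normal((1500, 768))
loss = acft_alignment_loss(full, partial, n_valid=200)
```

Driving this loss toward zero makes the encoder produce full-context-like representations even when given short audio, which is why inference on short utterances improves.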
## Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("amitkot/whisper-yoad-small-he-acft")
processor = WhisperProcessor.from_pretrained("amitkot/whisper-yoad-small-he-acft")
```