# Whisper-medium Polish Medical ASR (anti-forgetting)
A LoRA adapter for openai/whisper-medium fine-tuned on Polish medical speech
using an anti-forgetting training recipe (knowledge distillation + medical data oversampling + general-domain replay) that
specialises the model for medical Polish while preserving performance on general Polish speech.
This model reduces combined WER on held-out Polish medical test sets from 21.16 % to 13.06 % (−38 % relative) while also improving performance on general Polish speech (bigos: 20.62 % → 10.75 %, −48 %). Naive medical fine-tuning typically destroys general-domain performance; this recipe avoids that trade-off.
## Benchmark vs base whisper-medium (per dataset)
Held-out test sets (3,205 samples, fair-eval methodology — no train/test text overlap):
| Test Set | Base whisper-medium | This model | Δ (pp) | Relative |
|---|---|---|---|---|
| admed_anoni (medical, synthetic) | 32.48 % | 19.13 % | −13.35 | −41 % |
| admed_human (medical, human read) | 26.96 % | 13.47 % | −13.49 | −50 % |
| youtube (medical-adjacent) | 15.91 % | 13.00 % | −2.91 | −18 % |
| gemini (medical test2) | 10.83 % | 8.97 % | −1.86 | −17 % |
| bigos (general Polish) | 20.62 % | 10.75 % | −9.87 | −48 % |
| Average (unweighted) | 21.16 % | 13.06 % | −8.10 | −38 % |
The bigos column is the catastrophic-forgetting indicator — it improves substantially, confirming that the anti-forgetting recipe transfers from whisper-large-v3-turbo to whisper-medium.
## Training recipe (best of ~20 experiments)
| Component | Value |
|---|---|
| Base model | openai/whisper-medium |
| Adapter | LoRA r=64, α=128, dropout=0.0 |
| LoRA targets | encoder + decoder attention + FFN projections |
| Learning rate | 2e-4 (linear, 10 % warmup) |
| Epochs | 5 (best @ epoch 4) |
| Batch size | 16 × 4 GPUs |
| Precision | fp16, gradient checkpointing (non-reentrant) |
| Anti-forgetting | KD α=0.3, T=2.0 from frozen base whisper-medium |
| Data mix | Medical × 2 oversampled + bigos 10k |

Training: ~3h47m on 4×A100 (SXM4-40GB).
## Datasets
Fine-tuning used a Polish medical + general-domain mix:
| Dataset | Role | Samples (train) |
|---|---|---|
| lion-ai/admed_voice (admed_anoni) | Medical (synthetic) | 8,516 × 2 |
| lion-ai/admed_voice (admed_human) | Medical (human read) | 5,693 × 2 |
| lion-ai/youtube_asr_30 | Medical-adjacent YouTube | 3,712 × 2 |
| lion-ai/pl_med_asr_test2 | Medical (test2) | 1,301 × 2 |
| lion-ai/bigos | General Polish (replay) | 10,000 |
Evaluation uses held-out test splits from all five datasets (3,205 samples total).
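As a rough illustration of how the mix in the table could be assembled (a plain-Python sketch, not the actual training pipeline; `build_training_mix` is a hypothetical helper):

```python
import random

def build_training_mix(medical_sets, general_replay, oversample=2, seed=42):
    """Repeat each medical dataset `oversample` times, append the
    general-domain replay samples, then shuffle the combined pool."""
    mix = []
    for dataset in medical_sets:
        mix.extend(dataset * oversample)  # medical oversampling (2x in the recipe)
    mix.extend(general_replay)            # bigos replay, not oversampled
    random.Random(seed).shuffle(mix)
    return mix

# With the sample counts from the table, the pool would contain
# (8516 + 5693 + 3712 + 1301) * 2 + 10000 = 48444 examples.
```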
## Why anti-forgetting?
Naively fine-tuning Whisper on medical-only data dramatically improves medical WER but destroys performance on general Polish (e.g. 4.37 % medical / 19.46 % bigos — worse than the base model's 15.72 %). This recipe combines three techniques:
- Data replay — mixing general-domain (bigos) samples into training
- Knowledge distillation — a KL-divergence loss toward the frozen base whisper-medium preserves its output distribution
- Medical oversampling — repeats the medical training data 2× to shift the balance without removing bigos
Result: strong medical WER AND no bigos forgetting.
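A minimal sketch of the distillation term (α=0.3, T=2.0 per the recipe table), assuming the standard KD formulation of a temperature-scaled KL term blended with cross-entropy; the exact loss implementation used in training is not published in this card:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.3, temperature=2.0):
    """Blend cross-entropy on the transcripts with KL divergence toward the
    frozen base model's temperature-softened output distribution."""
    vocab = student_logits.size(-1)
    # Standard ASR cross-entropy against ground-truth token labels
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), labels.reshape(-1))
    # KL(student || teacher) on softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd
```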
## Usage
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
import librosa

model_id = "lion-ai/eskulap-asr-medium-beta"
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id, language="Polish", task="transcribe")

# Inference
audio, sr = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda")
inputs["input_features"] = inputs["input_features"].half()
with torch.no_grad():
    predicted_ids = model.generate(**inputs, language="pl", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
## Related work
This model is part of a broader research effort on fine-tuning Whisper for Polish medical ASR.
See also the larger variant (openai/whisper-large-v3-turbo base) trained with the same recipe.
## License
Apache 2.0 (inherits from base model).