Whisper-medium Polish Medical ASR (anti-forgetting)

A LoRA adapter for openai/whisper-medium fine-tuned on Polish medical speech using an anti-forgetting training recipe (knowledge distillation + medical data oversampling + general-domain replay) that specialises the model for medical Polish while preserving performance on general Polish speech.

This model reduces average WER on held-out Polish medical test sets from 21.16 % to 13.06 % (−38 % relative) while also improving on general Polish speech (bigos: 20.62 % → 10.75 %, −48 %). Naive medical fine-tuning typically destroys general-domain performance; this recipe avoids that trade-off.

Benchmark vs base whisper-medium (per dataset)

Held-out test sets (3,205 samples; fair-eval methodology with no train/test text overlap):

| Test set | Base whisper-medium | This model | Δ (pp) | Relative |
|---|---|---|---|---|
| admed_anoni (medical, synthetic) | 32.48 % | 19.13 % | −13.35 | −41 % |
| admed_human (medical, human-read) | 26.96 % | 13.47 % | −13.49 | −50 % |
| youtube (medical-adjacent) | 15.91 % | 13.00 % | −2.91 | −18 % |
| gemini (medical test2) | 10.83 % | 8.97 % | −1.86 | −17 % |
| bigos (general Polish) | 20.62 % | 10.75 % | −9.87 | −48 % |
| **Average (unweighted)** | 21.16 % | 13.06 % | −8.10 | −38 % |

The bigos row is the catastrophic-forgetting indicator: it improves substantially, confirming that the anti-forgetting recipe transfers from whisper-large-v3-turbo to whisper-medium.
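All figures above are standard word error rate: word-level edit distance divided by the number of reference words. A minimal stdlib sketch of the metric, for reference (the card's exact fair-eval text normalization is not specified here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substituted word out of three reference words gives a WER of 33.3 %.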

Training recipe (best of ~20 experiments)

| Component | Value |
|---|---|
| Base model | openai/whisper-medium |
| Adapter | LoRA r=64, α=128, dropout=0.0 |
| LoRA targets | encoder + decoder attention and FFN projections |
| Learning rate | 2e-4 (linear decay, 10 % warmup) |
| Epochs | 5 (best at epoch 4) |
| Batch size | 16 × 4 GPUs |
| Precision | fp16, gradient checkpointing (non-reentrant) |
| Anti-forgetting | KD α=0.3, T=2.0 from frozen base whisper-medium |
| Data mix | medical × 2 oversampled + bigos 10k |

Training: ~3h47m on 4×A100 (SXM4-40GB).
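The adapter row can be expressed as a `peft` `LoraConfig`; a sketch, under the assumption that "attention + FFN projections" maps to Whisper's `q_proj`/`k_proj`/`v_proj`/`out_proj` and `fc1`/`fc2` modules (the module list is an assumption, not taken from the card):

```python
from peft import LoraConfig

# r, alpha, and dropout come from the recipe table above; target_modules
# is an assumed mapping of "attention + FFN projections" onto Whisper's
# projection layer names, present in both encoder and decoder blocks.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)
```

Wrapping the base model would then be `get_peft_model(base_model, lora_config)` before training.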

Datasets

Fine-tuning used a Polish medical + general-domain mix:

| Dataset | Role | Samples (train) |
|---|---|---|
| lion-ai/admed_voice (admed_anoni) | Medical (synthetic) | 8,516 × 2 |
| lion-ai/admed_voice (admed_human) | Medical (human-read) | 5,693 × 2 |
| lion-ai/youtube_asr_30 | Medical-adjacent YouTube | 3,712 × 2 |
| lion-ai/pl_med_asr_test2 | Medical (test2) | 1,301 × 2 |
| lion-ai/bigos | General Polish (replay) | 10,000 |

Evaluation uses held-out test splits from all five datasets (3,205 samples total).
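The "medical × 2 + bigos 10k" mix amounts to duplicating each medical train split before concatenating with the bigos replay set and shuffling. A sketch of the sample bookkeeping (split sizes from the table; index tuples stand in for real samples):

```python
import random

# Train-split sizes from the datasets table above.
medical = {
    "admed_anoni": 8516,
    "admed_human": 5693,
    "youtube": 3712,
    "test2": 1301,
}
bigos_replay = 10_000

# Oversample: every medical sample appears twice, bigos once.
mix = [(name, i) for name, n in medical.items() for i in range(n)] * 2
mix += [("bigos", i) for i in range(bigos_replay)]
random.Random(0).shuffle(mix)  # one shuffled pass over the full mix

print(len(mix))  # 2 * 19,222 medical + 10,000 bigos = 48,444 samples
```

With real data this would be `concatenate_datasets` over the duplicated splits rather than index tuples, but the proportions are the point: medical speech makes up roughly 79 % of each epoch while bigos is never dropped.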

Why anti-forgetting?

Naively fine-tuning Whisper on medical-only data dramatically improves medical WER but destroys performance on general Polish (e.g. 4.37 % medical / 19.46 % bigos, worse than the base model's 15.72 %). This recipe combines three techniques:

  1. Data replay: general-domain (bigos) samples are mixed into training.
  2. Knowledge distillation: a KL-divergence loss against the frozen base whisper-medium preserves its output distribution.
  3. Medical oversampling: the medical training data is repeated 2× to shift the balance without removing bigos.

Result: strong medical WER and no bigos forgetting.
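The distillation term combines the usual cross-entropy on labels with a temperature-scaled KL divergence to the frozen base model's logits. One plausible formulation as a sketch (the exact weighting convention used in training is an assumption; α=0.3 and T=2.0 match the recipe table):

```python
import torch
import torch.nn.functional as F

def kd_ce_loss(student_logits, teacher_logits, labels, alpha=0.3, temperature=2.0):
    """(1 - alpha) * CE(labels) + alpha * T^2 * KL(teacher || student)
    over temperature-softened distributions. Weighting convention is an
    assumed standard distillation form, not confirmed by the card."""
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # padded label positions are masked
    )
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (t * t)  # T^2 restores gradient scale after softening
    return (1 - alpha) * ce + alpha * kl
```

During training the teacher is simply the frozen base whisper-medium run on the same batch with `torch.no_grad()`.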

Usage

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "lion-ai/eskulap-asr-medium-beta"
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id, language="Polish", task="transcribe")

# Load 16 kHz mono audio and build log-mel input features
audio, sr = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda")
inputs["input_features"] = inputs["input_features"].half()  # match fp16 weights

with torch.no_grad():
    predicted_ids = model.generate(**inputs, language="pl", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

Related work

This model is part of a broader research effort on fine-tuning Whisper for Polish medical ASR. See also the larger variant (openai/whisper-large-v3-turbo base) trained with the same recipe.

License

Apache 2.0 (inherits from base model).

Model size

0.8B parameters (fp16, safetensors).