Whisper-medium Polish Medical ASR (anti-forgetting)

A LoRA adapter for openai/whisper-medium fine-tuned on Polish medical speech using an anti-forgetting training recipe (knowledge distillation + medical data oversampling + general-domain replay) that specialises the model for medical Polish while preserving performance on general Polish speech.

This model reduces average WER on held-out Polish medical test sets from 21.16 % to 13.06 % (−38 % relative) while also improving on general Polish speech (bigos: 20.62 % → 10.75 %, −48 %). Naive medical fine-tuning typically destroys general-domain performance; this recipe avoids that trade-off.

Benchmark vs base whisper-medium (per dataset)

Held-out test sets (3,205 samples; fair-eval methodology with no train/test text overlap):

| Test set | Base whisper-medium | This model | Δ (pp) | Relative |
|---|---|---|---|---|
| admed_anoni (medical, synthetic) | 32.48 % | 19.13 % | −13.35 | −41 % |
| admed_human (medical, human-read) | 26.96 % | 13.47 % | −13.49 | −50 % |
| youtube (medical-adjacent) | 15.91 % | 13.00 % | −2.91 | −18 % |
| gemini (medical test2) | 10.83 % | 8.97 % | −1.86 | −17 % |
| bigos (general Polish) | 20.62 % | 10.75 % | −9.87 | −48 % |
| **Average (unweighted)** | 21.16 % | 13.06 % | −8.10 | −38 % |

The bigos row is the catastrophic-forgetting indicator: it improves substantially, confirming that the anti-forgetting recipe transfers from whisper-large-v3-turbo to whisper-medium.
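All figures above are standard word error rate: word-level edit distance divided by the number of reference words. A minimal stdlib sketch of the metric, for reference (the card's exact fair-eval text normalization is not specified here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substituted word out of three reference words gives a WER of 33.3 %.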

Training recipe (best of ~20 experiments)

| Component | Value |
|---|---|
| Base model | openai/whisper-medium |
| Adapter | LoRA r=64, α=128, dropout=0.0 |
| LoRA targets | encoder + decoder attention and FFN projections |
| Learning rate | 2e-4 (linear decay, 10 % warmup) |
| Epochs | 5 (best at epoch 4) |
| Batch size | 16 × 4 GPUs |
| Precision | fp16, gradient checkpointing (non-reentrant) |
| Anti-forgetting | KD α=0.3, T=2.0 from frozen base whisper-medium |
| Data mix | medical × 2 oversampled + bigos 10k |

Training: ~3h47m on 4×A100 (SXM4-40GB).
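The adapter row can be expressed as a `peft` `LoraConfig`; a sketch, under the assumption that "attention + FFN projections" maps to Whisper's `q_proj`/`k_proj`/`v_proj`/`out_proj` and `fc1`/`fc2` modules (the module list is an assumption, not taken from the card):

```python
from peft import LoraConfig

# r, alpha, and dropout come from the recipe table above; target_modules
# is an assumed mapping of "attention + FFN projections" onto Whisper's
# projection layer names, present in both encoder and decoder blocks.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)
```

Wrapping the base model would then be `get_peft_model(base_model, lora_config)` before training.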

Datasets

Fine-tuning used a Polish medical + general-domain mix:

| Dataset | Role | Samples (train) |
|---|---|---|
| lion-ai/admed_voice (admed_anoni) | Medical (synthetic) | 8,516 × 2 |
| lion-ai/admed_voice (admed_human) | Medical (human-read) | 5,693 × 2 |
| lion-ai/youtube_asr_30 | Medical-adjacent YouTube | 3,712 × 2 |
| lion-ai/pl_med_asr_test2 | Medical (test2) | 1,301 × 2 |
| lion-ai/bigos | General Polish (replay) | 10,000 |

Evaluation uses held-out test splits from all five datasets (3,205 samples total).
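The "medical × 2 + bigos 10k" mix amounts to duplicating each medical train split before concatenating with the bigos replay set and shuffling. A sketch of the sample bookkeeping (split sizes from the table; index tuples stand in for real samples):

```python
import random

# Train-split sizes from the datasets table above.
medical = {
    "admed_anoni": 8516,
    "admed_human": 5693,
    "youtube": 3712,
    "test2": 1301,
}
bigos_replay = 10_000

# Oversample: every medical sample appears twice, bigos once.
mix = [(name, i) for name, n in medical.items() for i in range(n)] * 2
mix += [("bigos", i) for i in range(bigos_replay)]
random.Random(0).shuffle(mix)  # one shuffled pass over the full mix

print(len(mix))  # 2 * 19,222 medical + 10,000 bigos = 48,444 samples
```

With real data this would be `concatenate_datasets` over the duplicated splits rather than index tuples, but the proportions are the point: medical speech makes up roughly 79 % of each epoch while bigos is never dropped.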

Why anti-forgetting?

Naively fine-tuning Whisper on medical-only data dramatically improves medical WER but destroys performance on general Polish (e.g. 4.37 % medical / 19.46 % bigos, worse than the base model's 15.72 %). This recipe combines three techniques:

  1. Data replay: general-domain (bigos) samples are mixed into training.
  2. Knowledge distillation: a KL-divergence loss against the frozen base whisper-medium preserves its output distribution.
  3. Medical oversampling: the medical training data is repeated 2× to shift the balance without removing bigos.

Result: strong medical WER and no bigos forgetting.
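The distillation term combines the usual cross-entropy on labels with a temperature-scaled KL divergence to the frozen base model's logits. One plausible formulation as a sketch (the exact weighting convention used in training is an assumption; α=0.3 and T=2.0 match the recipe table):

```python
import torch
import torch.nn.functional as F

def kd_ce_loss(student_logits, teacher_logits, labels, alpha=0.3, temperature=2.0):
    """(1 - alpha) * CE(labels) + alpha * T^2 * KL(teacher || student)
    over temperature-softened distributions. Weighting convention is an
    assumed standard distillation form, not confirmed by the card."""
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # padded label positions are masked
    )
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (t * t)  # T^2 restores gradient scale after softening
    return (1 - alpha) * ce + alpha * kl
```

During training the teacher is simply the frozen base whisper-medium run on the same batch with `torch.no_grad()`.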

Usage

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "lion-ai/eskulap-asr-medium-beta"
model = WhisperForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id, language="Polish", task="transcribe")

# Load 16 kHz mono audio and build log-mel input features
audio, sr = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to("cuda")
inputs["input_features"] = inputs["input_features"].half()  # match fp16 weights

with torch.no_grad():
    predicted_ids = model.generate(**inputs, language="pl", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

Related work

This model is part of a broader research effort on fine-tuning Whisper for Polish medical ASR. See also the larger variant (openai/whisper-large-v3-turbo base) trained with the same recipe.

License

Apache 2.0 (inherits from base model).

Model size

0.8B parameters (fp16, safetensors).